Ganesharpan Annajirao Nookala

Arpan Nookala

Data Scientist & ML Engineer

MS Data Science @ Rutgers University • Specializing in Advanced Analytics, Machine Learning, and Spatial Data Science • Building intelligent solutions that bridge academic research with real-world impact

About Me

I'm a passionate data scientist and machine learning engineer with a strong foundation in advanced analytics and spatial data science. Having completed my Master's in Data Science at Rutgers University, I specialize in bridging the gap between cutting-edge research and practical, real-world applications.

My research focuses on Small Area Estimation, spatial microsimulation, and telework pattern analysis, where I leverage advanced machine learning techniques including GANs, transfer learning, and ensemble methods. I'm particularly interested in how data science can inform urban planning and policy decisions.

With experience at Google and multiple published research papers, I bring both industry expertise and academic rigor to every project. I'm always excited to collaborate on innovative solutions that make a meaningful impact.

Education

Master of Science in Data Science

Rutgers University – New Brunswick

New Jersey
Sep 2023 – May 2025
GPA: 3.8/4.0

Bachelor of Technology in Electronics Engineering

Sardar Patel Institute of Technology

Mumbai, India
Aug 2019 – Jun 2023
GPA: 8.81/10

Minor in Computer Engineering

Professional Experience

Data Science & Machine Learning Analyst

Rutgers Urban and Civic Informatics Laboratory

Under Dr. Piyushimita (Vonu) Thakuriah

New Brunswick, NJ
Feb 2024 – Present
Leading research in Small Area Estimation and telework propensity modeling using machine learning techniques such as transfer learning, ensemble methods, and GANs.

Key Achievements:

Conducting Small Area Estimation (SAE) to model telework propensity at the census block group level, integrating multiple data sources, including the American Community Survey (ACS), Household Pulse Survey (HPS), Current Population Survey (CPS), and Public Use Microdata Sample (PUMS), using Python, R, and SQL

Engineered and optimized a Python implementation of Iterative Proportional Fitting (IPF, also known as raking) that runs 6x faster than existing R-based methods, significantly improving computational performance and scalability

Developed and evaluated advanced empirical models, including a binary classifier using XGBoost (F1 score: 0.85) and Probit modeling for marginal effects and policy insight analysis on telework patterns

Advanced spatial microsimulation methods by incorporating Transfer Learning, Ensemble Learning, and Generative Adversarial Networks (GANs), enhancing the accuracy of small-area telework propensity estimation

Designed and built multi-output joint estimation transfer learning frameworks to improve predictive performance on the HPS dataset, transferring relevant features from CPS and PUMS and using Conditional Tabular GANs for synthetic population generation

Produced geospatial visualizations and GIS-based analyses using Leaflet and QGIS, facilitating actionable insights into telework patterns and urban planning

Optimized large-scale data management and processing by integrating SQLite, converting data to Parquet format, and employing Apache Spark for distributed computing, substantially enhancing efficiency and scalability

Data Scientist

Google via DKSH Smollan

Remote
Jul 2021 – Jul 2022
Led data science initiatives for Google's retail operations across multiple continents, focusing on pricing intelligence and dashboard optimization.

Key Achievements:

Developed custom Selenium scrapers to extract smartphone and smart home product pricing data from retail websites across Asia, Europe, and North America, storing data using Cloud Function microservices in Google Cloud SQL databases

Optimized dashboard performance by integrating Google BigQuery, reducing dashboard load times by nearly 50% and improving data accessibility

Designed and deployed interactive dashboards in Looker Studio, enabling data-driven insights for product pricing and trend analysis

Led a team of five junior interns, overseeing script development and microservice integration to ensure timely project delivery

Featured Projects

July 2025 – Present
Intelligent Multi-Agent Research Discovery Platform
Building a production-ready multi-agent system using LangGraph and CrewAI for intelligent research paper discovery. Features 5 specialized AI agents, semantic search across arXiv/Semantic Scholar/PubMed, personalized recommendations, and real-time agent orchestration with MCP integration.
Python
FastAPI
LangGraph
CrewAI
Next.js
PostgreSQL
Qdrant
MCP
Docker
Oct 2024 – Dec 2024
Research Paper Recommendation System
Developed a Research Paper Recommendation System using TF-IDF and fine-tuned SBERT (all-MiniLM-L6-v2), achieving 78.9% accuracy and an F1 score of 79.64%, with embeddings efficiently stored and retrieved from LanceDB.
Python
SBERT
TF-IDF
LanceDB
NLP
Nov 2024 – Dec 2024
Commodity Trading using Alternative Data
Explored coffee trading strategies integrating weather anomalies and technical signals for trend-following and mean-reversion methods.
Python
Financial Analysis
Weather APIs
MACD
RSI
Oct 2023 – Dec 2023
Credit Card Fraud Detection
Applied logistic regression with SMOTE to address class imbalance, achieving a ROC AUC of 0.95 on a real card transaction dataset.
Python
Logistic Regression
SMOTE
Feature Engineering
Jul 2022 – Aug 2023
Deep RL Traffic Control
Built a DDPG-based model to optimize traffic signal timing, reducing average wait times by up to 23% on a grid of intersections.
Python
Deep RL
DDPG
DQN
IEEE Publication
Jul 2021 – Oct 2022
Automated Stock Trading with Short Selling
Integrated short-selling thresholds into DRL-based stock trading, outperforming previous methods by 11.4% p.a.
Python
OpenAI Gym
DRL
Financial Trading
Springer
Jan 2021 – Jun 2021
Fire Detection & Localization
Deployed Google's Inception V3 on images of fire, smoke, and neutral scenes, integrated with IoT for real-time alerts.
Python
Inception V3
OpenCV
IoT
Telegram

Publications

Deep Reinforcement Learning based Intelligent Traffic Control
First Author
Jul 2022 – Aug 2023

IEEE TENSYMP 2023 (Canberra, Australia)

Presented novel DRL approaches for optimizing traffic signal timing systems.

Abstract:

The development of Intelligent Traffic Signal Control (ITSC) systems is crucial for enhancing traffic flow and mitigating congestion, which is a widespread problem in urban areas globally. Presently, RADAR or inductive loop-based intelligent systems are used in metropolises of developed countries, but the large investment and infrastructure requirements rule out their widespread application. This paper explores a nascent Deep Reinforcement Learning (DRL) approach to the Traffic Signal Control (TSC) problem, as opposed to classical optimization or rule-based approaches of the past. To address the challenges that limit past RL approaches, the study leverages the Deep Deterministic Policy Gradient (DDPG) algorithm to optimize traffic light control policies. The proposed DRL approach shows intelligent behavior and reduces the average delay time and congestion when compared to the traditional RL, past DRL, and fixed-time signal approaches. A comparative analysis of the reward functions is also presented, which reveals insights into the variance of performance.

Deep Reinforcement Learning for Automated Stock Trading: Inclusion of Short Selling
Second Author
Jul 2021 – Oct 2022

26th International Symposium on Methodologies for Intelligent Systems (Cosenza, Italy)

Published in Springer LNAI vol 13515, focusing on advanced trading strategies.

Abstract:

Multiple facets of the financial industry, such as algorithmic trading, have greatly benefited from their unison with cutting-edge machine learning research in recent years. However, despite significant research efforts directed towards leveraging supervised learning methods alone for designing superior algorithmic trading strategies, existing studies continue to confront significant hurdles like striking the optimum balance of risk and return, incorporating real-world complexities, and minimizing max drawdown periods. This research work proposes a modified deep reinforcement learning (DRL) approach to automated stock trading with the inclusion of short selling, a new thresholding framework, and employs turbulence as a safety switch. The DRL agents' performance is evaluated on the U.S. stock market's DJIA index constituents. The modified DRL agents are shown to outperform previous DRL approaches and the DJIA index, in terms of absolute returns, risk-adjusted returns, and lower max drawdowns, while giving insights into the effects of short selling inclusion and proposed thresholding.

Technical Skills

Machine Learning & AI
Expertise in traditional ML, deep learning, NLP, and reinforcement learning algorithms.
Machine Learning
Linear/Non-linear Regression
Sampling (Bootstrap, Cross-Validation, Regularization)
Dimensionality Reduction (PCA, t-SNE, UMAP)
Decision Trees (XGBoost, Random Forest, Gradient Boosting, etc.)
Bayesian Networks
Clustering (KMeans, Hierarchical, DBSCAN, etc.)
Reinforcement Learning (MDP, PPO/MAPPO)
Deep Learning
CNNs
Transformers
GANs
Multimodal Systems
LLM Fine-tuning (PEFT/SFT/QLoRA, etc.)
Advanced RAG
Diffusion Models
Graph Models
GNNs
Programming Languages
Proficiency in multiple programming languages for diverse project requirements.
Python
C++
JavaScript/TypeScript
R
Java
Cloud & DevOps
Deploying and managing scalable infrastructure and CI/CD pipelines.
AWS
GCP
Docker
Kubernetes
GitHub Actions
Jenkins
MLflow
Data Systems
Managing and optimizing various database systems for efficient data storage and retrieval.
SQL
NoSQL
Vector DBs
MySQL
PostgreSQL
MongoDB
Redis
LanceDB
Pinecone
MLOps
Implementing best practices for machine learning operations and model lifecycle management.
MLflow
DVC
Model Monitoring
Pipeline Automation
Big Data Processing
Handling and processing large-scale datasets efficiently.
PySpark
Kafka
Airflow
Dask
Data Visualization & GIS
Creating interactive visualizations and geospatial analyses for complex datasets.
Plotly
Superset
Streamlit/Gradio
Tableau
PowerBI
Leaflet
Kepler.gl
GeoPandas
QGIS
Full-Stack Development
Building robust and scalable web applications with modern frameworks.
Python
JavaScript
React
Django
Node.js
Next.js
Version Control
Efficient code management and collaboration using version control systems.
Git
GitHub
GitLab