I'm a passionate data scientist and machine learning engineer with a strong foundation in advanced analytics and spatial data science. Having completed my Master's in Data Science at Rutgers University, I specialize in bridging the gap between cutting-edge research and practical, real-world applications.
My research focuses on Small Area Estimation, spatial microsimulation, and telework pattern analysis, where I leverage advanced machine learning techniques including GANs, transfer learning, and ensemble methods. I'm particularly interested in how data science can inform urban planning and policy decisions.
With experience at Google and multiple published research papers, I bring both industry expertise and academic rigor to every project. I'm always excited to collaborate on innovative solutions that make a meaningful impact.
Rutgers University – New Brunswick
Sardar Patel Institute of Technology
Minor in Computer Engineering
Rutgers Urban and Civic Informatics Laboratory
Under Dr. Piyushimita (Vonu) Thakuriah
Conducting Small Area Estimation (SAE) to model telework propensity at the census block group level, integrating multiple data sources, including the American Community Survey (ACS), Household Pulse Survey (HPS), Current Population Survey (CPS), and Public Use Microdata Sample (PUMS), using Python, R, and SQL
Engineered an optimized Python implementation of Iterative Proportional Fitting (IPF, also known as raking) that runs 6x faster than existing R-based methods, improving computational performance and scalability
Developed and evaluated advanced empirical models, including an XGBoost binary classifier (F1 score: 0.85) and a Probit model for estimating marginal effects and deriving policy insights on telework patterns
Advanced spatial microsimulation methods by incorporating Transfer Learning, Ensemble Learning, and Generative Adversarial Networks (GANs), enhancing the accuracy of small-area telework propensity estimation
Designed and built multi-output joint-estimation transfer learning frameworks to improve predictive performance on the HPS dataset by transferring relevant features from CPS and PUMS, and used Conditional Tabular GANs (CTGAN) to generate synthetic populations
Produced geospatial visualizations and GIS-based analyses using Leaflet and QGIS, delivering actionable insights into telework patterns for urban planning
Optimized large-scale data management and processing by integrating SQLite, converting data to Parquet format, and employing Apache Spark for distributed computing, substantially enhancing efficiency and scalability
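For readers unfamiliar with raking, the core IPF loop alternately rescales the rows and columns of a seed table until its margins match known totals. The sketch below is a generic two-dimensional version written for illustration only; it is not the optimized implementation described above, does not reproduce its speedup, and the function name `ipf_2d` and its `tol`/`max_iter` parameters are hypothetical.

```python
import numpy as np


def ipf_2d(seed, row_targets, col_targets, tol=1e-8, max_iter=1000):
    """Hypothetical sketch: rake a 2-D seed table to match row/column targets.

    Alternates row scaling and column scaling (classic IPF) until the
    largest absolute row-margin error falls below `tol`.
    """
    table = np.asarray(seed, dtype=float).copy()
    row_targets = np.asarray(row_targets, dtype=float)
    col_targets = np.asarray(col_targets, dtype=float)
    if not np.isclose(row_targets.sum(), col_targets.sum()):
        raise ValueError("row and column targets must have the same total")

    for _ in range(max_iter):
        # Row step: scale each row so its sum equals the row target
        # (rows that sum to zero are left at zero).
        row_sums = table.sum(axis=1)
        row_factors = np.divide(row_targets, row_sums,
                                out=np.zeros_like(row_sums), where=row_sums > 0)
        table *= row_factors[:, None]
        # Column step: scale each column so its sum equals the column target.
        col_sums = table.sum(axis=0)
        col_factors = np.divide(col_targets, col_sums,
                                out=np.zeros_like(col_sums), where=col_sums > 0)
        table *= col_factors[None, :]
        # Columns now match exactly; stop once the rows also match.
        if np.max(np.abs(table.sum(axis=1) - row_targets)) < tol:
            break
    return table
```

With a uniform seed such as `np.ones((2, 2))`, row targets `[30, 70]`, and column targets `[40, 60]`, the loop converges to `[[12, 18], [28, 42]]`, the independence solution. Real microsimulation work typically rakes over more than two constraint dimensions, which this sketch does not attempt.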
Google via DKSH Smollan
Developed custom Selenium scrapers to extract smartphone and smart-home product pricing data from retail websites across Asia, Europe, and North America, storing the data in Google Cloud SQL databases via Cloud Functions microservices
Optimized dashboard performance by integrating Google BigQuery, reducing dashboard load times by nearly 50% and improving data accessibility
Designed and deployed interactive dashboards in Looker Studio, enabling data-driven insights for product pricing and trend analysis
Led a team of five junior interns, overseeing script development and microservice integration to ensure timely project delivery
IEEE TENSYMP 2023 (Canberra, Australia)
The development of Intelligent Traffic Signal Control (ITSC) systems is crucial for enhancing traffic flow and mitigating congestion, which is a widespread problem in urban areas globally. Presently, RADAR or inductive loop-based intelligent systems are used in metropolises of developed countries, but the large investment and infrastructure requirements rule out their widespread application. This paper explores a nascent Deep Reinforcement Learning (DRL) approach to the Traffic Signal Control (TSC) problem, as opposed to classical optimization or rule-based approaches of the past. To address the challenges that limit past RL approaches, the study leverages the Deep Deterministic Policy Gradient (DDPG) algorithm to optimize traffic light control policies. The proposed DRL approach shows intelligent behavior and reduces the average delay time and congestion when compared to the traditional RL, past DRL, and fixed-time signal approaches. A comparative analysis of the reward functions is also presented, which reveals insights into the variance of performance.
26th International Symposium on Methodologies for Intelligent Systems (Cosenza, Italy)
Multiple facets of the financial industry, such as algorithmic trading, have greatly benefited from their integration with cutting-edge machine learning research in recent years. However, despite considerable research effort directed towards leveraging supervised learning methods alone for designing superior algorithmic trading strategies, existing studies continue to confront significant hurdles, such as striking the optimal balance between risk and return, incorporating real-world complexities, and minimizing maximum drawdown periods. This research work proposes a modified deep reinforcement learning (DRL) approach to automated stock trading that incorporates short selling and a new thresholding framework, and employs turbulence as a safety switch. The DRL agents' performance is evaluated on the constituents of the U.S. stock market's DJIA index. The modified DRL agents are shown to outperform previous DRL approaches and the DJIA index in terms of absolute returns, risk-adjusted returns, and lower maximum drawdowns, while giving insights into the effects of including short selling and of the proposed thresholding.
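The abstract mentions turbulence as a safety switch without defining it here. One common measure in DRL trading work is the Kritzman and Li financial turbulence index: the squared Mahalanobis distance of the current asset returns from their historical mean. The sketch below assumes that definition and pairs it with a hypothetical override rule (close all long and short positions when turbulence exceeds a threshold); the function names, the threshold, and the override rule are illustrative assumptions, not the paper's exact method.

```python
import numpy as np


def turbulence_index(history, current):
    """Squared Mahalanobis distance of today's returns from the historical mean.

    history: (T, N) array of past asset returns.
    current: (N,) array of today's asset returns.
    """
    history = np.asarray(history, dtype=float)
    current = np.asarray(current, dtype=float)
    mu = history.mean(axis=0)
    # atleast_2d keeps the single-asset case as a 1x1 matrix.
    cov = np.atleast_2d(np.cov(history, rowvar=False))
    diff = current - mu
    # Pseudo-inverse guards against a singular covariance matrix.
    return float(diff @ np.linalg.pinv(cov) @ diff)


def apply_safety_switch(target_positions, turbulence, threshold):
    """Hypothetical rule: above the threshold, override the agent and go flat."""
    target_positions = np.asarray(target_positions, dtype=float)
    if turbulence > threshold:
        # Zero target position closes every long and every short holding.
        return np.zeros_like(target_positions)
    return target_positions
```

Going flat, rather than only selling, is a natural choice once short selling is allowed, since a forced liquidation of longs alone would leave short exposure open during a turbulent period.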