Feel free to say hello! 👋
A few projects to share!
Facilitated others’ sports-analytics data projects by creating the most robust open-source NBA-related database. Kept capital overhead at $0 by using free cloud computing and dataset tools. Enabled better testing, deployment, and expansion by containerizing each pipeline segment’s Python scripts.
Explanations and Python implementations of Ordinary Least Squares regression, Ridge regression, Lasso regression (solved via Coordinate Descent), and Elastic Net regression (also solved via Coordinate Descent), applied to assess wine quality from numerous numerical features. Additional data analysis and visualization in Python are included.
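For readers unfamiliar with the technique, here is a rough sketch of how Lasso can be solved with coordinate descent. It is an illustration only, not the project's actual code: the function names, the pure-Python list representation, the omitted intercept, and the objective scaling (1/(2n) times the squared error plus alpha times the L1 norm) are all assumptions made for this example.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iters=100):
    """Minimize (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 (no intercept).

    X is a list of rows and y a list of targets (plain Python lists).
    This is a hypothetical helper written for illustration.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw because w starts at zero
    for _ in range(n_iters):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # an all-zero feature keeps a zero weight
            # Correlation of feature j with the residual computed without feature j.
            rho = sum(col[i] * (residual[i] + col[i] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / z
            # Keep the residual in sync with the updated coefficient.
            delta = new_w - w[j]
            for i in range(n):
                residual[i] -= col[i] * delta
            w[j] = new_w
    return w
```

Each pass updates one coefficient at a time while holding the others fixed. The soft-threshold step is what sets weak coefficients exactly to zero, which is why Lasso doubles as a feature selector. Setting `alpha` to 0 reduces the update to plain least squares.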
The goal of this project was to build models that accurately predict the attendance of a future National Basketball Association (NBA) game. Game data, including attendance, was scraped from stats.nba.com, and stadium capacity data was collected from numerous online sources. The data was then cleaned, processed, explored through visualizations and statistical tests, and modeled using many regression techniques, including regularized methods, ensemble methods such as Random Forest and Boosting, and neural networks. Feature significance was also determined through techniques such as the Group Lasso and ensembling. The mean absolute error (MAE) of the best models was around 750 people. A paper is included that summarizes the goals and findings and outlines ideas for future work. The project was coded in a combination of R and Python.
The goal of this project is to find optimally fair allocations of divisible and indivisible goods for a group of people under three different definitions of fairness based on envy-freeness, given certain assumptions. Mixed-integer linear programming (MILP) formulations are created in AMPL and solved using CPLEX, generating datasets of the minimal approximate envy value and solver elapsed time for different combinations of number of people and number of goods. Interactive 3D visualizations of these datasets are created in Python, and the results are analyzed.
See all Personal Projects for more examples!
A collection of my published articles
A few of my current interests
How can cryptographic hash functions be used in a distributed system to enable implicit trust and system-wide integrative value generation?
A Jupyter Notebook chock full of the world's state-of-the-art models is pretty useless on its own
Applied data engineering, data science, and analytics to improve the company’s demand forecasting system by 5%-25%, helping leaders across the organization make better business decisions. Used distributed computing technologies (Apache Spark) with data from relational databases (SQL) to conduct time-series analysis and generate recommendations to improve forecasts. After several months of ETL and time-series analysis, the externship concluded with a formal presentation of my team’s findings to an assortment of company leaders.
Facilitated student development as a course staff tutor for the largest in-person data science course, with 1600+ students. Over the several terms I was involved, I invigorated student interest with 75+ lectures on varying topics in statistics, programming, and analytics, and supported course operations by hosting office hours, proctoring exams, grading assignments, and working with other staff members.