Personal Projects

A few projects to share!

Fully Automated Data Pipeline Using Free, Cloud-Based Solutions

Facilitated others’ sports-analytics data projects by creating the most robust open-source NBA database. Kept capital overhead at $0 by using free cloud computing and dataset tools. Enabled better testing, deployment, and expansion by containerizing each pipeline segment’s Python scripts.

Read more..

Regularized Linear Regression Deep Dive

Explanations and Python implementations of Ordinary Least Squares regression, Ridge regression, Lasso regression (solved via Coordinate Descent), and Elastic Net regression (also solved via Coordinate Descent), applied to predicting wine quality from numerous numerical features. Additional data analysis and visualization in Python are included.
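As a rough illustration of the coordinate descent idea mentioned above, here is a minimal pure-Python sketch of Lasso regression. It is not the project's implementation: the objective scaling (1/(2n) times the squared error plus alpha times the L1 norm), the function names, and the absence of an intercept or feature standardization are all assumptions made for brevity.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values in [-alpha, alpha] become 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iters=100):
    """Minimize (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 (assumed scaling).

    X is a list of rows without an intercept column; y is a list of targets.
    Hypothetical helper for illustration only.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw, since w starts at zero
    for _ in range(n_iters):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue  # an all-zero feature cannot change the fit
            # Correlate feature j with the residual after adding back
            # feature j's own current contribution.
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, residual)) / n
            new_wj = soft_threshold(rho, alpha) / z
            delta = new_wj - w[j]
            # Keep the residual in sync with the updated coefficient.
            residual = [r - c * delta for c, r in zip(col, residual)]
            w[j] = new_wj
    return w
```

On a toy input such as `X = [[1, 0], [0, 1], [1, 0], [0, 1]]` and `y = [2, 0.1, 2, 0.1]` with `alpha = 0.1`, the first coefficient is shrunk to about 1.8 and the second is set exactly to zero, which is the sparsity behavior that distinguishes Lasso from Ridge.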

Read more..

Conceptualizing Higher Education Institutions: An Agent-Based Modelling Approach

A Python simulation of an abstract higher education system, built to better understand the factors that contribute to institutional growth and demise.

Read more..

Machine Learning for NBA Game Attendance Prediction

The goal of this project was to build models that accurately predict attendance at future National Basketball Association (NBA) games. Game data, including attendance, was scraped from stats.nba.com, and stadium capacity data was collected from numerous online sources. The data was then cleaned, processed, explored through visualizations and statistical tests, and modeled using many regression techniques, including regularized methods, ensemble methods such as Random Forest and Boosting, and neural networks. Feature significance was also assessed through techniques such as the Group Lasso and ensembling. The best models achieved a mean absolute error (MAE) of around 750 people. A paper is included that summarizes the goals and findings and outlines possible future work. The project was coded in a combination of R and Python.
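For readers unfamiliar with the metric quoted above, MAE is the average absolute gap between predicted and actual attendance. The short sketch below shows the calculation; the function name and the attendance figures in the usage note are hypothetical and are not taken from the project's data.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all games."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

For example, with made-up actual attendances of 18000, 19500, and 17000 and predictions of 18500, 19000, and 17750, the errors are 500, 500, and 750 people, for an MAE of about 583.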

Read more..

Mixed Integer Linear Programming for Fair Division Problems

The goal of this project is to find optimally fair allocations of divisible and non-divisible goods among a group of people, under three different envy-freeness-based definitions of fairness and certain assumptions. Mixed-integer linear programming (MILP) formulations are created in AMPL and solved using CPLEX, generating datasets of the minimal approximate envy value and solver elapsed time for different combinations of number of people and number of goods. Interactive 3D visualizations of these datasets are created in Python, and the results are analyzed.
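To make the notion of a minimal envy value concrete, here is a small Python sketch that brute-forces every allocation of non-divisible goods and returns one with the smallest maximum envy. This is deliberately not the MILP formulation used in the project (which is written in AMPL and solved with CPLEX), and it is only practical for tiny instances. Additive valuations, the specific envy definition, and all function names are assumptions for illustration.

```python
from itertools import product


def max_envy(valuations, assignment):
    """Largest envy felt by any person, assuming additive valuations.

    valuations[i][g] is person i's value for good g.
    assignment[g] is the index of the person who receives good g.
    """
    n = len(valuations)
    # bundle_value[i][k]: how much person i values person k's bundle
    bundle_value = [[0.0] * n for _ in range(n)]
    for g, owner in enumerate(assignment):
        for i in range(n):
            bundle_value[i][owner] += valuations[i][g]
    # Comparing a person with themselves gives 0, so the result is never negative.
    return max(
        bundle_value[i][k] - bundle_value[i][i]
        for i in range(n)
        for k in range(n)
    )


def min_envy_allocation(valuations):
    """Enumerate all n**m assignments and return (assignment, envy) with minimal envy."""
    n, m = len(valuations), len(valuations[0])
    best = min(product(range(n), repeat=m), key=lambda a: max_envy(valuations, a))
    return best, max_envy(valuations, best)
```

With two people valuing two goods as `[[3, 1], [1, 3]]`, the search returns the assignment `(0, 1)` with envy 0: each person receives the good they value most. With a single good valued at 5 by both people, some envy is unavoidable and the minimum is 5. The exponential enumeration is the reason a solver-based formulation is the more realistic choice beyond toy sizes.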

Read more..

Nifty tech tag lists from Wouter Beeftink