Machine Learning for NBA Game Attendance Prediction

The goal of this project was to build models that accurately predict the attendance of a future National Basketball Association (NBA) game. Game data, including attendance, was scraped from the web, and stadium capacity data was collected from numerous online sources. The data was then cleaned, processed, and explored through visualizations and statistical tests, then modeled with a range of regression techniques: regularized methods, ensemble methods such as Random Forest and Boosting, and neural networks. Feature significance was assessed with techniques such as the Group Lasso and ensembling. The best models achieved a mean absolute error (MAE) of around 750 people. A paper is included that summarizes the goals and findings, along with ideas for future work. The project was coded in a combination of R and Python.
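To make the MAE figure concrete, here is a minimal sketch of how the metric is computed: the average, over all games, of the absolute difference between actual and predicted attendance. The function name and the attendance numbers are invented for illustration and are not taken from the project's data or code.

```python
def mean_absolute_error(actual, predicted):
    """Return the average of |actual - predicted| across games."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one game is required")
    total_error = sum(abs(a - p) for a, p in zip(actual, predicted))
    return total_error / len(actual)


# Hypothetical attendance figures (assumption: made up for this example).
actual_attendance = [18000, 19500, 17250, 20000]
predicted_attendance = [18500, 19000, 18000, 19750]

mae = mean_absolute_error(actual_attendance, predicted_attendance)
print(f"MAE: {mae:.0f} people")
```

Because MAE is expressed in the same unit as the target, a value of around 750 can be read directly as the predictions being off by roughly 750 attendees per game on average.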

Project link:
