Wyatt Walsh

Eastern Sierras, California · wyatt-remove-[email protected]

Problem solver and scientist educated in industrial engineering and operations research, with wide-ranging interests spanning data science, artificial intelligence, simulation, and blockchain.

Feel free to say hello! 👋


Programming Languages
Data-Related Expertise
Cloud Technologies
Optimization Tools

Personal Projects

A few projects to share!

Fully Automated Data Pipeline Using Free, Cloud-Based Solutions

Facilitated others’ sports-analytics data projects by creating the most robust open-source NBA-related database. Kept capital overhead at $0 by using free cloud computing and dataset tools. Enabled better testing, deployment, and expansion by containerizing each pipeline segment’s Python scripts.

Regularized Linear Regression Deep Dive

Explanations and Python implementations of Ordinary Least Squares regression, Ridge regression, Lasso regression (solved via Coordinate Descent), and Elastic Net regression (also solved via Coordinate Descent), applied to assess wine quality given numerous numerical features. Additional data analysis and visualization in Python are included.
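
For readers unfamiliar with the technique, here is a minimal pure-Python sketch of Lasso solved via coordinate descent. It is an illustration only, not the project's code: the function names, the objective scaling (1/(2n)), the absence of an intercept, and the fixed iteration count are all assumptions made for brevity.

```python
def soft_threshold(rho, lam):
    """Soft-thresholding operator: shrink rho toward zero by lam."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iters=100):
    """Minimize (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1 (no intercept).

    X is a list of rows, y a list of targets; both names are hypothetical.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iters):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                # Prediction that leaves feature j out.
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            # Closed-form update for one coordinate with the others held fixed.
            w[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return w
```

With a larger penalty `lam`, more coefficients are driven exactly to zero, which is what distinguishes Lasso from Ridge regression.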

Machine Learning for NBA Game Attendance Prediction

This project aimed to build models that accurately predict attendance at future National Basketball Association (NBA) games. Game data, including attendance, was scraped from stats.nba.com, and stadium capacity data was collected from numerous online sources. The data was cleaned, processed, explored through visualizations and statistical tests, and then modeled using many regression techniques, including regularized methods, ensemble methods such as Random Forest and boosting, and neural networks. Feature significance was assessed with techniques such as the Group Lasso and ensembling. The best models achieved a mean absolute error (MAE) of around 750 people. An accompanying paper summarizes the goals and findings and outlines possible future work. The project was coded in a combination of R and Python.

Mixed Integer Linear Programming for Fair Division Problems

This project finds optimally fair allocations of divisible and indivisible goods among a group of people under three different envy-freeness-based definitions of fairness, given certain assumptions. Mixed-integer linear programming (MILP) formulations are written in AMPL and solved with CPLEX, generating datasets of the minimal approximate envy value and solver elapsed time for different combinations of number of people and number of goods. Interactive 3D visualizations of these datasets are created in Python, and the results are analyzed.
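
To make "minimal envy" concrete, here is a small Python sketch that finds the least-envy allocation by brute-force enumeration. This is not the project's MILP formulation: it swaps in exhaustive search, handles indivisible goods only, and assumes additive valuations. The function names and the `valuations` layout (one row per person, one column per good) are hypothetical.

```python
from itertools import product


def max_envy(valuations, assignment):
    """Largest amount by which any person values another's bundle over their own.

    assignment[g] is the index of the person who receives good g.
    """
    n = len(valuations)
    # bundle_value[i][j]: how much person i values the bundle given to person j.
    bundle_value = [[0.0] * n for _ in range(n)]
    for good, owner in enumerate(assignment):
        for i in range(n):
            bundle_value[i][owner] += valuations[i][good]
    # Comparing a person with themselves gives 0, so the result is never negative.
    return max(
        max(bundle_value[i][j] - bundle_value[i][i] for j in range(n))
        for i in range(n)
    )


def min_envy_allocation(valuations):
    """Try every assignment of goods to people and keep the least-envy one."""
    n, m = len(valuations), len(valuations[0])
    best = min(product(range(n), repeat=m), key=lambda a: max_envy(valuations, a))
    return best, max_envy(valuations, best)
```

The search space grows as (number of people) to the power of (number of goods), so enumeration is only practical for tiny instances, which is one reason to hand larger ones to a MILP solver.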

Wait! There's more..

See all Personal Projects for more examples!


A collection of my published articles

January 2021

Article - Regularized Linear Regression Models: Basics of Linear Regression Modeling and Ordinary Least Squares (OLS)

Context of Linear Regression, Optimization to Obtain the OLS Model Estimator, and an Implementation in Python Using NumPy


A few of my current interests

Artificial Intelligence

Nerdy, but like Synthetic Nerdy


How can cryptographic hash functions be used in a distributed system to enable implicit trust and system-wide integrative value generation?

Machine Learning Operations

A Jupyter Notebook chock-full of the world's state-of-the-art models is pretty useless on its own


What's your frequency?


Searching for clues...


What's similar between the minuscule and the massive?



Applied data engineering, data science, and analytics to improve the company’s demand forecasting system by 5% to 25%, informing the business decisions of leaders across the organization. Used Apache Spark for distributed computing on data from relational (SQL) databases to conduct time series analysis and generate recommendations for improving forecasts. After several months of ETL and time series analysis, the externship concluded with a formal presentation of my team’s findings to an assortment of company leaders.

February 2020 - May 2020

Group Tutor for Data 8: Foundations of Data Science

Facilitated student development as a course staff tutor for the largest in-person data science course, with 1600+ students. Over the several terms I was involved, I engaged students with 75+ lectures on topics in statistics, programming, and analytics, and supported course operations by hosting office hours, proctoring exams, grading assignments, and working with other staff members.

January 2019 - December 2019


University of California, Berkeley

Bachelor of Science
Industrial Engineering & Operations Research


The Hotchkiss School

High School Diploma

2012 - 2014

Nifty tech tag lists from Wouter Beeftink