BAYESIAN MADNESS

A statistical model for the 2019 NCAA March Madness Tournament that correctly predicted the winner based on season performance.

PROJECT FEATURES

Data Science
Statistical Modeling
Brier Scores & MSE

BAYES' THEOREM

The basis for the model was Bayes' Theorem, a statistical tool for updating probabilistic forecasts to reflect new information. The historical win percentage for the matchup between any two seeds was used as a Bayesian prior, and forecasted win percentages based on Simple Rating System (SRS), Free Throw %, 3 Point %, and Turnover % were then used to update that prior.
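
One way such an update could work is sketched below. The source does not say exactly how the stat-based forecasts were combined with the prior, so this sketch assumes an odds-form Bayesian update in which each forecast is treated as independent evidence. The function name and the example numbers are hypothetical.

```python
def update_with_forecasts(prior, forecasts):
    """Combine a prior win probability with stat-based forecasts.

    Assumption (not from the original project): each forecast p is
    treated as independent evidence with likelihood ratio p / (1 - p),
    and the prior is updated in odds form.
    """
    odds = prior / (1.0 - prior)      # convert the prior to odds
    for p in forecasts:
        odds *= p / (1.0 - p)         # multiply in each likelihood ratio
    return odds / (1.0 + odds)        # convert back to a probability


# Hypothetical example: a seed-based prior of 65% and two stat forecasts.
posterior = update_with_forecasts(0.65, [0.61, 0.55])
print(round(posterior, 3))
```

A forecast of exactly 50% leaves the prior unchanged under this scheme, while forecasts above or below 50% move it up or down.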

MATCHUP BASED FORECASTS

Forecasts based on individual stats were created by taking the difference in a stat between the two teams and then finding the historical win percentage of teams with a similar difference across all tournament games. For instance, Vermont's free throw percentage was a point and a half better than Florida State's, and historically teams with that advantage won 61% of their games, so that percentage was used to update the forecast.
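
The lookup described above can be sketched roughly as follows. The data layout, the tolerance window, and the sample games are all hypothetical; the original project may have bucketed differences in another way.

```python
def win_rate_for_difference(history, diff, tolerance=0.5):
    """Return the historical win rate for games with a similar stat difference.

    history: list of (stat_difference, won) pairs, where stat_difference is
    one team's stat minus its opponent's and won says whether that team won.
    The tolerance window is an assumption made for this sketch.
    """
    similar = [won for d, won in history if abs(d - diff) <= tolerance]
    if not similar:
        return None                   # no comparable games on record
    return sum(similar) / len(similar)


# Made-up games: (free throw % difference, did the better shooter win?)
sample_history = [
    (1.2, True), (1.5, True), (1.8, False),
    (1.4, True), (-3.0, False), (4.0, True),
]
rate = win_rate_for_difference(sample_history, 1.5)
print(rate)
```

The resulting rate plays the role of the 61% figure in the Vermont example: it is the stat-based forecast that then updates the prior.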

THE RESULTS DON'T LIE...

The bracket produced by the model correctly predicted the University of Virginia's tournament win and ranked in the 98th percentile of all brackets submitted to ESPN.

...BUT THEY CAN BE MISLEADING.

The "Brier Skill Score" measures the value of a forecasting model relative to an unskilled (50%-50%) prediction, using the mean squared error of its predictions. Because the model was somewhat overconfident (upsets that it rated as 10% likely or less actually occurred ~25% of the time), it had a Brier Score of only 0.12, better than an unskilled prediction but not as infallible as the bracket results might suggest.
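
For reference, a minimal sketch of the standard definitions: the Brier score is the mean squared error between forecast probabilities and outcomes (0 or 1), and the skill score compares it against a reference forecast, here the 50%-50% prediction. The function names and the sample forecasts are illustrative and do not reproduce the project's actual data.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, outcomes, reference=0.5):
    """1 - BS / BS_ref, where BS_ref comes from a constant reference forecast.

    A value of 0 means no better than the reference; 1 is a perfect forecast.
    """
    bs = brier_score(probs, outcomes)
    bs_ref = brier_score([reference] * len(outcomes), outcomes)
    return 1.0 - bs / bs_ref


# Made-up example: four 90% favorites, one of which is upset.
probs = [0.9, 0.9, 0.9, 0.9]
outcomes = [1, 1, 1, 0]
print(brier_score(probs, outcomes), brier_skill_score(probs, outcomes))
```

A single confident miss contributes a large squared error, which is why an overconfident model can pick most games correctly and still earn only a modest skill score.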
