A Simplified Football Prediction Model
Thursday, August 30, 2012 at 11:04AM
I recently wrote a blog post for the Betting Expert site about a simple model I built to predict the outcome of football matches using only very basic statistics.
You can read the full blog post here.
I wanted to point out something interesting that I found while working on the model: betting odds do a relatively poor job of predicting football match outcomes. In other words, the probabilities of a win, draw and loss for the home team implied by the bookmakers' odds are surprisingly inaccurate.
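For readers unfamiliar with implied probabilities, here is a minimal sketch of how they can be derived from odds. It assumes decimal odds and removes the bookmaker's margin by simple normalization, which is one common convention and not necessarily the one used in the full post. The function name and the example odds are made up for illustration.

```python
def implied_probabilities(home_odds, draw_odds, away_odds):
    """Convert decimal odds into probabilities that sum to 1.

    Assumption: decimal odds, with the bookmaker's margin removed
    by dividing each raw probability by their total.
    """
    raw = [1.0 / home_odds, 1.0 / draw_odds, 1.0 / away_odds]
    total = sum(raw)  # exceeds 1 by the bookmaker's margin
    return [p / total for p in raw]


# Hypothetical odds for a single match: home win, draw, away win.
probs = implied_probabilities(2.0, 3.5, 4.0)
print([round(p, 3) for p in probs])
```

The raw inverse odds add up to slightly more than 1, so comparing them against actual results without normalizing would overstate every outcome's likelihood.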
My hypothesis for why this happens is that football is very unbalanced, especially in the EPL. It is very hard to predict when an upset is going to happen, mostly because these upsets are (seemingly) random.
Using just four factors (the home team's goal differential for the season up to that game, the away team's goal differential for the season up to that game, the home team's point total from the previous season, and the away team's point total from the previous season), I could create a model that was as accurate as the bookmakers.
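To make the four inputs concrete, here is a rough sketch of how they could be assembled from a season's results. The data layout, the function name, and the team names are assumptions for illustration only. It also assumes every team has a previous-season point total, which is not true of newly promoted sides, so those would need separate handling.

```python
from collections import defaultdict


def build_features(matches, prev_points):
    """Build the four model inputs for each match.

    matches: one season's games in chronological order, each a tuple
             (home, away, home_goals, away_goals). Hypothetical layout.
    prev_points: dict mapping team -> previous-season point total.
    Goal differentials use only games played before the current one.
    """
    goal_diff = defaultdict(int)  # running season goal differential
    rows = []
    for home, away, home_goals, away_goals in matches:
        rows.append((goal_diff[home], goal_diff[away],
                     prev_points[home], prev_points[away]))
        # Update the running totals only after recording the row.
        goal_diff[home] += home_goals - away_goals
        goal_diff[away] += away_goals - home_goals
    return rows


# Made-up teams and scores.
matches = [("A", "B", 2, 0), ("B", "C", 1, 1), ("C", "A", 0, 3)]
prev_points = {"A": 70, "B": 50, "C": 40}
rows = build_features(matches, prev_points)
for row in rows:
    print(row)
```

Recording each row before updating the running differential keeps the current game's result out of its own inputs.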
The question that remains is how much more accurate the model can become with the introduction of new variables. Beyond that, which variables should be used?
I am not sure I know the answers to those questions, but I am going to keep playing around with the data.

Reader Comments (3)
Could you possibly re-run the scoring of the betting odds? In my opinion, it would be preferable to score a match prediction with one point when the game outcome matches the betting site's highest-likelihood-outcome, and zero points if it doesn't.
Likewise, could you apply this scoring to all the models involved? It could be interesting to see what difference this makes.
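The scoring suggested in this comment can be sketched as follows: one point when the highest-probability outcome matches the actual result, zero otherwise, averaged over all matches. The outcome labels 'H', 'D', 'A', the function name, and the sample numbers are assumptions for illustration, not data from the post.

```python
def hit_rate(predictions, outcomes):
    """Share of matches where the most likely predicted outcome occurred.

    predictions: list of dicts mapping 'H', 'D', 'A' to a probability.
    outcomes: list of actual results, each 'H', 'D' or 'A'.
    Ties go to whichever outcome appears first in the dict.
    """
    points = 0
    for probs, actual in zip(predictions, outcomes):
        favourite = max(probs, key=probs.get)
        if favourite == actual:
            points += 1
    return points / len(outcomes)


# Made-up predictions for three matches.
predictions = [
    {"H": 0.50, "D": 0.30, "A": 0.20},
    {"H": 0.30, "D": 0.30, "A": 0.40},
    {"H": 0.45, "D": 0.30, "A": 0.25},
]
outcomes = ["H", "D", "H"]
score = hit_rate(predictions, outcomes)
print(score)
```

The same function can be applied to the bookmakers' implied probabilities and to any model's output, which makes the comparison like for like.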
I am currently running data analysis (through regressions) on what it takes to win in different leagues. While our project is not yet finished and we have not run a final model yet, one thing we have noticed is that different factors affect winning in different leagues (we are running our regression to find out what factors control total points throughout a season). For example, in the EPL the number of years a manager has been with a club is statistically significant, whereas manager experience is insignificant in La Liga and MLS. So my advice would be to build a separate model for each league if you're going to add a large number of variables, as there does seem to be some difference between leagues.