I recently wrote a blog post for the Betting Expert site about a simple model I built to predict the outcome of football matches using only very basic statistics.
You can read the full blog post here.
I wanted to point out something interesting that I found while working on the model: betting odds do a relatively poor job of predicting football match outcomes. In other words, the probabilities of a home win, a draw and a home loss implied by the bookmakers' odds are surprisingly inaccurate.
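For anyone unfamiliar with how percentages are read off betting odds, here is a minimal sketch. It assumes decimal odds and removes the bookmaker's margin by simple proportional scaling, which is one common convention and not necessarily the one used in the full post. The function name and the example odds are made up for illustration.

```python
def implied_probabilities(home_odds, draw_odds, away_odds):
    """Convert decimal odds into probabilities that sum to 1.

    Assumption: the bookmaker's margin is removed by proportional scaling.
    """
    # The raw implied probability of an outcome is 1 / decimal odds.
    raw = [1.0 / home_odds, 1.0 / draw_odds, 1.0 / away_odds]
    # The raw values add up to more than 1; the excess is the margin.
    total = sum(raw)
    return [p / total for p in raw]


# Hypothetical odds for one match: home win, draw, away win.
probs = implied_probabilities(1.8, 3.6, 4.5)
print([round(p, 3) for p in probs])  # [0.526, 0.263, 0.211]
```

The outcome with the highest probability can then be treated as the bookmakers' "prediction" and compared against the actual result.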
My hypothesis is that football is very unbalanced, especially in the EPL, and that upsets are very hard to predict because they are (seemingly) random.
My model uses just four factors: the home team's goal differential for the season up to that game, the away team's goal differential for the season up to that game, the home team's point total from the previous season, and the away team's point total from the previous season. With only those inputs, it was as accurate as the bookmakers.
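To show what "for the season up to that game" means in practice, here is a rough sketch of how those four inputs could be assembled for each match. It is not the code behind the model; the function name, the data layout and the teams and scores are all hypothetical, and it assumes the matches are listed in chronological order within a single season.

```python
from collections import defaultdict


def build_features(matches, prev_points):
    """Return one (home_gd, away_gd, home_prev_pts, away_prev_pts) row per match.

    matches: chronological list of (home, away, home_goals, away_goals).
    prev_points: each team's point total from the previous season.
    """
    goal_diff = defaultdict(int)  # running goal differential per team
    rows = []
    for home, away, home_goals, away_goals in matches:
        # Record the features before updating the running totals,
        # so each row only uses information from earlier games.
        rows.append((goal_diff[home], goal_diff[away],
                     prev_points[home], prev_points[away]))
        goal_diff[home] += home_goals - away_goals
        goal_diff[away] += away_goals - home_goals
    return rows


# Made-up teams and results, purely for illustration.
matches = [("A", "B", 2, 0), ("B", "C", 1, 1), ("C", "A", 0, 3)]
prev_points = {"A": 70, "B": 50, "C": 40}
rows = build_features(matches, prev_points)
print(rows)  # [(0, 0, 70, 50), (-2, 0, 50, 40), (0, 2, 40, 70)]
```

Each row could then be fed to whatever classifier is being used to predict a home win, draw or home loss.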
The question that remains is how much more accurate can the model become with the introduction of new variables? Beyond that, what variables should be used?
I am not sure I know the answers to those questions, but I am going to keep playing around with the data.