The Sloan Sports Analytics Conference was this past weekend. I attended the 2012 conference and was looking forward to seeing how much the soccer analytics community had progressed. Unfortunately, the soccer panel was very similar to the one two years ago. While I'm not quite as pessimistic as Howard Hamilton, I understand where his viewpoint is coming from. I think the reason for this lack of progress in the soccer analytics community is threefold:
I've updated the site's Outcome Probability Calculator. I added more games of data, changed the methodology somewhat, and created a new online app. The first iteration of the app was featured on the Wall Street Journal's website in a blog post called Arsenal Beats Reading and Math.
If you had to place a bet, in which minutes of a soccer game do you think the most goals are scored? I was asking myself this exact question, so I decided to try to figure out the answer. If the timing of goals were completely random, we would expect the number of goals scored to be roughly even across every minute of the game. Of course, the counts will not be perfectly even because of random variation, but every minute should have roughly the same number of goals, assuming the sample is large enough. I had a hunch that this would not be the case. Specifically, my guess was that more goals would be scored between the 85th and 90th minutes and fewer in the first 5 minutes of the game. To test this hypothesis, I used eight years of Premiership data from the Rec.Sport.Soccer Statistics Foundation page.
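One way to check a hunch like this is a chi-square goodness-of-fit test against an even spread of goals. The sketch below is only an illustration, not necessarily the method used in the post: the function name `chi_square_uniform` is hypothetical, the two samples are randomly generated rather than real Premiership data, and it assumes each goal is recorded as a minute from 1 to 90.

```python
import random
from collections import Counter


def chi_square_uniform(goal_minutes, n_minutes=90):
    """Chi-square statistic for 'goals are spread evenly over minutes 1..n_minutes'."""
    # Ignore anything recorded outside the assumed 1..n_minutes range.
    in_range = [m for m in goal_minutes if 1 <= m <= n_minutes]
    counts = Counter(in_range)
    # Under the even-spread hypothesis every minute expects the same count.
    expected = len(in_range) / n_minutes
    stat = sum(
        (counts.get(m, 0) - expected) ** 2 / expected
        for m in range(1, n_minutes + 1)
    )
    return stat, n_minutes - 1  # statistic and degrees of freedom


# Synthetic samples for illustration only (not real match data).
random.seed(0)
minutes = range(1, 91)
even_goals = random.choices(minutes, k=9000)
# Made-up weighting in which later minutes are more likely to see a goal.
late_goals = random.choices(minutes, weights=[1 + m / 90 for m in minutes], k=9000)

for label, sample in [("even", even_goals), ("late-heavy", late_goals)]:
    stat, dof = chi_square_uniform(sample)
    print(f"{label}: chi-square = {stat:.1f} on {dof} degrees of freedom")
```

A statistic close to the degrees of freedom is what an even spread would produce; a much larger value suggests that some minutes see more goals than chance alone would explain. A chi-square table gives the exact cutoff for a chosen significance level.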
Is there a normal number of goals scored in a season for a striker? To answer this, one may be tempted to just take the mean of the goals scored by every player in a season. If we do this for last season, the mean is 1.83. Of course, this is misleading. There isn't really such a thing as a "normal" number of goals scored in a season. The reason is that goals scored does not follow a normal distribution, the bell curve we are used to. For example, if you looked at the distribution of heights in a population, you would see a nice bell curve. Most people are right around the average height, and as you go towards the extremes either way (really short or really tall) you find fewer and fewer people. Therefore, the mean of heights in the population is instructive because it gives us the "normal" or "typical" height. Goals scored in a season behave differently: most players score no goals at all. The next most common number of goals scored last season? Just one goal, of course. This pattern continues, and it follows a power law distribution.
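To see why the mean misleads for a distribution like this, here is a small sketch comparing the mean, median, and mode of a heavily skewed set of goal counts. The numbers are invented for illustration and are not last season's data; `summarize` is a hypothetical helper name.

```python
from statistics import mean, median, mode

# Made-up season totals for 100 players: most score nothing, a few score a lot.
goals = (
    [0] * 50 + [1] * 20 + [2] * 10 + [3] * 6
    + [5] * 5 + [8] * 4 + [12] * 3 + [20] * 2
)


def summarize(values):
    """Return the three usual 'typical value' summaries for a list of counts."""
    return {"mean": mean(values), "median": median(values), "mode": mode(values)}


summary = summarize(goals)
print(summary)
```

In this invented sample the mean lands near two goals even though half the players scored none, so the median and mode say more about the typical player than the mean does.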