If you had to place a bet, at what minutes do you think the most goals are scored during the course of a soccer game? I was asking myself this exact question, so I decided to try to figure out what the answer was. If scoring is completely random we would expect the distribution of the count of goals scored to be roughly even across every minute of the game. Of course, it is not going to be perfectly distributed because of random errors, but every minute should have roughly the same number of goals, assuming the sample is large enough. I had a hunch that this would not be the case. Specifically, my guess was that there would be more goals scored between the 85th and 90th minutes, whereas there would be fewer in the first 5 minutes of the game. To test this hypothesis, I used data from the Rec.Sport.Soccer Statistics Foundation page from 8 years of the Premiership.
Is there a normal number of goals scored in a season for a striker? To answer this, one may be tempted to just take the mean of the goals scored of every player in a season. If we do this for last season, the mean is 1.83. Of course, this is misleading. There isn't really such thing as a "normal" number of goals scored in a season. The reason for this is that goals scored does not have a standard distribution, the bell curve we are used to. For example, if you looked at the distribution of heights in a population, you would see a nice bell curve. Most people are right around the average height, and as you go towards the extremes either way (really short or really tall) you find fewer and fewer people. Therefore, the mean of heights in the population is instructive because it gives us the "normal" or "typical" height. The problem is, goals scored in a season does not follow a standard distribution. Instead, most players score no goals at all. The next most common number of goals scored last season? Just one goal, of course. This distribution continues, and it follows a power law distribution.
Now that some of the advanced data set has been released by Manchester City's performance analysis department it's a good time to start delving in to the data to see what kind of analysis can be done. Although the advanced data set is only for one game-- Bolton vs. Manchester City from last season-- there is still A LOT of data to look at.
The advanced data contains (x,y) location information of every statistic that is kept. This is valuable information, as it obviously tells exactly where each event happened in the game. I was interested in how this information can be used, specifically to look at momentum and passing trends.
Some work has already been done in the soccer analytics community on trying to quantify and analyze momentum. The Analyse Football looked at momentum shifts from this same game, although in a different way. The Soccer by the Numbers blog looks at momentum in football in a much more general way.
I wanted to point out an excellent blog post from the blog Professor Pepper's Assistant.
If you're an R user and are having trouble dealing with the Advanced MCFC Analytics XML data file, the link above provides the code to pull the data in to a data frame in R. After this it is easy to perform whatever analysis you want on it.
I'll admit the code above is beyond my limited R skill level, but I know that it works. I'm excited to start doing some analysis, although the advanced data set is only for one game from last season at this point.