I just finished The Numbers Game: Why Everything You Know About Soccer Is Wrong, and really enjoyed it. I've been lucky enough to meet Chris at the MIT Sports Analytics Conference, and have also met a number of the other people featured in the book. I even played pickup soccer last summer in New York City with Ramzi Ben Said, the Cornell undergrad tasked with collecting some of the data for the book. All in all, the names that come up are very similar to the names on my Twitter timeline. If you're reading this blog and have read the book, you probably recognize a lot of the names as well.
Here are some of my thoughts:
I really like the emphasis put on the role that luck plays in soccer. People are often inclined to try to use data to explain everything that happens in soccer. Of course, this is impossible. There are always going to be events that simply cannot be predicted, and I think this is crucial to keep in mind going forward. Data can give you an edge, but it is never going to be the only factor that leads clubs to championships or promotion. To use an example from the book, no model is ever going to predict a beach ball coming into play and deflecting a shot into the net. This is a glaring example of randomness sneaking into the game to determine a result, but there are countless other incidents in every match that affect results.
A fact I found interesting: 99 percent of the time players didn't touch the ball, and 98.5 percent of the time they ran without it (page 143). Analysts have focused almost all of their time on the events that occur when the ball is at a player's feet. That means focusing on only about 1% of a player's input in the game. While these events are the most important and the easiest to record, it is interesting that we ignore almost 99% of a player's activities when looking at their performance. What gets ignored is the work a player does to get into the right position defensively, or the right angle to receive a pass, or the right space to take a shot. These actions are extremely difficult to track and analyze, but they are likely just as important as the events that occur when a player actually does have the ball.
The part of the book that will have the greatest influence on how clubs behave, at least in my opinion, is the weak link versus strong link analysis (page 218). Basically, the analysis in the book says that improving a club's weakest player is more beneficial than improving a club's strongest player. This is a huge finding, and it goes completely against what is "known" in soccer. Instead of signing the big-name striker for way too much money, simply upgrading your weak left back to a stronger one can improve your club for a lower cost. Of course, this will not hold every time, but their analysis provides strong evidence that it holds a lot of the time. If I were employed by a club to analyze signings, this is the first idea I would bring up to improve the club.
Overall, I think Chris and David do a good job of making the book easy to read and entertaining. I assume it would have been all too easy to make the analysis complicated (they are Ivy League professors, after all). However, they do a good job of weaving a narrative while also offering a lot of very good insights. That being said, the future of soccer analytics will likely be a bit more complicated. As they point out themselves in Forecast 4 on page 304, mathematical tools like algebraic geometry and network analysis are going to provide the base of most insights going forward. Instead of on-the-ball events, the focus is going to be the more complicated analysis of off-the-ball events like spacing and positioning. These off-the-ball events make up the 99% of the game that is generally not recorded or analyzed, as mentioned before. They are likely going to be the events that penetrate deepest into the flowing, dynamic, and non-stop nature of soccer.
If you're reading this blog and haven't read the book yet, you should probably pick up a copy.