## Win Probability Added in Soccer

Everyone hears it all the time: a 2-0 lead is the most dangerous lead in soccer. But is it really? Thinking about that led me to wonder exactly how dangerous leads are in soccer. In fact, I wanted to find out what win, loss and draw percentages a team has in every situation. The best way to find out is to analyze a lot of games and calculate the win, loss, and draw percentages in every possible game situation. To do this, I took into account the venue of the game (home versus away), the goal differential between the teams (up by 2, up by 1, tied, down by 1, down by 2, etc.) and the minute of the game. I used goal differentials of -5 to 5 and minutes 1-90. I thought these were probably the most important factors. You could take cards into account too, but that is hard and makes things pretty complicated. Overall, there are 2*11*90 = 1980 combinations of game situations.
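As a quick sanity check on that count, the grid of situations can be enumerated directly. This is only a small Python sketch; the names `VENUES`, `GOAL_DIFFS` and `MINUTES` are made up for illustration and don't come from the spreadsheet.

```python
from itertools import product

# Hypothetical names for the three factors described above.
VENUES = ("home", "away")
GOAL_DIFFS = range(-5, 6)   # -5 through 5 inclusive: 11 values
MINUTES = range(1, 91)      # minutes 1 through 90 inclusive

# Every (venue, goal differential, minute) combination.
states = list(product(VENUES, GOAL_DIFFS, MINUTES))
print(len(states))  # 1980
```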

The idea relates to WPA (Win Probability Added) in baseball. Basically, WPA is a measurement of how much a play adds to the chance the team wins a game. For example, how much does a 2 run home run help the team's chances in the 6th inning? In soccer, a question would be: how much does a goal at home that gives you a 2 goal lead in the 67th minute change your chance of winning the game? Pretty simple concept.

To get the percentages for all of these situations I imported game data from the past 10 years of the EPL into Excel. My Excel skills are not the best, but with some help I was eventually able to convert the data into percentages for each game situation mentioned above. The basic idea is this: how often do teams with a 1 goal lead in the 40th minute at home win? How often do they draw? How often do they lose? This was done for every minute and every goal differential, both home and away. The results tell us how dangerous a variety of leads are.
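For anyone who would rather script this than do it in Excel, here is a rough Python sketch of the same tallying idea. It is not the spreadsheet's actual formulas, and it makes several assumptions: each match is a list of `(minute, side)` goals, the situation at minute m counts every goal scored in minute m or earlier, and goal differentials outside -5 to 5 are skipped. The function names and the four toy matches are invented purely to show the mechanics.

```python
from collections import Counter, defaultdict

def tally_outcomes(matches):
    """Count final results for every (venue, goal_diff, minute) situation.

    matches: iterable of goal lists, one per match. Each goal is a
    (minute, side) tuple with side "home" or "away" (assumed format).
    """
    table = defaultdict(Counter)
    flip = {"win": "loss", "loss": "win", "draw": "draw"}
    for goals in matches:
        final = sum(1 if side == "home" else -1 for _, side in goals)
        home_result = "win" if final > 0 else "loss" if final < 0 else "draw"
        for minute in range(1, 91):
            # Goal differential from the home side's view after this minute.
            diff = sum(1 if side == "home" else -1
                       for m, side in goals if m <= minute)
            if abs(diff) > 5:
                continue  # outside the -5..5 range used in the post
            table[("home", diff, minute)][home_result] += 1
            table[("away", -diff, minute)][flip[home_result]] += 1
    return table

def percentages(counter):
    """Turn win/draw/loss counts into percentages (None if no games)."""
    total = sum(counter.values())
    if total == 0:
        return None
    return {k: 100 * counter[k] / total for k in ("win", "draw", "loss")}

# Toy data, NOT real EPL results.
toy_matches = [
    [(10, "home"), (80, "away")],                # ends 1-1
    [(10, "home")],                              # ends 1-0
    [(30, "home"), (50, "home")],                # ends 2-0
    [(20, "home"), (60, "away"), (85, "away")],  # ends 1-2
]
table = tally_outcomes(toy_matches)
print(percentages(table[("home", 1, 40)]))
```

With real data you would feed in every match from the ten seasons. Situations that never occurred come back as `None` rather than a percentage.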

Here's an example: the team is away, the game is tied, and it's the 67th minute. Any guesses on the win, draw and loss percentages? Well, it turns out the team has about a 19% chance of winning, a 51% chance of drawing, and a 30% chance of losing.

We can also test the "2 goal leads are the most dangerous leads" theory. Let's say the team is home and it's the 35th minute. Here are the percentages for 1 and 2 goal leads:

1 goal lead: win: 78%, draw: 16%, loss: 6%

2 goal lead: win: 96%, draw: 2%, loss: 2%

The same holds true for all minutes and both home and away teams. A 2 goal lead is in fact not the most dangerous lead in soccer.

I'm also in the process of making a Java Applet to post here that lets you input the goal differential, venue, and minute, and spits out the win, loss and draw percentages. Again, my Java programming talents are not the best, so no promises on anything getting finished or uploaded soon. I did upload the actual Excel files to a Google Sites page, though, if you're curious to look at other percentages. If you want to download the files click here and type in the search bar ".htm" without the quotes to find the files.

Next, I'm planning on relating this more to how WPA is used in baseball by using it to analyze specific players by calculating how much percentage they add to their team winning by scoring goals. Not sure how useful this statistic will actually be, but it's worth a shot.
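To give a feel for what that calculation might look like, here is a minimal Python sketch: the credit for a goal is the team's win probability after the goal minus its win probability before it. The function name is made up, and this is only one possible way to define the statistic, not a finished method. The numbers are the home, 35th minute figures quoted above.

```python
def win_probability_added(wp_before: float, wp_after: float) -> float:
    """Change in win probability caused by a single event, e.g. a goal."""
    return wp_after - wp_before

# Home team, 35th minute, win percentages quoted earlier in this post.
wp_one_goal_lead = 0.78
wp_two_goal_lead = 0.96

# A goal that turns a 1 goal lead into a 2 goal lead.
wpa = win_probability_added(wp_one_goal_lead, wp_two_goal_lead)
print(f"{wpa:+.2f}")  # +0.18
```

Summing these differences over all of a player's goals would give a season total. Note that this only tracks the win percentage and ignores the change in draw percentage.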

## Reader Comments (1)

Hi Ford,

Came across the blog (via twitter) and as a pretty big math/stats guy who recently got into soccer over the last few years, I am jealous of what you have put together. I figured I would start at the beginning and read through some of your posts in order to learn more about the statistics behind the game and hopefully discuss some of the topics with you a bit further.

In my opinion, the use of "Win Probability Added" as referenced in the title is a bit misleading. As stated in the last paragraph of your post, WPA is primarily used in baseball as a player statistic (not a team statistic): it takes the Win Probability of the team after an at-bat, subtracts the Win Probability the team had prior to the at-bat, and attributes the difference to the player. While this could be used for scoring plays, such as a 2-run home run in the 6th inning, it also takes into account everything else the player does, such as a bases-empty walk in the 1st inning.

In order for WPA to be useful for individual players in soccer, I think you would have to take the Win Probability of the team prior to a player possessing the ball and after the player possesses the ball and find the difference. This would have to factor in position on the field, loss of possession, etc. and seems a bit complicated for right now. Also, many soccer goals are team efforts, set up by a great pass or a rebound falling at a player's feet, whereas in baseball getting on base is more of an individual effort. Giving the entire Win Probability that was added from the goal to the player who scored would skew the statistic.

Although I don't think it should reference "Win Probability Added" (maybe just "Win Probability"), I think there is some incredible data in your post about the win probabilities of a team given a certain goal differential and time remaining. You use these statistics to disprove the theory that 2 goal leads are the most dangerous, but I think anyone who believes in numbers and statistics could have seen that coming as a clear old-fashioned myth.

What would be interesting to me would be whether the results showed anything that was counterintuitive. Are there any times, for example, where the same goal differential leads to a slightly smaller win probability later in the game? It would make intuitive sense that for a 2 goal lead, for example, the win probability would slowly increase as time goes on from minute 1 to minute 90, but perhaps there is some anomaly where having a two goal lead gets worse from minute 14 to 15. Also, it would be interesting to see how the win percentage increases slowly over the 90 minutes for a constant goal differential. Is it linear? Exponential? Are there certain minutes that give the biggest jumps?

Another question the data you used could help answer would be whether all goal amounts are equal. Clumping all 2 goal leads together was very useful, but I would also be interested to see what it looks like if you separate them out. Is a 4-2 lead the same as a 2-0 lead? Perhaps in a 2-0 game one team with a poor offense is up against a team with a great defense (reflected by their failure to score) and therefore this lead is safer than a 4-2 lead, which may be a game with a lack of defense (and great offenses) where goals are a dime a dozen.

Scanning some of the titles, it looks like you may address some of the topics I brought up in additional posts, but I figured I would air out some of my ideas here. I would really love to hear back from you to discuss some of your ideas a bit further either on here or via email.

Best,

Jason