Refining The Win Probability Statistic

August 03, 2011 by Ford Bohrmann

Last year I was planning to go to the Sloan Sports Conference but ended up not being able to make it. I was thinking about it again this year, and I decided it wouldn't be a bad idea to submit something for this year's conference. At first I wasn't going to, but why the hell not? Might as well go for it, I guess.

My win probability added statistic has generated some interest among readers, and I think it gives some pretty interesting insight, so I've been working on expanding it. If you have no idea what win probability added is, check out my first post on win probability and another on win probability added. Anyway, thus begins my quest to refine and expand the win probability added statistic for submission to the sports conference. Comments, criticisms, and suggestions are very much appreciated and would help make it a lot better.

The first fix I made was to change the name, and the change behind it is simple. The problem with "win probability added" is that it doesn't actually calculate the win probability added, which is a little bit problematic. For example, if two teams are tied in the 90th minute, the win probability under my old calculations was .333 for both teams. This doesn't really make sense, because each team has close to a 0% chance of winning the game, not 1/3. The .333 comes from counting a draw as 1/3 of a win, which I did because I modeled the statistic after the similar calculation in professional baseball. My fix for the problem is extremely simple: multiply all the values by 3. This changes the statistic from win probability added to expected points added, because a win now counts as 3, a draw as 1, and a loss as 0, just like league points. It makes much more sense now. If a player scores a go-ahead goal in the 90th minute, the Expected Points Added (easier to write EPA from now on) is going to be almost 2. If a player scores a tying goal in the 90th minute, the EPA would be almost 1. Much simpler and easier this way (I originally got the idea from @11tegen11's similar analysis).
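
To make the arithmetic concrete, here is a minimal sketch of the expected points calculation. The function names and the idealized 90th-minute probabilities are assumptions for illustration only; they are not values from the actual data set.

```python
def expected_points(p_win, p_draw):
    """Expected league points: 3 for a win, 1 for a draw, 0 for a loss."""
    return 3 * p_win + 1 * p_draw


def expected_points_added(before, after):
    """EPA of a goal: expected points after the goal minus before it.

    `before` and `after` are (p_win, p_draw) tuples.
    """
    return expected_points(*after) - expected_points(*before)


# Idealized 90th-minute states (assumed, not taken from the data).
tied = (0.0, 1.0)    # almost certainly a draw
ahead = (1.0, 0.0)   # almost certainly a win
behind = (0.0, 0.0)  # almost certainly a loss

go_ahead_epa = expected_points_added(tied, ahead)    # 2.0
equalizer_epa = expected_points_added(behind, tied)  # 1.0
```

With real probabilities the two values come out slightly under 2 and 1, which matches the "almost 2" and "almost 1" above.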

After this, I noticed the graphs were not nice, easy curves. Even though I took a big sample of games (about 10 years' worth), there isn't enough data to give a smooth curve. To fix this, I created lines of best fit for each game situation. The home and away graphs for each minute and goal differential are below. Before, a few situations didn't give a realistic expected point total because so few games ever reached them (like a 2 goal lead in the 5th minute). Fitting smooth curves fixes this problem. It also allows me to use equations to calculate EPA instead of the annoying process of referencing a massive Excel chart.
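
A minimal sketch of this kind of smoothing, assuming a plain straight-line least-squares fit. The data points are made up for illustration, and the real fits were produced in Excel, so treat this as one way to do it rather than the exact procedure used here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up, noisy expected-point values for a single game situation.
minutes = [10, 20, 30, 40, 50, 60, 70, 80, 90]
raw_ep = [2.05, 2.20, 2.12, 2.31, 2.40, 2.38, 2.60, 2.71, 2.80]

slope, intercept = fit_line(minutes, raw_ep)


def smoothed_ep(minute):
    """Expected points read off the fitted line instead of the raw table."""
    return slope * minute + intercept
```

Once the slope and intercept are known, `smoothed_ep` replaces a table lookup with a single formula, which is the practical benefit described above.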

I think there are a lot of possible paths to take from here. I'm going to recalculate the top goal scorers' EPA using the equations. It won't change much, but it'll be nice to have some continuity because I'll be calculating EPA week by week for every goal next EPL season.

I'm also working on creating a database of the top goal scorers in the EPL over the last 10 years, with their goal totals and their EPA by season. Looking at goals and EPA over time will hopefully give some insight into clutch (or lack thereof) goal scoring. If some players consistently have very high EPAs and some players consistently have low EPAs, it could be an indicator of clutch goal scoring in football.

As I said before, I'd love comments and suggestions on where to go next, whether on the blog, via Twitter, or even by email.

tags win probability, viz, win probability added, SSAC

WPA and AGW Weekly Updates this Season

July 12, 2011 by Ford Bohrmann

I just added the image on the right of the page ranking players by their WPA totals. The chart also includes each player's AGW and goal total for the season. I'll update this every week during the EPL season. Explanations of WPA and AGW are below.

WPA: Win Probability Added measures exactly what it sounds like it should: how much a player has added to their team's success through their goals. The way I calculate this is to sum how much each of the player's goals adds to the team's probability of winning. Goals are a flawed statistic because every goal is obviously not worth the same amount. The 5th goal in the 90th minute of a 5-0 win is not important. The 1st goal in the 90th minute of a 1-0 win obviously is very important. To quantify these values I compiled the results (wins, losses, and ties) of every game in the past 10 years of the EPL. This way, I could calculate the historical winning percentage in every game situation for both teams. For example, I know that scoring the 2nd goal to make a game 2-0 at home in the 67th minute increases a team's chance of winning by 10.845983%. WPA takes into account the importance of each goal, and shows how much, overall, a player has added to their team's chance of winning a game through their goals.

AGW: Average Goal Weight is simply how much, on average, each of a player's goals is worth. Mathematically, it is the player's total WPA divided by the number of goals they have scored. For example, one player may only score 5 goals on the season, whereas another may score 15. However, the first player could have a higher AGW if they tended to score pivotal goals while the second player scored useless goals.
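
The two definitions can be sketched in a few lines. The goal log below is invented purely for illustration (the players and per-goal WPA values are hypothetical), and `summarize` is an assumed helper name, not part of any existing tool.

```python
from collections import defaultdict

# Hypothetical goal log: (player, WPA of that goal). All values are invented.
goals = [
    ("Player A", 0.40),
    ("Player A", 0.30),
    ("Player B", 0.10),
    ("Player B", 0.05),
    ("Player B", 0.15),
    ("Player B", 0.10),
]


def summarize(goal_log):
    """Return {player: (goal_count, total_wpa, agw)}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for player, wpa in goal_log:
        totals[player] += wpa   # WPA is the sum over a player's goals
        counts[player] += 1
    # AGW is total WPA divided by the number of goals scored
    return {p: (counts[p], totals[p], totals[p] / counts[p]) for p in totals}


table = summarize(goals)
```

In this toy log Player A scores half as many goals as Player B but ends up with the higher AGW, which is the situation described above.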

WPA and AGW are not perfect statistics, but they do provide a little more insight into a player's goal scoring ability.

tags expected points added, average goal weight, win probability, win probability added

Win Probability Graphs and Regressions

June 24, 2011 by Ford Bohrmann

Earlier in this blog I wrote a post on Win Probability in every possible game situation. I posted the Excel files, but they aren't as informative as a graph. I made graphs for home and away and for +2, +1, 0, -1, and -2 goal differentials for every minute. I didn't make graphs for GDs bigger than that because there is basically no point. The fact that a team has a .999 win probability when they are up 4-0 isn't that exciting.

Each graph has the line of best fit and a scatter plot of the data. The equations for those lines are also on the graph, along with the r^2 value showing how well the line fits. The graphs are below to look at. Some interesting things I noticed:

-Most graphs show a very strong relationship between minute and win probability. The only exceptions are when teams are away and tied, when teams are home and up by 2, and when teams are away and down by 2. Not really sure why these three stick out.

-Some of the graphs have linear relationships, while others are quadratic. Again, not really sure why this is. Why does the win probability when you are at home and tied follow a quadratic curve while the win probability of a team at home and down by 1 is linear? Maybe people have ideas as to why this happens.

-For some of the scenarios (the +2 and -2 GDs for home and away) I didn't start the graph at minute 1 because the data points were a little all over the place. This happens because there are so few data points that the win probabilities are skewed. Example: There aren't many times when a team has a 2-0 lead in the 5th minute.

-I added the graphs of all the goal differentials together for comparison, one for home and one for away. They're interesting to look at.

-Finally, with these fits we now have some basic equations to model a team's chance of winning a game. Feel free to use them and check them out.
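
For anyone who wants to check a fitted equation against data, here is a small sketch of the r^2 calculation. The observed values and the coefficients of both models are placeholders made up for this example; the real equations are the ones printed on the graphs.

```python
def r_squared(xs, ys, model):
    """Coefficient of determination of `model` on the points (xs, ys)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - model(x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up win probabilities for a team holding a lead (illustration only).
minutes = [10, 30, 50, 70, 90]
observed = [0.60, 0.66, 0.74, 0.85, 0.97]


def linear_model(minute):
    # Hypothetical coefficients, not taken from the graphs.
    return 0.0046 * minute + 0.53


def quadratic_model(minute):
    # Hypothetical coefficients, not taken from the graphs.
    return 0.000028 * minute ** 2 + 0.0018 * minute + 0.58


r2_linear = r_squared(minutes, observed, linear_model)
r2_quadratic = r_squared(minutes, observed, quadratic_model)
```

Comparing the two r^2 values for the same situation is one simple way to decide whether a line or a quadratic describes it better.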

tags win probability, regression, win probability added

WPA and AGW: Van Persie is overrated

June 23, 2011 by Ford Bohrmann

Well, maybe the title is a little exaggerated. What I really mean is that Van Persie's goals last season were overvalued. On the other hand, Darren Bent's goals were undervalued. The explanation comes from WPA, or "win probability added".

In the last post I explained win probability. If you haven't read it, check it out here. Because we have a probability for every game situation, I was able to weight goals by the win probability a team gains from that goal. In soccer, it is a little more complicated because teams can tie. To solve this, I use win percentages instead of win probability. To get a team's win percentage you weight a win as 1 point, a draw as 1/3 of a point, and a loss as 0. The sum of these divided by the number of games a team has played gives us the win percentage. I guess in this case it should be win percentage added instead.
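
As a quick sketch of that weighting (the function name is just an assumption for illustration):

```python
def win_percentage(wins, draws, losses):
    """Win percentage with a win worth 1, a draw 1/3, and a loss 0."""
    games = wins + draws + losses
    return (wins + draws / 3) / games


# A situation that always ends in a draw is worth 1/3.
all_draws = win_percentage(0, 10, 0)

# The away, tied, 67th-minute split from the earlier post
# (19% wins, 51% draws, 30% losses), written as counts out of 100.
away_tied_67 = win_percentage(19, 51, 30)
```

This is why a game that is level very late on shows up as roughly .333 rather than 0 under this definition.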

The added part comes in by calculating how much a goal adds to a team's win percentage. Here is an example:

-A goal in the 95th minute to put the home team up by a goal would have a WPA of about .667. A tie game in the 95th minute gives the home team a win percentage of .333 (almost every time they will draw the game). However, in this case the home team scored, so the score is now 1-0 in the 95th minute and the home team's win percentage is almost 1 (almost every time they will win the game). To get the WPA of the goal we subtract the win percentage before the goal (.333) from the win percentage after the goal (1). This gives us a WPA of about .667.

Basically, WPA gives more value to goals that are more important to the team. In the example above, that goal is obviously very important to the team. However, a goal in the 90th minute to put a team up by 6 would be worthless to the team. That goal would have a WPA of 0.

I calculated the WPA of the top scorers in the EPL last season (players with more than 10 goals). Interestingly enough, the list got shaken up a bit. The table is below.

Notably, Darren Bent moves up to first on the list, and Van Persie moves down to 8th. Beyond this, I wanted to know which players tend to score more important goals and which players score less important goals. Obviously, Van Persie has a higher WPA than most of these players because he scored a lot more goals than they did.

The way I did this was to calculate the average WPA of a goal by a player. I called this the Average Goal Weight, or AGW. The list of the AGW versus goals is below.

Not surprisingly, Van Persie moves to the bottom of the list, and Bent stays at the top. So what does all this mean? I don't think it's a good idea to jump to the conclusion that Van Persie is not a good goal scorer. Despite everything, he scored 18 goals last season, which is good no matter how you score them. However, I think AGW is a good supplement to the top goal scorers list. Last season, Bent was consistently scoring goals that added a whole 10 points more to the winning percentage than Van Persie's did, on average.

You shouldn't base your entire assessment of a goal scorer only on AGW. However, I think it's something to take into account.

tags EPL, win probability added, win probability, average goal weight

Win Probability Added in Soccer

June 21, 2011 by Ford Bohrmann

Everyone hears it all the time: A 2-0 lead is the most dangerous lead in soccer. But is it really? Thinking about this led me to wonder exactly how dangerous leads are in soccer. In fact, I wanted to find out what win, loss, and draw percentages a team had in all situations. The best way to find this out is to analyze a lot of games and calculate the win, loss, and draw percentages in every possible game situation. To do this, I took into account the venue of the game (home versus away), the goal differential between the teams (team is up by 2, team is up by 1, game is tied, team is down by 1, team is down by 2, etc.) and the minute of the game. I took goal differentials of -5 to 5 and minutes 1-90. I thought these were probably the most important factors. You could maybe take cards into account too, but this is hard and makes it pretty complicated. Overall, there are 2*11*90 = 1980 combinations of game situations.
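
The count of game situations is easy to check by listing them. This is only a sketch; the variable names are assumptions, and the actual work was done in Excel rather than code.

```python
from itertools import product

venues = ("home", "away")   # 2 venues
goal_diffs = range(-5, 6)   # -5 through +5, 11 values
minutes = range(1, 91)      # minutes 1 through 90

# Every (venue, goal differential, minute) game situation.
states = list(product(venues, goal_diffs, minutes))
state_count = len(states)   # 2 * 11 * 90
```

Each tuple in `states`, such as `("home", 2, 67)`, is one situation that needs its own win, draw, and loss percentages.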

The idea relates to WPA in baseball. Basically, WPA is a measurement of how much a play adds to the chance the team wins a game. For example, how much does a 2 run home run help the team's chances in the 6th inning? In soccer, a question would be how much does a goal at home to give you a 2 goal lead in the 67th minute change your winning percentage in the game? Pretty simple concept.

To get the percentages for all of these situations I imported game data from the past 10 years of the EPL into Excel. My Excel skills are not the best, but with some help I was eventually able to convert the data into percentages for each game situation mentioned above. The basic idea is this: how often do teams with a 1 goal lead in the 40th minute at home win? How often do they draw? How often do they lose? This was done for every minute and every goal differential, both home and away. The results truly tell us how dangerous a variety of leads are.
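
The same tallying idea can be sketched in code. Everything here is an assumption for illustration: the match format (a list of goals as minute and scoring side), the rule that the state at a given minute counts goals scored in or before that minute, and the three invented matches. The real numbers came from Excel, not from this script.

```python
from collections import defaultdict


def outcome(goal_diff):
    """Final result from one team's point of view."""
    if goal_diff > 0:
        return "win"
    if goal_diff == 0:
        return "draw"
    return "loss"


def tally(matches):
    """Count final results for every (venue, goal_diff, minute) state.

    Each match is a list of (minute, side) goals, side being "home" or "away".
    """
    counts = defaultdict(lambda: {"win": 0, "draw": 0, "loss": 0})
    for goals in matches:
        home_final = sum(1 for _, side in goals if side == "home")
        final_gd = home_final - (len(goals) - home_final)
        for minute in range(1, 91):
            # Home team's goal differential at this minute (assumed rule).
            gd = sum(1 if side == "home" else -1
                     for m, side in goals if m <= minute)
            for venue, sign in (("home", 1), ("away", -1)):
                key = (venue, sign * gd, minute)
                counts[key][outcome(sign * final_gd)] += 1
    return counts


def percentages(record):
    """Turn win/draw/loss counts into fractions of games."""
    total = sum(record.values())
    return {result: n / total for result, n in record.items()}


# Three invented matches, just to exercise the code.
matches = [
    [(30, "home")],                               # 1-0 home win
    [(30, "home"), (80, "away")],                 # 1-1 draw
    [(20, "away"), (60, "home"), (88, "home")],   # 2-1 home win
]
counts = tally(matches)
home_up_one_at_40 = percentages(counts[("home", 1, 40)])
```

With 10 seasons of real matches in place of the three toy games, each state's record answers the question above: how often a team in that situation goes on to win, draw, or lose.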

Here's an example: The team is away, the game is tied, and it's the 67th minute. Any guesses on the win, draw, and loss percentages? Well, it turns out the team has about a 19% chance of winning, a 51% chance of drawing, and a 30% chance of losing.

We can also test the "2 goal leads are the most dangerous leads" theory. Let's say the team is home and it's the 35th minute. Here are the percentages for 1 and 2 goal leads:

1 goal lead: win: 78%, draw: 16%, loss: 6%
2 goal lead: win: 96%, draw: 2%, loss: 2%

The 2 goal lead is clearly the safer of the two, and the same holds true for all minutes and for both home and away teams. A 2 goal lead is in fact not the most dangerous lead in soccer.

I'm also in the process of making a Java applet to post here that lets you input the goal differential, venue, and minute, and spits out the win, loss, and draw percentages. Again, my Java programming talents are not the best, so no promises on anything getting finished or uploaded soon. I uploaded the actual Excel files to a Google Sites page, though, if you're curious to look at other percentages. If you want to download the files, click here and type ".htm" (without the quotes) in the search bar to find them.

Next, I'm planning on relating this more to how WPA is used in baseball by using it to analyze specific players, calculating how much win percentage they add to their team by scoring goals. Not sure how useful this statistic will actually be, but it's worth a shot.

tags win probability added, EPL


Soccer Statistically
