Wednesday, Aug 3, 2011

Refining The Win Probability Statistic


Last year I was planning to go to the Sloan Sports Conference but ended up not being able to make it. I was thinking about it again this year and decided it wouldn’t be a bad idea to submit something for this year’s conference. At first I wasn’t going to, but why the hell not? Might as well go for it, I guess.

My win probability added statistic has generated some interest, and I think it gives some pretty interesting insight, so I’ve been working on expanding it. If you have no idea what win probability added is, check out my first post on win probability and another on win probability added. Anyway, thus begins my quest to refine and expand the win probability added statistic for submission to the conference. Comments, criticisms, and suggestions are very much appreciated and would help a lot in making it better.

The first fix I made was a simple one, and it came with a name change. The problem with “win probability added” is that it doesn’t actually calculate the win probability added. That’s a little bit problematic. For example, if two teams are tied in the 90th minute, the win probability under my old calculations was .333 for both teams. This doesn’t really make sense, because each team has close to a 0% chance of winning the game, not 1/3. In effect, the old number was counting a near-certain draw as a third of a win. This comes from modeling the statistic after the similar calculation in professional baseball. My fix for the problem is extremely simple: multiply all the values by 3. This changes the statistic from win probability added to expected points added, on the usual scale of 3 points for a win and 1 for a draw. It makes much more sense now. If a player scores a go-ahead goal in the 90th minute, the Expected Points Added (EPA from now on) is going to be almost 2, since the team goes from about 1 expected point to about 3. If a player scores a tying goal in the 90th minute, the EPA would be almost 1, since the team goes from about 0 to about 1. Much simpler and easier this way (I originally got the idea from @11tegen11’s similar analysis).
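
To make the arithmetic concrete, here is a minimal Python sketch of the rescaled statistic. The probabilities are invented for illustration (they are not from my data), and the function names are just placeholders. It assumes the old 0-to-1 number was expected points divided by 3, which is what the 90th-minute example implies.

```python
# Illustrative only: every probability below is made up, not real match data.

def expected_points(p_win: float, p_draw: float) -> float:
    """Expected league points: 3 for a win, 1 for a draw, 0 for a loss."""
    return 3.0 * p_win + 1.0 * p_draw


def old_scale(p_win: float, p_draw: float) -> float:
    """The old 0-to-1 number, assumed here to be expected points / 3."""
    return expected_points(p_win, p_draw) / 3.0


def epa(before: tuple, after: tuple) -> float:
    """Expected points added by an event that moves a team from `before` to `after`."""
    return expected_points(*after) - expected_points(*before)


# Hypothetical (P(win), P(draw)) pairs for three 90th-minute game states.
tied_late = (0.01, 0.98)      # almost certainly a draw
leading_late = (0.98, 0.02)   # almost certainly a win
trailing_late = (0.00, 0.02)  # almost certainly a loss

go_ahead_epa = epa(tied_late, leading_late)   # go-ahead goal: close to 2
tying_epa = epa(trailing_late, tied_late)     # tying goal: close to 1

print(f"old scale when tied late: {old_scale(*tied_late):.3f}")
print(f"go-ahead goal EPA: {go_ahead_epa:.2f}")
print(f"tying goal EPA: {tying_epa:.2f}")
```

With these made-up inputs, the tied state sits near .333 on the old scale even though the chance of actually winning is about 1%, which is exactly the mismatch the name change fixes.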

After this, I noticed the graphs were not nice, smooth curves. Even though I took a big sample of games (about 10 years’ worth), there isn’t enough data to give a smooth curve. To fix this, I created lines of best fit for each game situation. The home and away graphs for each minute and goal differential are below. Before, a few situations didn’t give a realistic expected point total because they came up so rarely (like a 2-goal lead in the 5th minute). Smoothing the curves fixes this problem. It also allows me to use equations to calculate EPA instead of the annoying process of referencing a massive Excel chart.
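
As a rough idea of what fitting one game situation looks like, here is a small Python sketch. The sample values are invented, and the straight-line form is an assumption made for simplicity; the real curves could just as easily be polynomials or something else.

```python
# Sketch only: the sample values are made up, and fitting a straight line is
# an assumption. The form of the real fitted curves is not specified here.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical noisy expected points for a home team leading by one goal.
minutes = [5, 20, 35, 50, 65, 80]
raw_xp = [2.31, 2.25, 2.48, 2.52, 2.79, 2.86]

slope, intercept = fit_line(minutes, raw_xp)


def smoothed_xp(minute):
    """Expected points read off the fitted line instead of a lookup table."""
    return slope * minute + intercept


print(f"xP = {slope:.4f} * minute + {intercept:.3f}")
print(f"smoothed xP at minute 70: {smoothed_xp(70):.2f}")
```

The point of the equation is the last function: any minute can be plugged in, including ones where the raw sample was too thin to trust.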

[Graphs: home and away expected points by minute and goal differential]

I think there are a lot of possible paths to take from here. I’m going to recalculate the top goal scorers’ EPA using the equations. It won’t change much, but it’ll be nice to have some consistency because I’ll be calculating EPA week by week for every goal next EPL season.

I’m also working on creating a database of the top goal scorers in the EPL over the last 10 years, with their goal totals and their EPA by season. Looking at goals and EPA over time will hopefully give some insight into clutch (or lack thereof) goal scoring. If some players consistently have very high EPAs and others consistently have low EPAs, it could be an indicator of clutch goal scoring in football.
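
One way the comparison could work is to look at EPA per goal for each player and season, then see whether a player stays high or low from year to year. The Python sketch below uses an invented goal log with placeholder player names; the layout of the real database is not settled, so treat the structure as an assumption.

```python
from collections import defaultdict

# Hypothetical goal log: (player, season, EPA of that goal). All values invented.
goal_log = [
    ("Player A", "2009-10", 1.9),
    ("Player A", "2009-10", 1.1),
    ("Player A", "2010-11", 1.6),
    ("Player A", "2010-11", 1.2),
    ("Player B", "2009-10", 0.2),
    ("Player B", "2009-10", 0.4),
    ("Player B", "2010-11", 0.1),
    ("Player B", "2010-11", 0.5),
]


def epa_per_goal(goals):
    """Average EPA per goal, keyed by (player, season)."""
    totals = defaultdict(lambda: [0.0, 0])  # running [EPA sum, goal count]
    for player, season, goal_epa in goals:
        entry = totals[(player, season)]
        entry[0] += goal_epa
        entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


averages = epa_per_goal(goal_log)
for (player, season), value in sorted(averages.items()):
    print(f"{player} {season}: {value:.2f} EPA per goal")
```

In this toy data Player A is high in both seasons and Player B is low in both, which is the kind of year-to-year pattern that would hint at clutch scoring. Two seasons and a handful of goals would prove nothing on their own, of course.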

As I said before, I’d love comments and suggestions on where to go next, whether on the blog, via Twitter, or even by email.
