The draft can be a valuable tool to build a successful club in MLS. When expansion teams come into the league they are automatically given the top draft picks. The list of players that entered MLS through the draft is telling. Some of the top goal scorers: Clint Dempsey, Taylor Twellman, Edson Buddle, Brian Ching. Some of the players with the most minutes: Nick Rimando, Brad Davis, Nick Garcia, Brian Carroll. The list goes on.
Some of these players I’ve mentioned were top picks. Brian Carroll was selected 2nd overall in 2000. Taylor Twellman was selected 2nd overall in 2000. Some of the top players who were chosen in the draft were selected in the later rounds, but went on to very successful careers. Chris Rolfe was selected 29th overall. Davy Arnaud was selected 57th overall and scored 54 goals in his career.
On the flip side, there are a number of notable draft busts. Nikolas Besagno was selected 1st overall in 2005 and went on to play in only 8 games. Joseph Ngwenya went 3rd overall in 2004 to Salt Lake and scored 18 goals in his career, while Salt Lake passed over Clint Dempsey, Clarence Goodson and Michael Bradley.
This post aims to provide some context around the value of draft positions. This can be helpful for determining a fair trade (“Should I trade up to a higher selection?”) or looking at how clubs have performed in their draft selections (apparently the Rapids have done a pretty crappy job overall).
Much has been made of the inter-continental games so far this World Cup, especially considering the presence of 3 of the 4 CONCACAF countries making it past the group stages, including the US getting out of the group of death and Costa Rica going much farther than anyone predicted.
To see how the various (FIFA-defined) continents have done compared to past World Cup results, I used past World Cup data collected from 11v11.com. I looked at the past World Cup results (here is an example from the United States’ page http://www.11v11.com/teams/usa/tab/stats/comp/978). These results include all World Cup and World Cup qualifying games, which is what I limited my analysis to. World Cup qualifying games are a little different than World Cup games, but since they are almost always between countries in the same continent, I think it’s OK because I drop intra-continent games anyway. What defines a continent is pretty hazy, so I just stuck with FIFA’s definitions. This means that Australia is actually a part of Asia, and there are some other anomalies. This division of the world is the best way to stay consistent, though. The continents I ended up using were Africa, Asia, CONCACAF, Europe, Oceania and South America.
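To make the filtering step concrete, here is a minimal sketch of how intra-continent games could be dropped before tallying each continent's record. It is not the code from the repository: the `CONTINENT` lookup, the `inter_continent_record` function and the sample results are all hypothetical names and data chosen for illustration.

```python
from collections import defaultdict

# Hypothetical team-to-continent lookup using FIFA's groupings
# (so Australia is counted under Asia). Illustration only.
CONTINENT = {
    "USA": "CONCACAF",
    "Costa Rica": "CONCACAF",
    "Germany": "Europe",
    "Ghana": "Africa",
    "Australia": "Asia",
    "Brazil": "South America",
}


def inter_continent_record(matches):
    """Tally wins/draws/losses per continent, skipping intra-continent games.

    `matches` is an iterable of (home, away, home_goals, away_goals).
    """
    record = defaultdict(lambda: {"W": 0, "D": 0, "L": 0})
    for home, away, home_goals, away_goals in matches:
        c_home, c_away = CONTINENT[home], CONTINENT[away]
        if c_home == c_away:
            continue  # drop games between teams from the same continent
        if home_goals > away_goals:
            record[c_home]["W"] += 1
            record[c_away]["L"] += 1
        elif home_goals < away_goals:
            record[c_home]["L"] += 1
            record[c_away]["W"] += 1
        else:
            record[c_home]["D"] += 1
            record[c_away]["D"] += 1
    return dict(record)


# Illustrative rows only, not the 11v11.com dataset.
sample = [
    ("USA", "Ghana", 2, 1),
    ("USA", "Costa Rica", 1, 0),  # intra-CONCACAF, so it is ignored
    ("Germany", "USA", 1, 0),
    ("Brazil", "Australia", 0, 0),
]
result = inter_continent_record(sample)
print(result)
```

Keeping the continent assignment in a single lookup means a change in grouping (for example, treating Australia as Oceania) only has to be made in one place.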
If you want to look at the code I wrote to do the analysis (the data scraping, the actual analysis, and the visualization), head over to https://github.com/fordb/wc-continent-headtohead
There’s nothing too crazy going on in the analysis, just a lot of graphs to look at.
Odds makers tend to do a fairly good job in sports. While they may not be perfect, it tends to be tough to find any consistent exploitable inefficiencies. In other words, it is rare that the odds of "Liverpool winning at home", or some other event like that, are consistently over or underestimated. You may think that the odds in an individual game are incorrect, but in the long run inefficiencies like that rarely persist. Why? Because bookies would lose money on them. If they realize they are starting to lose money, the odds are going to be adjusted to better reflect the probability of each result occurring.
While I am not really interested in betting on soccer myself, odds do provide an interesting estimate of the probability of an outcome occurring. For example, take Arsenal's home game against Chelsea this past year. Bet365 put the odds of an Arsenal victory at 2.38. These decimal odds imply that they expect the probability of an Arsenal victory to be about 42% (1 divided by 2.38). Taking into account that the odds makers usually lower the payouts so that they make money, the adjusted probability of an Arsenal victory is just over 41.1%.
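The arithmetic can be sketched in a few lines of Python. The 2.38 home price comes from the example above; the draw and away prices are hypothetical placeholders, so the adjusted figure printed here will not reproduce the 41.1% quoted. Dividing by the sum of the raw probabilities is one common way to strip out the bookmaker's margin, and it is an assumption that this matches the adjustment used in the post.

```python
def implied_probability(decimal_odds):
    """Raw probability implied by decimal odds: 1 / odds."""
    return 1.0 / decimal_odds


def adjusted_probabilities(odds):
    """Rescale raw implied probabilities so they sum to 1.

    The raw values sum to more than 1 because of the bookmaker's
    margin; dividing by that sum is one simple way to remove it.
    """
    raw = [implied_probability(o) for o in odds]
    total = sum(raw)
    return [p / total for p in raw]


home_odds = 2.38  # Arsenal win, from the example above
draw_odds = 3.30  # hypothetical placeholder
away_odds = 3.10  # hypothetical placeholder

raw_home = implied_probability(home_odds)
adjusted = adjusted_probabilities([home_odds, draw_odds, away_odds])

print(f"raw home probability: {raw_home:.3f}")
print(f"adjusted home probability: {adjusted[0]:.3f}")
```

The adjusted value is always a little lower than the raw one, since the three raw probabilities add up to more than 100%.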
This is all pretty standard stuff. The odds for relatively evenly matched games like the one above are probably pretty accurate, or at least more accurate than the average person's guess. But what about significant underdogs? What about City against Cardiff? These are a little more difficult to assess. It's clear that Cardiff is an underdog in this game, but how much of an underdog? And do odds makers do a good job of assigning implied probabilities to these lopsided games?
If you had to place a bet, at what minutes do you think the most goals are scored during the course of a soccer game? I was asking myself this exact question, so I decided to try to figure out what the answer was. If scoring is completely random we would expect the distribution of the count of goals scored to be roughly even across every minute of the game. Of course, it is not going to be perfectly distributed because of random errors, but every minute should have roughly the same number of goals, assuming the sample is large enough.
I had a hunch that this would not be the case. Specifically, my guess was that there would be more goals scored between the 85th and 90th minutes, whereas there would be fewer in the first 5 minutes of the game. To test this hypothesis, I used data from the Rec.Sport.Soccer Statistics Foundation page from 8 years of the Premiership.
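One way to check a hunch like this is to bucket goals into 5-minute windows and compare the counts against what an even spread would give. The sketch below is only an illustration under stated assumptions: the function names and the `toy_minutes` list are hypothetical, the data is not the RSSSF sample, and a chi-square statistic is one possible test, not necessarily the one used in the post.

```python
def goals_by_bucket(goal_minutes, bucket_size=5, match_length=90):
    """Count goals in consecutive windows of `bucket_size` minutes.

    Minutes are assumed to run from 1 to `match_length`; anything
    recorded later (stoppage time) is folded into the last window.
    """
    n_buckets = match_length // bucket_size
    counts = [0] * n_buckets
    for minute in goal_minutes:
        idx = (minute - 1) // bucket_size
        idx = max(0, min(idx, n_buckets - 1))  # clamp to a valid window
        counts[idx] += 1
    return counts


def chi_square_vs_uniform(counts):
    """Chi-square statistic against an equal expected count per window."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


# Toy goal minutes, made up for illustration.
toy_minutes = [3, 12, 27, 44, 45, 51, 63, 78, 86, 88, 89, 90]

counts = goals_by_bucket(toy_minutes)
stat = chi_square_vs_uniform(counts)

print("first 5 minutes:", counts[0])
print("minutes 86-90:", counts[-1])
print("chi-square statistic:", round(stat, 2))
```

With a real sample, the statistic would be compared against a chi-square distribution with one fewer degrees of freedom than the number of windows. A large value suggests goals are not spread evenly across the game.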
There is no shortage of analysis done recently on the fact that possession statistics tend to be misleading. A while ago, I looked at how teams with higher rates of possession in the MLS do not tend to win more games. Similarly, the Climbing the Ladder blog on the MLS website recently did analysis and found very similar results. Devin Pleuler (@devinpleuler) has done even more analysis on why possession stats are misleading for his Central Winger blog on the MLS website. On his personal blog, Devin has also looked at possession efficiency and how it relates to winning. Even more, the 11tegen11 blog (@11tegen11) has written about some interesting points on how to better analyze possession. I'm sure there are even more that I have forgotten to list here, but you get the point.
After the positive comments and interest in the scoreline visualization chart I posted last week, I decided it would be interesting to do another type of data visualization. Processing, the software I've been using for these visualizations, lets you do some cool stuff with making the visualization interactive. This week, I decided to make a more complete and informative visualization of the English Premier League table.
I tried to make it as stand-alone as possible. In other words, I wanted people to understand it just by looking at it without other information. One point: it's interactive in that you can hover your mouse over a club's circle and it will give you information on them. If you are interested in more analysis and how I created it, read below.
The idea for a scoreline visualization originally came from Devin Pleuler (@devinpleuler on Twitter). He had the idea to create a graph that represents how soccer scorelines tend to progress, representing both how often scorelines end a certain way, and how often games flow through a certain scoreline.
Using data from 1000 EPL games from the RSSSF, I've created this chart using Processing, which you can find below.