Valuing The MLS SuperDraft
Ford Bohrmann
The draft can be a valuable tool to build a successful club in MLS. When expansion teams come into the league they are automatically given the top draft picks. The list of players that entered MLS through the draft is telling. Some of the top goal scorers: Clint Dempsey, Taylor Twellman, Edson Buddle, Brian Ching. Some of the players with the most minutes: Nick Rimando, Brad Davis, Nick Garcia, Brian Carroll. The list goes on.
Some of the players I’ve mentioned were top picks. Brian Carroll was selected 2nd overall in 2000. Taylor Twellman was selected 2nd overall in 2002. Other top players who came through the draft were selected in the later rounds but still went on to very successful careers. Chris Rolfe was selected 29th overall. Davy Arnaud was selected 57th overall and scored 54 goals in his career.
Nikolas who?
On the flip side, there are a number of notable draft busts. Nikolas Besagno was selected 1st overall in 2005 and went on to play in only 8 games. Joseph Ngwenya went 3rd overall in 2004 to Salt Lake and scored 18 goals in his career, while Salt Lake passed over Clint Dempsey, Clarence Goodson and Michael Bradley.
This post aims to provide some context around the value of draft positions. This can be helpful for determining a fair trade (“Should I trade up to a higher selection?”) or looking at how clubs have performed in their draft selections (apparently the Rapids have done a pretty crappy job overall).
| Club | Avg. GP | Avg. Mins | Avg. Goals | Avg. Years | Picks | 
|---|---|---|---|---|---|
| Portland Timbers | 21.83 | 1715.33 | 3.17 | 3.00 | 2 | 
| Miami Fusion | 17.70 | 1544.61 | 0.24 | 3.67 | 9 | 
| MetroStars | 15.24 | 1179.28 | 1.60 | 4.74 | 35 | 
| Philadelphia Union | 17.00 | 1169.65 | 1.68 | 3.78 | 9 | 
| Toronto FC | 14.70 | 1110.70 | 0.91 | 2.70 | 20 | 
| Chivas USA | 15.06 | 1095.91 | 0.87 | 2.35 | 20 | 
| Houston Dynamo | 14.18 | 1018.77 | 1.09 | 4.12 | 16 | 
| Kansas City Wizards | 13.67 | 1005.15 | 0.97 | 3.88 | 57 | 
| Los Angeles Galaxy | 13.85 | 996.64 | 1.29 | 5.23 | 65 | 
| D.C. United | 13.31 | 984.91 | 0.94 | 4.11 | 57 | 
| F.C. Dallas | 13.95 | 970.85 | 1.15 | 3.73 | 33 | 
| New England Revolution | 12.68 | 950.84 | 1.40 | 3.50 | 62 | 
| Columbus Crew | 13.41 | 937.60 | 1.42 | 4.02 | 57 | 
| Real Salt Lake | 12.72 | 910.79 | 0.78 | 4.10 | 20 | 
| New York Red Bulls | 12.67 | 879.00 | 0.75 | 3.39 | 18 | 
| Chicago Fire | 12.61 | 855.73 | 1.14 | 4.19 | 68 | 
| San Jose Earthquakes | 11.84 | 844.45 | 0.62 | 4.50 | 48 | 
| Dallas Burn | 11.48 | 803.24 | 1.24 | 3.27 | 33 | 
| Colorado Rapids | 10.98 | 729.43 | 0.71 | 2.22 | 55 | 
| Vancouver Whitecaps FC | 9.07 | 588.27 | 0.20 | 3.75 | 4 | 
| Tampa Bay Mutiny | 8.25 | 445.35 | 0.70 | 1.54 | 13 | 
| Seattle Sounders FC | 8.53 | 438.61 | 0.71 | 3.17 | 12 | 
| Sporting Kansas City | 4.67 | 91.00 | 0.00 | 1.00 | 3 | 
Relating to the second point, the table above shows that some clubs have historically done better than others in the draft. Even defining “better” is difficult, though. As I’ll explain later, in this post I’ve used a number of different metrics to define the value of a draft pick.
This post is not meant to predict which players will succeed and which will fail. I’m not trying to determine whether Cyle Larin will be better than Khiry Shelton. Instead, I just want to look at how players have performed statistically based on where they were drafted. This way we can answer questions like “How many minutes should I expect a 3rd round pick to play in his career?” or “How many goals do I expect a forward selected 8th overall to score?” or “How many years can we expect a GK selected in the 1st round to play?” These are forward-looking questions with backward-looking answers. Still, I think assuming that past draft classes are a reasonable guide is, at the very least, a good place to start when trying to estimate how draft picks will perform in the future.
DATA GATHERING
To do this type of analysis I needed quite a bit of data. All the scripts I used to collect these data are available in the repository I created on GitHub, and any updates I make going forward will be pushed to that repo. There’s a lot of code in there that I’ve written to scrape data from the web, but for this post I’m focusing on the mls_draft folder and the mls_scrape.py file. I’m not going to go into much detail about how I scraped the data, but if you want to learn more about the process, feel free to send me an email.
First, I needed the past SuperDraft selections by team. These data were all available on Wikipedia. The code used to scrape the Wikipedia pages, much of which was based on this blog post, is in the mls_draft.py file in the mls_draft folder. Once the data were scraped and saved to a CSV file, the create_draft_stats.sql script uploaded them to the MySQL database I set up. With the draft data all set, I also needed the actual career statistics of all MLS players, which the mls_scrape.py script collects. There was a lot more data cleaning to do that’s not very exciting (fixing player names, making sure the data are correct, etc.). Eventually I ended up with the necessary data all cleaned and looking pretty. If you see anything that’s incorrect in the data, let me know; most likely I missed some stuff.
To make sure the results are reliable, I only looked at draft selections from 2011 and before. Selections from 2012 and after haven’t had enough time to come to fruition, so their data are messy and not very informative.
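The matching step described above is where most of the cleaning effort goes. As a rough illustration only, here is a minimal Python sketch of joining scraped draft rows to career stats by a normalized player name, keeping only selections from 2011 and before. The field names (`year`, `pick`, `player`, `minutes`, `goals`) and both helper functions are hypothetical, not taken from mls_draft.py or mls_scrape.py, and the sketch assumes (again hypothetically) that a pick with no matching stats row never played.

```python
import unicodedata

def normalize_name(name):
    """Drop accents and punctuation, lower-case, and collapse whitespace."""
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    kept = "".join(ch for ch in ascii_name if ch.isalnum() or ch.isspace())
    return " ".join(kept.lower().split())

def join_draft_with_stats(draft_rows, stats_rows, last_year=2011):
    """Attach career minutes/goals to each pick made in or before last_year.

    draft_rows: dicts with hypothetical keys "year", "pick", "player".
    stats_rows: dicts with hypothetical keys "player", "minutes", "goals".
    Assumption: a pick with no matching stats row never played (0 minutes, 0 goals).
    """
    stats = {normalize_name(row["player"]): row for row in stats_rows}
    joined = []
    for row in draft_rows:
        if row["year"] > last_year:
            continue  # too recent to judge, per the 2011 cutoff
        career = stats.get(normalize_name(row["player"]), {})
        joined.append(
            {**row, "minutes": career.get("minutes", 0), "goals": career.get("goals", 0)}
        )
    return joined

# Made-up rows for illustration only.
draft = [
    {"year": 2005, "pick": 1, "player": "José Example"},
    {"year": 2012, "pick": 3, "player": "Late Pick"},
    {"year": 2010, "pick": 40, "player": "No Show"},
]
stats = [{"player": "Jose  Example", "minutes": 720, "goals": 1}]
for row in join_draft_with_stats(draft, stats):
    print(row)
```

A dictionary keyed only on the normalized name will silently merge two different players who share a name, so a real pipeline would likely need a tie-breaker such as draft year or club.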
ANALYSIS
The first question I wanted to investigate was how often selections can be considered successful, by round. In other words, if I have a 3rd round draft pick, what is the probability that the selection will be considered a success? As I alluded to earlier, defining success is difficult. Ultimately, I settled on a simple metric: minutes per year. Which threshold to choose is definitely an open question; there is really no right answer, so I tried a couple of different values for determining a success.
I first looked at the median minutes per year. The idea is pretty simple: A selection is a success if they end up playing more minutes per year than 50% of draft selections.
| Round | Success % (above median, ~150 min/yr) | 
|---|---|
| 1 | 85.9% | 
| 2 | 65.0% | 
| 3 | 38.0% | 
| 4 | 33.8% | 
| 5 | 14.5% | 
| 6 | 14.8% | 
The median is slightly under 150 minutes per year. This actually seems like a pretty good threshold: assuming 34 games per season, that’s only about 4.4 minutes per game. So really we’re saying a success in the draft is a player who ends up being at least a regular sub. The table above shows the percentage breakdown for each round, i.e. what percent of players drafted in each round ended up averaging more than 150 minutes per year.
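The success rates in the tables come from a simple rule. A minimal Python sketch of that rule, assuming each selection has already been reduced to a (round, average minutes per year) pair, might look like this; `success_rates` is a hypothetical helper and the sample numbers are made up, not the post’s data.

```python
from collections import defaultdict
from statistics import median

def success_rates(picks, threshold=None):
    """Share of picks in each round whose minutes per year exceed a threshold.

    picks: iterable of (round_number, minutes_per_year) pairs.
    threshold: minutes per year; defaults to the median over all picks.
    """
    picks = list(picks)
    if threshold is None:
        threshold = median(mpy for _, mpy in picks)
    totals = defaultdict(int)
    successes = defaultdict(int)
    for rnd, mpy in picks:
        totals[rnd] += 1
        if mpy > threshold:
            successes[rnd] += 1
    rates = {rnd: successes[rnd] / totals[rnd] for rnd in sorted(totals)}
    return threshold, rates

# Made-up example data: (round, minutes per year).
sample = [(1, 900), (1, 200), (2, 300), (2, 50), (3, 10), (3, 0)]
print(success_rates(sample))                 # (125.0, {1: 1.0, 2: 0.5, 3: 0.0})
print(success_rates(sample, threshold=400))  # (400, {1: 0.5, 2: 0.0, 3: 0.0})
```

Passing a fixed `threshold` is how the same rule would be rerun with a different cutoff. As a sanity check on the per-game figure, 150 / 34 ≈ 4.4 minutes per game over a 34-game season.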
An astute observer might say that teams are expecting more than 5 minutes per game from their selections. In fact, the density plot above suggests another possible threshold for defining success. There seems to be a kink right around 400 minutes, marked in the plot below.
| Round | Success % (>400 min/yr) | 
|---|---|
| 1 | 75.0% | 
| 2 | 49.7% | 
| 3 | 27.2% | 
| 4 | 22.8% | 
| 5 | 10.9% | 
| 6 | 11.1% | 
At that mark the density plot flattens out, so let’s use a 400-minute threshold instead. The table above shows the percentage breakdown for each round with the new threshold of 400 minutes per year on average.
Whatever threshold is used, a clear picture emerges of how picks tend to perform by round. The 1st round is clearly the best, and with each subsequent round the probability of being deemed a success generally decreases. While I’m clearly not going to win any awards for that conclusion, it is at least interesting to see exactly how much the likelihood of a player becoming a success drops. Interestingly, there doesn’t seem to be much of a difference between the success probabilities of the 5th and 6th rounds, although there is a good chance this is due to small sample sizes: in many years the draft has had fewer than 5 rounds (in fact, there were only 3 in 2011 and 2 in 2013 and 2014, although I only use 2011 and earlier for this analysis).
We can do even more interesting analysis with these data. It’s possible to model the expected value of various statistics as a function of the pick number. Even more, we can break this down by position and answer a bunch of different questions. If a team selects a forward 10th overall, how many goals should we expect them to score in their career? What about the expected number of minutes per year for a defender selected 25th overall?
To answer these questions, I use LOESS, which I previously used to smooth out the raw outcome probability curves for the outcome probability calculator. LOESS, which kinda sorta stands for LOcal regrESSion, is useful here for a couple of reasons. First, it’s simple: it’s as easy to use as a linear regression. Second, it’s non-linear and is useful for smoothing out curves. The curves I’m plotting and looking to model are a bit wacky, and LOESS is a simple way to get an accurate representation of the data. The biggest drawback is probably that it is non-parametric, which means there is no single equation you can write down to describe the fitted curve. But that isn’t the end of the world in this case.
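For readers who haven’t seen LOESS before, here is a minimal pure-Python sketch of the idea: for each point where we want an estimate, take the nearest fraction `frac` of the data, weight those points with the tricube kernel, and fit a weighted straight line. This is an illustrative local-linear version without the robustness iterations that, for example, statsmodels’ `lowess` runs by default, and it is not the code behind the curves in this post; `frac=0.5` and the example data are arbitrary assumptions.

```python
def loess(xs, ys, x_eval, frac=0.5):
    """Local linear regression with tricube weights (no robustness iterations)."""
    n = len(xs)
    k = min(n, max(3, round(frac * n)))  # neighbours used in each local fit
    fitted = []
    for x0 in x_eval:
        dists = [abs(x - x0) for x in xs]
        # Bandwidth: distance to the k-th nearest observation (guarded against 0).
        h = max(sorted(dists)[k - 1], 1e-12)
        w = [(1 - (d / h) ** 3) ** 3 if d < h else 0.0 for d in dists]
        if sum(w) == 0:  # every neighbour sits exactly at distance h
            w = [1.0 if d <= h else 0.0 for d in dists]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0  # flat fit if only one distinct x
        fitted.append(my + slope * (x0 - mx))
    return fitted

# Made-up, noisy "career minutes by overall pick" data for illustration only.
picks = list(range(1, 31))
minutes = [12000 - 300 * p + (1500 if p % 4 == 0 else -900) for p in picks]
smooth = loess(picks, minutes, picks, frac=0.5)
print([round(m) for m in smooth[:5]])
```

A larger `frac` gives a smoother, flatter curve; a smaller one follows the noise more closely.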
Below is a plot of overall pick number versus total career minutes played. The clearest takeaway is that there isn’t much of a pattern in the data. Yes, there is a general downward trend as the pick number increases, as the LOESS curve in blue shows. But even “general” might be too strong a word here; there is a lot of noise around that curve.
The same lack of a trend is evident if we look at pick number versus goals for Midfielders and Forwards only. It’s actually even worse, because the distribution of goals has a very long tail. In fact, the distribution of goals scored seems to follow a power law (another plug, apologies). The lack of a relationship between pick number and both minutes and goals is honestly not that surprising: it’s very difficult to predict a player’s success based on how he played before the draft. It’s important to keep that in mind for the analysis ahead.
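One rough way to put a number on the power-law observation is the standard continuous maximum-likelihood estimate of the tail exponent, alpha = 1 + n / Σ ln(x_i / x_min), computed over the n observations at or above a cutoff x_min. The sketch below is only an illustration under stated assumptions: the goal totals are made up, x_min = 1 is an arbitrary choice that drops zero-goal careers, goals are discrete so the continuous formula is an approximation, and a fitted exponent on its own does not prove the data follow a power law.

```python
import math

def power_law_alpha(values, x_min=1.0):
    """Continuous MLE of a power-law exponent for the tail values >= x_min.

    alpha_hat = 1 + n / sum(ln(x_i / x_min)); values below x_min (e.g. zero-goal
    careers when x_min = 1) are ignored.
    """
    if x_min <= 0:
        raise ValueError("x_min must be positive")
    tail = [v for v in values if v >= x_min]
    log_sum = sum(math.log(v / x_min) for v in tail)
    if log_sum <= 0:
        raise ValueError("need tail values strictly above x_min")
    return 1 + len(tail) / log_sum

# Made-up career goal totals, not the post's data.
goals = [0, 0, 1, 1, 1, 2, 2, 3, 4, 6, 9, 15, 28, 54]
print(round(power_law_alpha(goals, x_min=1), 2))
```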
Using these LOESS curves, it’s possible to work out some “fair” draft pick trades. Say you’re in a club’s front office and you have the first pick in the draft. What trade would be equitable, at least in terms of historical data? The expected career minutes for the 1st overall pick are 10,844. If you were offered both the 11th (6,407 expected minutes) and 18th (4,390 expected minutes) overall picks, you’d expect a total of 10,797 minutes, only 47 fewer, which seems like a pretty fair trade.
What if you’re looking for a goal scorer? Say that this year you have the 5th overall pick. The expected career goals for a Midfielder/Forward taken 5th overall are about 16. For 13th overall the figure is 10.3, and for 25th overall it is 5.6, so the 13th and 25th picks together also yield around 16 (15.9).
The same approach could be applied to trading draft picks for current players, setting all other factors aside. With the 15th overall pick you would expect 5,123 minutes, so if you think the current player you’d get for that pick is worth about that much, you should be willing to make the trade.
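The trade arithmetic in the last few paragraphs is just a comparison of summed expected values. A minimal Python sketch, using only the LOESS estimates quoted in this post (the 5th-pick goals figure is the rounded “about 16”), is below; `trade_gap`, the two dictionaries, and the 6,000-minute player estimate are hypothetical, and like the text it ignores everything except the historical averages.

```python
# Expected career minutes and Midfielder/Forward goals by overall pick,
# copied from the LOESS estimates quoted in the text ("about 16" -> 16.0).
EXPECTED_MINUTES = {1: 10_844, 11: 6_407, 15: 5_123, 18: 4_390}
EXPECTED_GOALS_MF = {5: 16.0, 13: 10.3, 25: 5.6}

def trade_gap(give, receive, expected):
    """Expected value received minus expected value given up.

    Positive favours the club receiving `receive`; values near zero are "fair".
    """
    return sum(expected[p] for p in receive) - sum(expected[p] for p in give)

print(trade_gap([1], [11, 18], EXPECTED_MINUTES))             # -47
print(round(trade_gap([5], [13, 25], EXPECTED_GOALS_MF), 1))  # -0.1

# Pick-for-player: trade the 15th pick if you expect the player to beat it.
player_expected_minutes = 6_000  # hypothetical estimate for the player
print(player_expected_minutes >= EXPECTED_MINUTES[15])        # True
```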
There are a bunch more ways to break these types of trades down. Instead of going through more examples like above, I’m just going to post the results from the LOESS models for both minutes and goals by pick number below. Feel free to download and play around with the data.
CONCLUSION
With all this analysis in mind, I think it’s important to return to a point I made at the beginning of this post. Nowhere here am I trying to make predictions about specific players; this is just an overall look at how draft picks have fared. Going back to my original example, if you think Khiry Shelton is going to break MLS scoring records, whether because he’ll fit well into your system, or you’re smarter than everyone else, or he’ll play well in NYC, or for some other reason, then you should value him more highly than my model would based on his draft position. That being said, I hope this post serves as a solid starting point, or at least becomes part of the discussion, when valuing draft picks. The model is simplistic, but it provides good context around the expected value of draft picks.