Scoreline Visualization

The idea for a scoreline visualization originally came from Devin Pleuler (@devinpleuler on Twitter). He wanted a graph that shows how soccer scorelines tend to progress: both how often games end at a given scoreline and how often games pass through it.
Using data from 1000 EPL games from the RSSSF, I created the chart below in Processing.
Each node (circle) represents a scoreline that a game can end on. The diameter of a node depends on the number of games that ended with that scoreline, so bigger nodes mean more games. For example, "1-0" is the biggest node because more games ended 1-0 than any other score. On the other hand, the "0-4" node (bottom right-hand corner) is tiny because few games end with the away team ahead 0-4.
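The exact mapping from game counts to diameters isn't spelled out here, so the following is only a sketch of two plausible mappings, written in Python rather than Processing. The function names and the `max_diameter` parameter are hypothetical. The first scales the diameter directly with the count; the second scales the circle's area with the count, so a node with four times as many games gets twice the diameter.

```python
import math

def diameter_linear(count, max_count, max_diameter=100.0):
    """Diameter grows in direct proportion to the number of games."""
    return max_diameter * count / max_count

def diameter_by_area(count, max_count, max_diameter=100.0):
    """Area grows in proportion to the number of games,
    so the diameter grows with the square root of the count."""
    return max_diameter * math.sqrt(count / max_count)
```

With linear scaling, a node holding a quarter of the games of the largest node is drawn at a quarter of the diameter, which gives it only one sixteenth of the area. With area scaling it is drawn at half the diameter.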
I should also point out how the progression works. "0-0" is the starting node. All the nodes above that represent the home team scoring first (1-0), and all the nodes below that represent the away team scoring first (0-1). The home team's score is represented by the number to the left of the dash, and the away team's score by the number to the right of the dash.
Another thing: some scorelines are repeated in the graph. For example, there are three "2-1" nodes. That's because there are three score progressions that can end that way: 1-0, 2-0, 2-1 is one; 1-0, 1-1, 2-1 is another; and 0-1, 1-1, 2-1 is the last. The size of each "2-1" node represents the number of games that ended 2-1 by way of that node's specific progression.
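To make the repeated nodes concrete, here is a rough Python sketch of how games could be grouped by progression. It assumes each game is stored as a string of scorers in goal order ("H" for home, "A" for away); that encoding, the function names, and the sample games are made up for illustration and are not the RSSSF data or the Processing code behind the chart.

```python
from collections import Counter

def progression(goals):
    """Turn a goal sequence like "HAH" into the tuple of scorelines it passes through."""
    home = away = 0
    path = ["0-0"]
    for scorer in goals:
        if scorer == "H":
            home += 1
        else:
            away += 1
        path.append(f"{home}-{away}")
    return tuple(path)

def node_sizes(games):
    """Count games per full progression; each distinct path is its own node."""
    return Counter(progression(g) for g in games)

# Made-up example games, not real results.
games = ["HHA", "HAH", "AHH", "HAH", "H", ""]
sizes = node_sizes(games)

# All nodes whose final scoreline is 2-1, one per progression.
two_one_nodes = [path for path in sizes if path[-1] == "2-1"]
```

Because the key is the whole path rather than just the final score, the three ways of reaching 2-1 end up as three separate entries, each with its own count.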
Finally, you can see that the transition lines have different weights. The weight of each line represents how many games passed along that transition. For example, the line connecting the "0-0" node to the "1-0" node is thicker than the one connecting the "0-0" node to the "0-1" node because more games start with the home team scoring first. This is different from the size of the nodes, because it takes into account not just how many games ended at that scoreline, but also how many games passed through it. For example, if a game ended 2-1 but at one point was 2-0, it would count toward the weight of the 2-0 transition line, but not toward the size of the "2-0" node. Weighting the transition lines like this is interesting because you can begin to visualize how the "flow" of scores happens in a game.
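The difference between node size and line weight can be sketched in a few lines of Python. As before, this assumes a hypothetical encoding in which each game is a string of scorers in goal order ("H" or "A"), and the `tally` function and sample games are illustrative only, not the code or data used for the chart.

```python
from collections import Counter

def tally(games):
    """Return (node_count, edge_weight) for a list of goal sequences."""
    node_count = Counter()   # games that ENDED at this node
    edge_weight = Counter()  # games that PASSED ALONG this transition
    for goals in games:
        home = away = 0
        path = ("0-0",)
        for scorer in goals:
            if scorer == "H":
                home += 1
            else:
                away += 1
            next_path = path + (f"{home}-{away}",)
            edge_weight[(path, next_path)] += 1  # every goal adds to a line
            path = next_path
        node_count[path] += 1  # only the final scoreline adds to a node
    return node_count, edge_weight

# Made-up example games, not real results.
games = ["HHA", "HH", "H", "A"]
node_count, edge_weight = tally(games)
```

In this toy data the game "HHA" ends 2-1 after being 2-0, so it adds to the weight of the line into "2-0" but only to the size of the "2-1" node, which is the distinction described above.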

Reader Comments (2)


Great stuff!

If you have not already discovered them, I strongly recommend the books of Edward Tufte to anyone involved in using visual representations to portray or analyze statistics.

I thought of his books when I saw the scoreline tree. Tufte's first book, The Visual Display of Quantitative Information, addresses problems with using circles to represent relative quantities.

January 17, 2012 | Unregistered Commenter Joseph Seeley

Thanks for the suggestion! I actually just ordered one of his books; it seems really interesting.

I talked to Jake yesterday and he told me about your suggestion to base the sizes of the circles on the area instead of the radius. That definitely makes a lot of sense, and I'm excited to read Tufte's book.

January 19, 2012 | Unregistered Commenter Ford Bohrmann
