Friday, January 13, 2012 at 10:53AM
The idea for a scoreline visualization originally came from Devin Pleuler (@devinpleuler on Twitter). He had the idea to create a graph that represents how soccer scorelines tend to progress, showing both how often games end on a given scoreline and how often games flow through it.
Using data from 1000 EPL games from the RSSSF, I created the chart below in Processing.
Each node (circle) represents a different scoreline that a game can end on. The diameter of the node depends on the number of games that ended on that scoreline, so bigger nodes mean more games. For example, "1-0" is the biggest node, because more games ended 1-0 than any other score. On the other hand, the node "0-4" (bottom right-hand corner) is tiny because not many games end with the away team winning 0-4.
I should also point out how the progression works. "0-0" is the starting node. All the nodes above that represent the home team scoring first (1-0), and all the nodes below that represent the away team scoring first (0-1). The home team's score is represented by the number to the left of the dash, and the away team's score by the number to the right of the dash.
Another thing: some scorelines are repeated in the graph. For example, there are three "2-1" nodes. That's because there are three score progressions that can end that way: 1-0, 2-0, 2-1 is one; 1-0, 1-1, 2-1 is another; and 0-1, 1-1, 2-1 is the final one. The size of each of these "2-1" nodes represents the number of games that ended 2-1 after going through that specific progression.
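To make the repeated nodes concrete, here is a small Python sketch that lists every progression leading to a given final score. This is not the Processing code behind the chart; the function name `progressions` is made up for illustration, and it simply assumes goals are scored one at a time starting from 0-0.

```python
def progressions(home, away):
    """Return every goal-by-goal sequence of scorelines from 0-0 to home-away.

    Hypothetical helper for illustration; each sequence lists the score
    after every goal and leaves out the 0-0 starting node.
    """
    if home == 0 and away == 0:
        return [[]]  # one empty progression: no goals scored yet
    label = f"{home}-{away}"
    paths = []
    if home > 0:
        # The last goal was scored by the home team.
        paths += [p + [label] for p in progressions(home - 1, away)]
    if away > 0:
        # The last goal was scored by the away team.
        paths += [p + [label] for p in progressions(home, away - 1)]
    return paths


for path in progressions(2, 1):
    print(", ".join(path))
```

Calling `progressions(2, 1)` gives the three sequences listed above, one for each "2-1" node in the chart.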
Finally, you can see that the transition lines have different weights. The weight of each line represents how many games went through that transition. For example, the line connecting the "0-0" node to the "1-0" node is thicker than the one connecting the "0-0" node to the "0-1" node because more games start out with the home team scoring first. This is different from the size of the nodes, because it takes into account not just how many games ended on that scoreline, but also how many games passed through it. For example, if a game ended 2-1 but at one point was 2-0, it would count toward the weight of the transition line leading into the "2-0" node, but not toward the size of the "2-0" node itself. Weighting the transition lines like this is interesting because you can begin to visualize how the "flow" of scores happens in a game.
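The difference between node size and line weight can be sketched in a few lines of Python. The chart itself was built in Processing, so this is only an illustration; the `games` list is invented sample data (not the RSSSF data), and the names `node_counts` and `edge_weights` are my own.

```python
from collections import Counter

# Invented sample data: each game is the list of scorelines after each goal.
games = [
    ["1-0", "2-0", "2-1"],
    ["1-0", "2-0"],
    ["1-0"],
    ["0-1", "1-1"],
    [],  # a goalless draw never leaves the 0-0 node
]

node_counts = Counter()   # games that ENDED at a node
edge_weights = Counter()  # games that PASSED ALONG the line into a node

for goals in games:
    # A node is identified by its whole progression, because the same
    # scoreline (for example "2-1") appears once per route that reaches it.
    path = ("0-0",) + tuple(goals)
    node_counts[path] += 1
    for i in range(2, len(path) + 1):
        # Each node has exactly one incoming line, so the line is keyed
        # by the progression of the node it leads into.
        edge_weights[path[:i]] += 1

print(edge_weights[("0-0", "1-0", "2-0")])  # games that were 2-0 at some point
print(node_counts[("0-0", "1-0", "2-0")])   # games that finished 2-0
```

In this sample, two games passed through 2-0 but only one finished there, so the line into that node is weighted more heavily than the node is sized.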