It's been fun comparing various aspects of performance in the big four UEFA soccer leagues. In some fundamental ways the leagues are very similar (think about goal/shot ratios, for example), but in other ways they're quite different, as you'll see below. On the topic of punishment in soccer, I've been looking some more at data on red and yellow cards in the four big leagues, and in particular I've been wondering about getting the baselines right. For starters, here are the average numbers of red and yellow cards (per team and match) for the 2009-10 season. As you can see, there are some sizable differences across the leagues. Take a look.
As I've noted previously, the EPL and the Bundesliga are quite different from La Liga and Serie A. Refs in the former two leagues get out their yellow cards much less frequently than refs in the latter two. While teams in the Bundesliga and the Premiership see around 1.6 to 1.7 yellows and fewer than 0.1 reds per match, teams in Serie A and La Liga can expect about 2.5 yellow cards and around 0.2 reds. These are substantial differences that coaches must know and think about when preparing for a match. From an analyst's perspective, an obvious difference is geographic: more yellows as you move south! But it's not clear what is driving these patterns. Are players in Spain and Italy more likely to commit fouls, or to fall more spectacularly and writhe on the ground more, or are refs just tougher in the southern top divisions of European football? I'd be curious to know.
One question is whether there is a connection between the yellow and red cards a team is given. You would imagine that teams that see more yellows also, by extension, see more reds, so long as we think of refereeing as consistent, with severe and repeated fouls punished more harshly than occasional, less severe ones.
To get a sense of the connection between yellow and red cards, I collected data for all teams and all matches played in 2009-10 in each league and, treating each team's card counts in each match as one observation, looked at the correlation between yellow and red cards within each league. [A correlation describes a pattern in the data: with a positive correlation, higher values on one variable (say, yellow cards) go hand in hand with higher values on another (say, red cards); with a negative correlation, higher values on one go with lower values on the other.] I then graphed these correlations for your viewing pleasure. Take a look.
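For anyone who wants to try this at home, here is a minimal sketch of how a per-league correlation like this could be computed. It is not the code behind the graphs; the function names (`pearson_r`, `correlation_by_league`) and the handful of rows in `team_matches` are made up for illustration and are not the real 2009-10 data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical toy data, NOT the real numbers:
# each row is (league, yellow cards, red cards) for one team in one match.
team_matches = [
    ("EPL", 1, 0), ("EPL", 3, 1), ("EPL", 2, 0), ("EPL", 0, 0), ("EPL", 4, 1),
    ("Serie A", 1, 0), ("Serie A", 3, 0), ("Serie A", 2, 1),
    ("Serie A", 2, 0), ("Serie A", 2, 0),
]


def correlation_by_league(rows):
    """Group team-match rows by league and correlate yellows with reds."""
    by_league = {}
    for league, yellows, reds in rows:
        yellow_list, red_list = by_league.setdefault(league, ([], []))
        yellow_list.append(yellows)
        red_list.append(reds)
    return {
        league: pearson_r(yellow_list, red_list)
        for league, (yellow_list, red_list) in by_league.items()
    }


result = correlation_by_league(team_matches)
for league, r in result.items():
    print(f"{league}: r = {r:.2f}")
```

The toy rows are rigged so that one league comes out strongly positive and the other at zero; real team-match data is far noisier, which is why the actual coefficients are so much smaller.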
The data show a consistent positive correlation between yellow and red cards in each of the big four leagues: teams that get more yellow cards in a match also tend to see more red cards in that same match. However, there's also a very interesting difference across the leagues, because the strength of this statistical connection varies considerably. While the correlation between yellow and red is very similar in La Liga and the Bundesliga, the Premier League and Serie A stand out. The slope (that is, the steepness of the line in each graph) is steepest in the Premiership and much steeper than in Serie A, where it is very shallow. This suggests a tighter connection between yellows and reds in England, and almost no connection in Italy. For the statistics geeks among you, the Pearson correlation coefficient tells the story. While the coefficient is 0.09 in the Bundesliga and 0.08 in La Liga, it is 0.14 in the EPL and almost zero (0.03) in Serie A.
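A quick aside on slope versus correlation, since the paragraph above leans on both. They are related but not the same number: for a straight-line fit of reds on yellows, the slope equals the correlation multiplied by the ratio of the two spreads (standard deviation of reds over standard deviation of yellows). The sketch below checks that identity on made-up numbers; `slope_and_r` is a hypothetical helper, and the `yellows` and `reds` lists are toy values, not the league data.

```python
from math import sqrt
from statistics import pstdev


def slope_and_r(xs, ys):
    """Return (least-squares slope of ys on xs, Pearson r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx            # extra reds per extra yellow
    r = sxy / sqrt(sxx * syy)    # strength of the linear association, from -1 to 1
    return slope, r


# Hypothetical toy numbers, NOT the real data.
yellows = [0, 1, 2, 3, 4, 2, 1, 3]
reds = [0, 0, 0, 1, 1, 1, 0, 0]

slope, r = slope_and_r(yellows, reds)

# The identity: slope = r * sd(reds) / sd(yellows).
implied_slope = r * pstdev(reds) / pstdev(yellows)

print(f"slope = {slope:.3f}, r = {r:.3f}, r * sd ratio = {implied_slope:.3f}")
```

So a steeper line can reflect a stronger correlation, a bigger spread in reds relative to yellows, or both, which is why it is worth reading the coefficients alongside the graphs.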
These correlations are not very strong, but they are positive (as they should be). This tells us that picking up yellows in the Premier League goes along with a noticeably greater risk of receiving a red in the same match, and with a somewhat greater risk in the Bundesliga and La Liga. In Serie A, by contrast, the two are almost completely unrelated. Think about it: in Italy, a team's risk of getting a red card is virtually unrelated to how many yellow cards it gets.
Any idea why that may be? Let me know!