Thursday, November 8, 2012

Back To Basics: Does the Premier League Make You Fat?

Does the Premier League make British men fat, or does male obesity help grow Premier League revenues? The data suggest it could go either way. Statistics from the NHS Information Centre's "Health Survey For England" and Deloitte's reports on clubs' financial statistics for the 1993-2010 period show that more revenue for the league has gone hand in hand with a greater proportion of males who are obese. Conversely, a smaller proportion of obese men is associated with less revenue for football clubs.

When we combine the trends of financial good health and population bad health, a startling picture emerges: more obesity, more Premier League revenue. Less revenue, less obesity.

The correlation over the 15+ years is a whopping 0.93. It doesn't get much better than that.

But what does this mean?

Does it mean that Premier League revenue growth causes British men to put on weight? That is, does the Premier League's financial success contribute to an unhealthy lifestyle? Or could it be the reverse, that more obese British men spend more money on the Premiership, and that a greater proportion of obese men in the population translates into more revenue for the Premiership?

Odds are, it's neither. One of the most fundamental points of statistical analysis is that correlation does not imply causation. Just because two things go together - obesity and Premier League revenue - doesn't mean that one causes the other.
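To see how easily two unrelated series can end up highly correlated, here is a minimal sketch in Python. The numbers are made up for illustration and are not the NHS or Deloitte figures; the only thing the two series share is that both drift upward over 18 hypothetical years. The helper names (`pearson`, `first_differences`) are our own.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def first_differences(values):
    """Year-on-year changes, which strip out a steady trend."""
    return [b - a for a, b in zip(values, values[1:])]


rng = random.Random(2012)  # fixed seed so the sketch is repeatable
years = range(18)

# Two synthetic series that share nothing except an upward drift over time.
series_a = [13 + 0.6 * t + rng.gauss(0, 0.5) for t in years]
series_b = [200 + 100 * t + rng.gauss(0, 80) for t in years]

r_levels = pearson(series_a, series_b)
r_changes = pearson(first_differences(series_a), first_differences(series_b))

print(f"correlation of levels:  {r_levels:.2f}")
print(f"correlation of changes: {r_changes:.2f}")
```

The levels correlate strongly simply because both series rise over time. Once the trend is removed by looking at year-on-year changes, what remains is the independent noise, and the correlation typically falls away.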

Humans are hard-wired to see patterns in the world. We see patterns in our neighbors' comings and goings, our friends' affections, or we look for animal shapes in the clouds (Rorschach anyone?). So when we rummage around long enough in a big enough dataset of football data, we are bound to find some interesting patterns and possibly some correlations - one thing, like shots, going hand in hand with another, say goals.

But before we take whatever patterns we find as proof positive that we have discovered something new and, more importantly, something real, let's make sure we understand what correlations between variables - congruent patterns in two types of events - really mean.

Perhaps one of the hardest things to get your head around is the idea that correlation - the fact that two things co-occur - does not mean causation - that one thing causes the other to occur.

Take another example.

What if we told you that getting caught offside produces goals for your team? Don't believe us?! According to Opta match statistics, over the past three seasons of Premier League play, teams that were caught offside more often scored more. Teams that were offside more frequently than half of all teams scored 1.45 goals per match, on average; teams that were offside less frequently than half of all teams scored 1.18. So more offsides = more goals.

Seems strange at first glance - until you realize that being caught offside is not itself what produces the goals. Instead, it is a sign that a team is pushing up against its opponents' back line more intensely than other teams do. So more frequent offsides are just an indicator of more aggressive attacking play on the part of forwards. When you play that way, you score more. This, too, is an example of a correlation not literally implying causation, but the correlation is consistent with what we know about the game.

But of course there are "real" correlations that are likely to be causal in the more literal sense.

Living here in the UK, one obvious everyday correlation is that between umbrellas and rain. On rainy days, there are many more people with umbrellas walking the street. On sunny days, many fewer. So there you go: a correlation between the amount of rain we can measure in a day and the number of umbrellas we see; more rain, more umbrellas. A correlation.

And yet, the data by themselves don't tell us anything about causality - absent some kind of theory about how the two are connected. For the record, we suspect it runs from rain to umbrellas. So our theory tells us that rain causes umbrellas to be carried around; not that umbrellas cause it to rain.

So a correlation is only as useful or "true" as the theory about the underlying events we're trying to understand.

Back to round things, football and bellies, for a second. So why do we see the pattern we see?

As the collective top lines of Premier League clubs have swelled, so have Britons' waistlines, and vice versa. The NHS health data show that, over a short decade and a half, obesity among UK males has almost doubled from 13% in 1993 to 24% in 2008 (and 22% in 2010).

At the same time, the revenues of Premier League clubs have been increasing at a healthy rate for many years, as the following graph reveals.

But the fact that obesity and Premier League revenue are statistically correlated for no apparent reason doesn't mean they're not related at all, albeit perhaps in a circuitous way. The correlation could be due to a third variable we haven't considered: television. Much of the revenue growth in football has come from more lucrative TV deals for the league; and we suspect that, over the past two decades, British men - especially couch potatoes - have been watching more television than ever before, and therefore more football as it has become more widely available. So if we looked at the total number of hours of football on TV, we'd expect to see a trend similar to those for obesity and revenue. (We'd probably also see a similar trend for the use of personal electronic devices, but that's for another post.)
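The third-variable idea can be sketched with a small simulation. Everything here is synthetic: `tv` is a hypothetical stand-in for something like hours of football on TV, and `obesity` and `revenue` are generated so that each depends on `tv` but not on the other. The partial correlation is the standard first-order formula; the variable names and noise levels are assumptions chosen only for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(1993)  # fixed seed so the sketch is repeatable
n = 1000

# Hypothetical third variable that drives both outcomes.
tv = [rng.gauss(0, 1) for _ in range(n)]
# Each outcome is tv plus its own independent noise;
# neither outcome depends on the other.
obesity = [z + rng.gauss(0, 0.3) for z in tv]
revenue = [z + rng.gauss(0, 0.3) for z in tv]

r_raw = pearson(obesity, revenue)
r_partial = partial_correlation(
    r_raw, pearson(obesity, tv), pearson(revenue, tv)
)

print(f"raw correlation:              {r_raw:.2f}")
print(f"correlation controlling tv:   {r_partial:.2f}")
```

In this setup the raw correlation is high even though neither outcome influences the other, and it shrinks toward zero once the shared driver is accounted for. Real data would need the third variable to be measured before a check like this is possible.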

The lesson: understand that patterns in data are easier to find than to make solid sense of. Some are real, many aren't. Some are more real than others. And always think before you leap ahead with your interpretation of the data.