Thursday, April 7, 2011

Factoid of the Day: First and Second Half Goal Frequencies

Football statisticians have long known that scoring increases as a match goes on. For example, take a look at this graph from a 1998 paper by Mark Dixon and Michael Robinson in the Journal of the Royal Statistical Society, based on data from over 4,000 matches in the top four divisions of English football, collected over a four-year period in the early to mid-1990s.*

Source: Dixon and Robinson (1998)
It's clear that scoring increases linearly as the match wears on. And it doesn't matter if the matches end in a win for the home or away side, or even a draw.

While I don't have the kind of detailed data Dixon and Robinson had, I do have first and second half scores from the top leagues over a five-year period, which I can use to approximate scoring frequency during the match. We can use these data to see how much more frequent goals are in the second half of the match compared to the first. So here is a poor man's approximation of these match data in the form of the average number of first and second half goals by team and league for the past five years.


In each league and every season, teams score more goals in the second half than in the first. Across the leagues, the average number of second half goals a team scores is between 0.7 and 0.8 (it is higher in the Bundesliga than in the other three leagues). In contrast, teams on average score slightly fewer than 0.6 goals in the first half. So doing some very rough math, second half scores are about 25% higher than first half scores.
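For anyone who wants to check the rough math, here is a small sketch. It uses only the rounded figures quoted in the paragraph above (not the exact table values), and the function name `pct_increase` is just an illustrative choice.

```python
# Rounded per-team averages quoted in the text, not the exact table values.
first_half = 0.6                 # "slightly less than .6", rounded up
second_half_range = (0.7, 0.8)   # range quoted across the leagues


def pct_increase(first, second):
    """Percentage by which `second` exceeds `first`."""
    return (second - first) / first * 100


low = pct_increase(first_half, second_half_range[0])
high = pct_increase(first_half, second_half_range[1])
mid = pct_increase(first_half, sum(second_half_range) / 2)

print(f"low {low:.0f}%, high {high:.0f}%, midpoint {mid:.0f}%")
```

With these rounded inputs the increase falls somewhere between roughly 17% and 33%, and the midpoint of the range gives the 25% figure. Since the true first half average is a bit below 0.6, the actual increase is likely a touch higher.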

Of course, there can be lots of reasons for this, including, perhaps most obviously, that teams that are behind by 2 may not defend as hard as teams that are tied; or teams that are up by 2 are just having a better day and continue to score; or players just get tired and make more mistakes. All of these could result in some padding of results, which in the end could produce these average differences.

If the odds of scoring simply increase because of player fatigue, then we should not see significant differences in scoring variability over time. That is, the standard deviation around the mean shouldn't increase with time either. But it could be that something else is creating an increase in goals over time - something like leverage, which I've discussed before. Simply put, if scoring increases over time and also becomes more variable, this could be evidence of tactical behavior.

So are second half scores more variable than first half scores? We can calculate the standard deviation to help us answer that question. Take a look.
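To make the comparison concrete, here is a minimal sketch of the calculation using Python's standard `statistics` module. The ten match scores are invented purely for illustration and are not taken from my data set; the exact unit of observation behind the figures in this post may also differ from the per-match one assumed here.

```python
from statistics import mean, stdev

# Hypothetical (first-half, second-half) goals for one team over ten matches.
# These numbers are made up for illustration; they are not real results.
matches = [(0, 1), (1, 0), (0, 0), (1, 2), (0, 1),
           (1, 1), (0, 3), (2, 0), (0, 0), (1, 1)]

first = [m[0] for m in matches]
second = [m[1] for m in matches]

# Sample standard deviation (n - 1 denominator) around each half's mean.
for label, goals in (("first half", first), ("second half", second)):
    print(f"{label}: mean={mean(goals):.2f}, sd={stdev(goals):.2f}")
```

In this toy sample the second half has both the higher mean and the larger standard deviation, which is the pattern the question above is asking about.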


And indeed they are. Generally, score lines are more variable in the second half than in the first. Second half scores are least variable in Serie A and most variable in the EPL. So second halves produce more frequent goals, but their score lines are also slightly less predictable.

As always, averages tell us what all teams are doing. So how do individual teams compare? Here are the data for the EPL this season, ordered by overall goal totals per match.


A couple of general patterns: almost every Premier League team this season has scored more in the second half than in the first (on average, of course). The lone exception is Manchester City, who perform well in terms of overall goal totals but score more in the first half than in the second (Blackpool's scores are identical across the halves). Clearly, the league leaders Man United and Arsenal do an amazingly consistent job, scoring slightly more than one goal each half and topping the league in goal averages per half and match. Chelsea aren't far behind, but behind they are.

A few others stand out. Bolton and Stoke have dismal first half averages but amazing second half scores. In fact, Bolton are third in average second half goals scored after Man U and Arsenal. And while Stoke have scored the fewest goals in the first half, they are among the very best in the second half of a match.

So does this mean that teams that haven't scored in the first half are generally more likely to score in the second half, or that teams that have scored one goal in the first half are more likely to score in the second half, and so on? Unfortunately, these numbers can't tell us that, so that'll have to wait for another day. In the meantime, I'm curious to know why Bolton and especially Stoke perform like different teams before and after the halftime break. Wonder what they put in their energy drinks in the locker room?!


* Dixon, Mark J., and Michael E. Robinson. 1998. A birth process model for association football matches. Journal of the Royal Statistical Society: Series D (The Statistician), 47: 523–538.