Thursday, July 21, 2011

Big Mo, Little Mo, Or No Mo? Evidence of Momentum in Football Matches

A couple of months ago, I made the point that football matches see more goals in the second half than in the first. That is, the odds of either team scoring a goal increase over the course of the average match. I forgot about that fact until a few days ago, when I was talking to a friend about the hot hand phenomenon in basketball.

Now, mind you, the hot hand phenomenon has been thoroughly debunked, starting with the study by my colleague Tom Gilovich and collaborators over 25 years ago (there's also a nice summary of the hot hand phenomenon for interested readers in Moskowitz and Wertheim's Scorecasting). But the conversation did get me thinking a little more about momentum in football matches.

The fact that scoring increases over the course of a match doesn't tell us whether teams' ability to score in the first half makes it more or less likely that they will score in the second half. In fact, you might think it could go either way: if the odds of scoring increase with every minute, then you might assume that teams that failed to score in the first half would become more likely to score in the second half (since they haven't yet made good on the overall statistical tendency). But alternatively, if you are a believer in the hot hand idea, you might think that teams that score early become more likely to score again. If teams get into a rhythm of playing (and then scoring), this would mean that we should be able to see something akin to a "hot foot" phenomenon (or perhaps momentum at the team level) in soccer.

To see if there is momentum in football, we would ideally like to have data for each goal's timing across the course of the match. I don't have these data handy at the moment, but I do have perhaps the next best thing: first and second half goal totals for each team and match. To make things systematic, I used as much data as I could quickly put together. So using match data for 5 seasons of the 4 biggest leagues in Europe (2005/06 to 2009/10), I wanted to see if the number of second half goals teams score goes up, goes down, or stays the same depending on how many goals they scored in the first half.
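For anyone who wants to try this kind of comparison on their own data, here is a minimal sketch in Python. It assumes one record per team per match, given as a pair of first half and second half goal totals; the function name and the toy records are made up for illustration and are not the league data or necessarily the procedure used for this post. It groups the records by first half goals and reports the average number of second half goals in each group, along with how many observations the average rests on.

```python
from collections import defaultdict

def second_half_mean_by_first_half(records):
    """Average second half goals, grouped by first half goals.

    records: iterable of (first_half_goals, second_half_goals) pairs,
             one pair per team per match (an assumed layout).
    Returns {first_half_goals: (mean_second_half_goals, n_observations)}.
    """
    # first half goals -> [sum of second half goals, number of observations]
    totals = defaultdict(lambda: [0, 0])
    for first, second in records:
        totals[first][0] += second
        totals[first][1] += 1
    return {k: (s / n, n) for k, (s, n) in sorted(totals.items())}

# Made-up illustrative records, NOT the real league data.
toy = [(0, 0), (0, 1), (0, 1), (1, 1), (1, 2), (2, 2)]
result = second_half_mean_by_first_half(toy)

for first, (mean, n) in result.items():
    print(f"{first} first half goals: {mean:.2f} second half goals on average (n={n})")
```

Keeping the observation count next to each average matters here, since groups with many first half goals are rare and their averages are correspondingly noisy.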

Keep in mind that the average team across the four leagues scores .58 goals per match in the first half and .74 goals in the second half. There is a slight bit of variation across the leagues, as the following graph shows. Overall, there was more scoring in the Bundesliga than in the other three leagues, but each league sees a difference of between .14 and .18 between the halves.

But back to hot footedness. If there is something to the momentum idea, then the number of second half goals should increase with the number of first half goals. But if not scoring early makes scoring later more likely, then we should see the reverse: fewer first half goals should be associated with more second half goals. And if it's all, well, humbug and logically flawed, then we should see random patterns across leagues. So here's what the data look like when we calculate the number of second half goals in a match by the number of first half goals a team managed to score in the same match. Judge for yourself.

If you ask me, the picture shows something that looks like momentum in three of the four leagues. With the exception of Serie A (which had some funny looking scores in the mid-2000s, you may remember), there is an upward trend in second half scoring depending on how many goals a team scored in the first half. Teams that scored a goal in the first half score more goals in the second half than teams that did not; teams that scored two in the first half score more than teams that scored just one, and so on. Keeping in mind that there are very few matches that saw four or five first half goals, it is interesting to see that there is a general upward trend across the board from 0 to 3 first half goals in the Bundesliga, the Premier League, and La Liga. The trend is most pronounced in the EPL, but also noticeable in the other two. Serie A is a partial exception: there, too, teams that scored 2 in the first half score more than teams that scored none or one. However, teams that scored 3 or 4 in the first half (admittedly only a small fraction of teams - 2.13% to be exact) score fewer than teams that scored between 0 and 2.

So how do we explain this? Is this simply an expression of dominant teams that are able to score 2 goals in the first half being able to add to this in the second half? Not necessarily. Just for fun, I picked out four teams, two great and two less great ones: Chelsea, Man United, Sunderland, and Wigan. Here's what their first and second half totals look like.

This graph shows that there are indeed interesting differences across teams, but 3 of the 4 teams, the two less successful teams included, show a pattern of scoring more later if they've scored earlier in the match - though of course the goal totals differ between Chelsea and Sunderland, for example. The curious exception is Manchester United, which has the opposite pattern. During the five seasons included in the data here, United scored less in the second half the more they had scored in the first half.

Not sure what to make of it all, but I thought it was a fun "soccer by the numbers" thing to share and think about. What's interesting about it to me is that it goes against research suggesting that teams behave more defensively when they're ahead, and that teams become more likely to score when they are losing. How we may be able to resolve these inconsistencies is an interesting question I hope to get to before too long.