Saturday, April 30, 2011

Team Differences in Shot Creation: The First Half of the 2010/11 EPL Season

I've been spending a few posts looking at shot creation in the Premiership. One of the things I haven't spent much time on is differences across teams in terms of who has been creating the most and fewest shots from different kinds of situations. These situations are open play, corners, free kicks, fast breaks, and penalties.

So here are the absolute numbers of shots created in the first half of the season per team/match from different situations in the match (shots are defined as all shots on goal; I leave out penalties since they are rare and therefore not all that interesting to look at across teams). As before, the numbers are courtesy of the Opta/Guardian chalkboards and cover the first half of this year's EPL season.

Without much ado, here are shots from open play.

Shots created from open play ranged from around 8 to about 13, on average, in the first half of the season. The ranking of teams makes all kinds of sense, given what we know about team quality. But there are exceptions: Everton is up there with the league's top teams at about 13 shots created from open play; and Man City is relatively low at around 10.

What about shots created from corners? Here we go.

Thursday, April 28, 2011

The Mystery of the Vanishing Fouls: Trends in Major League Soccer Since 1996

A few days ago, I wrote about patterns in referee decisions in Major League Soccer, with a focus on fouls and yellow cards. In international comparison, MLS referees have called fewer fouls than referees in other leagues, suggesting either more fair play or less involved referees (for whatever reason). Since that analysis, we have had a rough weekend in MLS, with Seattle's Steve Zakuani breaking two bones in his leg and 2010 MVP David Ferreira of FC Dallas sustaining two fractures in his ankle. While I don't have the foul statistics for the weekend and don't want to suggest that this weekend was particularly violent, the conversation about referee quality in MLS is worth continuing, if you ask me.

To make that conversation more productive, it's good to know some of the facts about MLS foul patterns, so I went out to look for some more (beyond the ones I already posted). In particular, I was curious to find out if MLS foul averages have always been so low or if this is a more recent development. 

Thankfully, I didn't have to look too far to find out; here are trends in fouls called in MLS provided courtesy of the Climbing The Ladder blog.* It shows the average number of fouls committed and fouls suffered per match (the fouls committed category records slightly higher numbers because not all fouls involve the other team - think handball or swearing at the referee).

Clearly, average foul totals by match are way down, and they are at their lowest levels in the history of the league. Not only that, since recording a high point in the 2000 season, when the number of fouls committed and suffered was around 33 per match, they have been on a consistent and fairly steep decline to around 20 (suffered). That's a huge decline, percentage-wise, in the neighborhood of almost 40%. We're not talking short-term fluctuations but a long-term decline over the course of a decade.
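
For anyone who wants to check the percentage, here is a minimal Python sketch of the arithmetic. It assumes the approximate figures quoted above (about 33 fouls per match at the 2000 peak, about 20 now); the function name `percent_decline` is made up for illustration.

```python
# Approximate figures read off the trend described in the post.
peak_fouls = 33.0     # fouls per match around the 2000 season
current_fouls = 20.0  # fouls (suffered) per match most recently

def percent_decline(start, end):
    """Return the drop from start to end as a percentage of start."""
    return (start - end) / start * 100

decline = percent_decline(peak_fouls, current_fouls)
print(f"Decline: {decline:.1f}%")
```

With these rounded inputs the drop comes out just under 40%, which matches the "almost 40%" reading.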

What can explain this pattern?

Wednesday, April 27, 2011

Shameless Self-Promotion Of The Day: Vote SBTN For Best EPL Blog

News flash: SBTN has been nominated for a 2010-11 EPL Award in the category "Best EPL Blog (qualification round)."

The third annual EPL Awards recognizes the best of the 2010-11 Premier League season. EPL Talk has taken a close look at the Premier League season and has picked out the best footballers, managers, podcasts, blogs, radio shows, online games and more for readers to vote for.

Voting opens today and closes on Friday, April 29. The three blogs with the most votes between now and Friday night will then be entered into the final round of the EPL Awards for the EPL Blog category, which will launch next week.

You can vote by clicking on the picture below.

Saturday, April 23, 2011

Which Shots Are More Likely To Be On Target? Accurate Shot Creation in the Premier League

In an earlier post, I took a closer look at shot creation in the Premier League to see what number and proportion of shots are created from different match situations. The numbers, courtesy of the Opta/Guardian chalkboards and covering the first half of this year's EPL season, showed that most shots were generated from open play (11 on average per team/match). Teams also created chances from other situations, but at a much lower frequency (corners: 2.1, free kicks: .82, fast breaks: .52, penalty kicks: .14).

In terms of relative proportions, this means that roughly 75% of shots taken by Premier League teams in the first half of the season resulted from open play, 15% from corners, 5% from free kicks, and another 3.5% from fast breaks, with less than 1% resulting from penalties. Finally, there are some obvious differences across teams; to name just a couple of examples: Bolton were almost three times as likely as Liverpool to rely on free kicks to create shots, and Tottenham were twice as likely to generate shots from corners as Wigan.
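
The proportions can be roughly reproduced from the per-match averages. The sketch below is only an approximation: it assumes the rounded averages quoted above and divides by their sum (about 14.6) rather than by the exact league total, so the shares will differ slightly from the ones computed on the raw data.

```python
# Average shots per team/match by situation, as quoted in the post (rounded).
shots_per_match = {
    "open play": 11.0,
    "corners": 2.1,
    "free kicks": 0.82,
    "fast breaks": 0.52,
    "penalties": 0.14,
}

# Assumption: use the sum of the rounded category averages as the denominator.
total = sum(shots_per_match.values())
shares = {kind: count / total * 100 for kind, count in shots_per_match.items()}

for kind, share in shares.items():
    print(f"{kind:<12}{share:5.1f}%")
```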

So far so good; but as we well know, taking shots and shooting accurately are two different things. We can think of overall numbers of shots taken as a rough measure of offensive pressure, but real threats are only those shots that have a chance of hitting the back of the net. So the next step in understanding shot creation is whether some situations are more likely to yield accurate shots than others. Knowing the overall distribution of shot origins is helpful; for example, knowing that 75% of all shots come from open play would lead us to assume (absent any other information) that accurate shots are also more likely to be found in that category. But this could also be a fallacy: even though penalty kicks constitute fewer than 1% of all shots, it is easy to assume that most of them are on target.

This example highlights an important analytical consideration: while we may want to know where accurate shots come from - their absolute frequency - we may also want to know their relative threat - that is, the odds of any one shot being accurate. So below, I describe patterns in accurate shot creation in absolute and relative terms.

Let's start with the absolute frequency of shots on target. One thing to keep in mind: Opta's definition of shots on target includes only shots that really had a chance of going in (and we include actual goals in the count, of course). This means that we do not include blocked shots. So without further ado, here we go.

Using this way of measuring accurate and therefore highly threatening shots, it's clear right off the bat that accurate shots are a much rarer occurrence than total shots. On average, Premier League teams each managed about 4.4 accurate shots on goal per match. In total, about 3.4 accurate shots were created from open play, about .45 from corners, about .22 each from free kicks and fast breaks, and .12 from penalties. Another way to read these numbers is that teams produced about 1 accurate shot from a penalty every 8 matches, from fast breaks and free kicks every 4 1/2 matches, and from corners roughly once every other match.
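
The "one every N matches" reading is just the reciprocal of the per-match rate. Here is a small sketch of that conversion, assuming the rounded per-match averages quoted above.

```python
# Accurate shots per team/match by situation, as quoted in the post (rounded).
accurate_per_match = {
    "open play": 3.4,
    "corners": 0.45,
    "free kicks": 0.22,
    "fast breaks": 0.22,
    "penalties": 0.12,
}

# Matches needed, on average, to produce one accurate shot of each type.
matches_per_shot = {kind: 1 / rate for kind, rate in accurate_per_match.items()}

for kind, matches in matches_per_shot.items():
    print(f"{kind:<12} one accurate shot every {matches:.1f} matches")
```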

While the overall numbers of highly accurate v. all shots considered in the other analysis are very different, their relative distributions look very similar. But to get a real sense of this, here are, first, the relative distributions of accurate shot creation (to one another), followed by the ratio of accurate shots to overall shots.
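
To make the ratio of accurate to overall shots concrete, here is a rough sketch that divides the rounded per-match averages from this post by the rounded totals from the earlier one. Because both inputs are rounded league averages, treat the results as ballpark figures rather than the exact values behind the graphs.

```python
# All shots and accurate shots per team/match, as quoted in the posts (rounded).
all_shots = {"open play": 11.0, "corners": 2.1, "free kicks": 0.82,
             "fast breaks": 0.52, "penalties": 0.14}
accurate_shots = {"open play": 3.4, "corners": 0.45, "free kicks": 0.22,
                  "fast breaks": 0.22, "penalties": 0.12}

# Share of shots from each situation that were on target.
accuracy = {kind: accurate_shots[kind] / all_shots[kind] for kind in all_shots}

# Print situations from most to least accurate.
for kind, ratio in sorted(accuracy.items(), key=lambda item: item[1], reverse=True):
    print(f"{kind:<12}{ratio:.0%}")
```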

Wednesday, April 20, 2011

£ = Performance? A Statistical Look At Manchester City’s Offensive And Defensive Production Since 2005/06

Co-authored With Danny Pugsley, Editor of the fabulous Bitter and Blue blog.

It is difficult to know how much success a club’s spending on squad and coaching can guarantee, but that has seldom prevented owners and managers from trying. And occasionally analysts have done a terrific job getting good statistical results correlating transfer fees or wages with clubs’ league points, for example – witness Tomkins et al.’s terrific work in Pay As You Play.

City is one of those clubs for whom spending sizable amounts of money is a relatively new thing, and it finds itself in the midst of a big experiment, having gone from achieving promotion to the Premiership less than ten years ago to becoming the richest club in the world and consistently contending for trophies. The past five years, in particular, have seen a profound transformation of the club’s fortunes, so we thought it would be interesting to take a look at how things have evolved on the pitch beyond points but with the help of things we can actually measure.

Perhaps the most straightforward way to divide the past 5+ years is by splitting them into a “before” and “after” period and to take a look at the club’s performance before and after Mark Hughes was installed as manager. This also roughly coincides with current ownership of the club.

Sunday, April 17, 2011

See No Evil? Fouls, Cards, and Referees in Major League Soccer

In the wake of recent discussions over the quality of refereeing in MLS and David Beckham's ability to earn yellow cards this season, I thought I'd take a quick look at patterns in MLS refereeing with data from last season. As I mentioned in my comparison of fouls and yellow cards between MLS and the big European leagues on the New York Times Goal blog, so far this season, teams in the Premier League have been called for an average of 11.3 fouls per match, while teams in the other three leagues have been called for a third more (ranging from 15 in La Liga to 16.3 in the Bundesliga). But what is most interesting about the foul statistics is that MLS has by far the lowest foul totals of any of the five leagues I compared, at 9.7 per team/match. 

In terms of awarding yellow cards, MLS refs are the least busy overall, too. MLS teams see an average of 1.51 yellows per match; compare that to the average of 2.64 in Spain’s La Liga, for instance. The numbers of yellows in the other leagues are 1.60 for the EPL, 1.67 in the Bundesliga and 1.96 in Serie A. Clearly, we see relatively few foul calls and the fewest yellow cards in MLS.

These data are useful for putting refereeing in MLS in context. But they cannot tell us whether referee performance is similar across referees, or whether we see distinct patterns in how some referees call the game. Basically, they tell us that play in MLS is either cleaner or its referees pay less attention. I'm sure David Beckham would bet it is the latter; in fact, he apparently thinks his six-year-old would do a better job.

Below I try to shed a little more light on refereeing patterns in MLS with the help of last season's data. For starters, let's take a look at the league's referee assignments. Clearly, there's a huge range in terms of how many matches referees were asked to call, with a handful of referees calling about 20 matches, and some only getting to call one or two matches. Take a look at the list below.

Clearly, the league has more faith in some referees' ability to do a good job, with Ricardo Salazar topping the list. But the uneven assignment has implications for analyzing referee performance. If we want to take a systematic (statistical) look to evaluate referee performance, we need a big enough sample of matches so that we can draw more reliable conclusions from the data. So for the analysis that follows, I restrict myself only to those referees who were in charge of at least 8 matches. This leaves us with 16 referees we can compare with some degree of confidence. 

How many fouls did MLS referees call? Below is the average number of fouls they called on each team per match last year.

The numbers show a fairly narrow range across referees, from slightly over 8 fouls/team (or 16 per match) for Jair Marrufo to 12 per team (or 24 per match) for Terry Vaughn. Mark Geiger, Kevin Stott, and Jorge Gonzalez are the most average referees, so to speak, at around 9.7 fouls called on each team. For comparison purposes, in an earlier analysis of refereeing in the Premier League I found a much greater range (between 16 and 30). Bottom line: referees in MLS seem to differ little from one another in terms of how many fouls they call.

What about patterns in home-away refereeing? Believers in the home field advantage think that home teams are systematically advantaged. I have my doubts, and my analysis of refereeing in the Premiership did not confirm this. But what about MLS? Do we see some referees systematically calling more fouls on away than on home teams? The following graph shows average numbers of fouls called by each referee on home and away teams.

Thursday, April 14, 2011

Leveraging Leverage: A New Look At Performance in Europe's Top Leagues

Here's a different way of looking at positive leverage. In the spirit of analyses that have looked at teams' ability to generate and take advantage of chances in a match, it's a way to identify teams that both generate positive leverage situations and manage to, well, leverage them for a win.

Here's what the graphs show. They depict teams' average positive leverage levels this season (on the x-axis) alongside their full-time win percentages (on the y-axis). I have superimposed lines to divide the graph into four quadrants. The lines are league averages; so falling to the right of the vertical line (leverage) means the team is above average this year; falling above the horizontal line (wins) means that the team has won more matches than the average team. Because leagues' averages and team performances differ, the lines fall in slightly different places for each of the leagues (Bundesliga, EPL, La Liga, and Serie A).

So the upper right hand quadrant contains teams that both generated leverage and converted it into wins. The lower right hand quadrant contains teams that generated leverage but did not convert it into as many wins. The upper left hand quadrant contains teams that did not generate much leverage but were able to convert it when they had the chance. And finally, the lower left hand quadrant contains teams that neither generated much leverage nor converted it when they had it.
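
The quadrant logic can be sketched in a few lines of Python. Everything here is illustrative: the function name `leverage_quadrant`, the team labels, and the numbers are made up, and the league averages are simply the means of the hypothetical sample.

```python
from statistics import mean

def leverage_quadrant(leverage, win_pct, avg_leverage, avg_win_pct):
    """Place a team in one of four quadrants relative to the league averages."""
    horizontal = "right" if leverage > avg_leverage else "left"
    vertical = "upper" if win_pct > avg_win_pct else "lower"
    return f"{vertical} {horizontal}"

# Hypothetical (leverage level, win percentage) pairs, invented for illustration.
teams = {
    "Team A": (0.60, 0.65),
    "Team B": (0.55, 0.30),
    "Team C": (0.30, 0.50),
    "Team D": (0.25, 0.15),
}

avg_leverage = mean(lev for lev, _ in teams.values())
avg_win_pct = mean(win for _, win in teams.values())

quadrants = {
    name: leverage_quadrant(lev, win, avg_leverage, avg_win_pct)
    for name, (lev, win) in teams.items()
}
for name, quadrant in quadrants.items():
    print(f"{name}: {quadrant}")
```

Because the dividing lines are league averages, the same team could land in a different quadrant if it were dropped into another league.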

So what do we see? Interestingly, each league has a different leverage profile. Take a look at the Bundesliga, for example. Clearly, the top 4 teams this year all populate the upper right hand (the "good") quadrant where teams generate leverage and convert it. In contrast, underperforming teams either don't generate leverage in the first place and fail to convert whatever measly leverage they have, or as is the case with Gladbach and especially Hoffenheim, they generate it, but fail to convert. Teams that are having a good season but are not contending for the title fall in the upper left quadrant: they don't generate as much leverage as the top teams, but they are about as good at converting (though no one can touch league leader Dortmund, it seems).

Compare this to the English Premiership, shown in the next graph, where generating and converting leverage seem to go hand in hand much more readily. That is, the teams that generate leads also are the ones that convert them, while teams that have a difficult time producing leverage also don't typically convert it to a win. One other thing of note: league leaders Arsenal and Manchester United are in a league of their own (and almost off the chart) in the far upper right hand corner. This stands in contrast to the Bundesliga, where the top 4 teams are more closely clustered together.

Monday, April 11, 2011

Shot Creation in the Premier League: Data From the 1st Half of the Season

Soccer's relatively low scores compared to other team sports make the game exciting and agonizing at the same time. After all, any one score, any one action or mistake on the field can make or break a team's day. Soccer's high stakes and the rare beauty of goals have a downside for analysts, however. In particular, the infrequency of goals from an analysis perspective means that random chance is likely to play a larger role in determining goals than in events that occur more often.

What to do? In a sport where the outcomes that count are rare, it makes sense to analyze those aspects of the game that lead to those outcomes and that have the convenient property of being more common. Enter shots, since they are most proximate to the thing that matters most but that's also the most elusive. Shots also lend themselves more easily to statistical analysis, and I have looked at shots from a variety of angles over the last few months, using data from different leagues (click here for a list).

We can think about the role that shots play in a couple of different ways. A useful framework for understanding the role of shots in shaping goal creation is StatDNA's description of how goals are linked to shots. This framework conceptualizes goals as a combined function of the number of shots and the quality of shots. Another way to put this is that we need to think about both the quantity and the quality of shots.

Before we can get to the question of what kinds of shots are more likely to yield goals, however, we need to understand the origins of the different kinds of shots teams actually take. To get a handle on this, I collected data on shot creation by Premier League teams in the first half of the 2010-11 season (based on data from the Opta-powered Guardian chalkboards). The categories are straightforward: shots are classified by whether they originated from open play, free kicks, fast breaks, penalties, or corners.

So here are the average numbers of shots, split up by shot types for the league as a whole.

On average, teams took 14.7 shots per match in the first half of the season. That's slightly higher than the long-run average of the league, but not by much (the avg. is around 12-13 per team and match). What's more interesting is how these shots were distributed. The most common shots were generated from open play (11), followed far behind by shots from corners (2.1), free kicks (.82), fast breaks (.52), and finally, penalty kicks (.14).

Here's another way to look at these numbers. Rather than looking at the overall number of shots, this pie chart shows each shot type as a proportion of all shots taken.

Sunday, April 10, 2011

Leverage Leaders and Laggards: The Big 4 European Leagues

In earlier posts, I have discussed the idea of positive and negative leverage - the idea that being up or down a goal (or two or three) changes the dynamics and the psychology of a match. Statistically, I defined it as the odds of a team winning the match, given the particular score at the time and the time remaining in the match.

It shouldn't come as a surprise that leverage is important in a dynamic team sport where scoring is predictably rare. But this doesn't mean it's equally important to all teams or that all teams take equal advantage of it. Put simply, not everyone converts the same amount of leverage into the same outcome.

There are two general issues related to leverage: (1) How much of it teams generate, and (2) what they do with it. So here's an analysis on leverage across the big leagues of Europe about 25 matches into the season (depending on a club's particular schedule by the end of February). First, here are data on who has been in a particularly good position, leverage-wise, measured by the total number of matches teams were in positive leverage territory (a +1 or greater goal difference [GD]).

Looking across the leagues, it seems that Bundesliga teams are most similar to one another in terms of generating positive leverage and that we see more variation among teams in the other leagues. Eyeballing the data for individual teams, the graph reveals that better teams generate more matches with positive leverage. For example, Manchester United and Arsenal are clearly the most consistent in generating positive leverage in the EPL at around 15 matches, followed by Manchester City and Chelsea. In contrast, teams like Wigan and Stoke rarely generate positive leverage. Not a surprise, but it's interesting to see how much more frequently good teams go into the second half with wind in their sails.

In the other leagues, too, the best teams tend to be at the top, but it's not always in 1:1 correspondence to their league position. For example, Barca clearly lead La Liga on this score*, but surprisingly (perhaps), Villarreal and Sociedad come in second and third ahead of Real Madrid in terms of the sheer number of opportunities they have had to take advantage of their leverage position. In Serie A, AC Milan top the league on positive leverage, while pitiful Bari had managed only one match with positive leverage at that point in the season.

Finally, the Bundesliga numbers are interesting in a couple of ways. Here, too, the top teams generate the most positive leverage, generally speaking. But what's the story with Hoffenheim, who were second in the league in terms of positive leverage - something that's obviously not reflected in their position in the table? And here's another indicator of Hamburg's forgettable season: they are dead last on positive leverage.

So what about the other side of the ledger, negative leverage?

Here, the situation is predictably reversed. The best teams are best able to avoid negative leverage situations. This is also the dimension on which Real Madrid outperformed Barca 2/3 of the way through the season; in fact, they topped the league along with Villarreal. Dortmund does very well avoiding negative leverage (and Hamburg look much better, too, so their problem seems to have been not taking advantage of positive leverage situations). One surprise in the Bundesliga: Gladbach's relatively good ability to avoid negative leverage situations, suggesting they (unfortunately, if you ask this Gladbach fan) managed to throw away more matches than they should have this year.

In the EPL, Man U yet again leads the league - another indication of their consistently strong performance this year, with Sunderland's slightly surprising ability to avoid negative leverage situations only second to Man U (at that point in the season, and before the most recent dismal run).

Now that we know who had leverage and how much, the next natural question is, of course, what teams did with it: whether they converted positive leverage into a win, or whether they were able to fight back from negative leverage. So here are the outcomes associated with the most common leverage situations, GDs of +1 and -1.

Thursday, April 7, 2011

Factoid of the Day: First and Second Half Goal Frequencies

Football statisticians have long known that scoring increases with match duration. For example, take a look at this graph from a 1998 paper by Mark Dixon and Michael Robinson in the Journal of the Royal Statistical Society, based on data from over 4,000 matches in the top four divisions of English football collected over a four-year period in the early to mid-1990s.*

Source: Dixon and Robinson (1998)
It's clear that scoring increases linearly as the match wears on. And it doesn't matter if the matches end in a win for the home or away side, or even a draw.

While I don't have the kinds of detailed data Dixon and Robinson had, I do have first and second half scores from the top leagues and over a five year period to approximate scoring frequency during the match. We can use these data to see how much more frequent goals are in the second half of the match compared to the first. So here is a poor man's approximation of these match data in the form of the average numbers of first and second half goals by team and league for the past five years.

In each league and every season, teams score more goals in the second half than in the first. Across the leagues, the number of second half goals scored is between .7 and .8 (it is higher in the Bundesliga than in the other three leagues). In contrast, teams on average score slightly less than .6 goals in the first half. So doing some very rough math, second half scores are about 25% higher than first half scores.
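
The "very rough math" can be spelled out as follows. This sketch assumes a first-half average of .6 (the post says "slightly less than .6") and looks at the low end, midpoint, and high end of the quoted second-half range; the helper name `percent_increase` is made up.

```python
first_half = 0.6                    # goals per team; the post says "slightly less than .6"
second_half_values = (0.7, 0.75, 0.8)  # low end, midpoint, high end of the quoted range

def percent_increase(base, new):
    """Return how much larger new is than base, as a percentage of base."""
    return (new - base) / base * 100

increases = {s: percent_increase(first_half, s) for s in second_half_values}

for second_half, increase in increases.items():
    print(f"second half {second_half:.2f} vs first half {first_half:.2f}: +{increase:.0f}%")
```

The midpoint gives the 25% figure; across the quoted range the increase runs from roughly 17% to 33%.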

Of course, there can be lots of reasons for this, including perhaps most obviously, that teams that are behind by 2 may not defend as hard as teams that are tied; or teams that are up by 2 are just having a better day and continue to score; or players just get tired and make more mistakes. All of these could result in some padding of results, which in the end, could produce these average differences.

If the odds of scoring simply increase because of player fatigue, then we should not see significant differences in scoring variability over time. That is, the standard deviation around the mean shouldn't increase with time, either. But it could be that something else is creating an increase in goals over time - something like leverage, which I've discussed before. Simply put, if scoring increases over time and it becomes more variable, this could be evidence of tactical behavior.

So are second half scores more variable than first half scores? We can calculate the standard deviation to help us answer that question. Take a look.
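
For readers who want to run this kind of comparison on their own data, here is a minimal sketch using Python's standard library. The goal counts are invented purely for illustration and say nothing about the actual leagues; the helper name `summarize` is also made up.

```python
from statistics import mean, stdev

# Hypothetical goals per team-match in each half, invented for illustration only.
first_half_goals = [0, 1, 0, 1, 1, 0, 2, 0, 1, 0]
second_half_goals = [1, 0, 2, 0, 3, 1, 0, 2, 0, 1]

def summarize(goals):
    """Return (mean, sample standard deviation) for a list of goal counts."""
    return mean(goals), stdev(goals)

first_mean, first_sd = summarize(first_half_goals)
second_mean, second_sd = summarize(second_half_goals)

print(f"First half:  mean {first_mean:.2f}, sd {first_sd:.2f}")
print(f"Second half: mean {second_mean:.2f}, sd {second_sd:.2f}")
```

The sketch uses the sample standard deviation (`stdev`); with a full population of matches, `pstdev` from the same module would be the alternative.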

Tuesday, April 5, 2011

Leverage in the EPL: Who's Made Good Use Of It This Season? (You Might Be Surprised)

A couple of days ago, I wrote about the positive leverage teams generate when they are up by a goal or the lousy leverage they have when they are down by one. I defined the leverage that puts one team in control of the match or creates pressure on another team statistically as “the likelihood of winning, given the score and the time remaining in the game.”  For the purposes of these analyses, I measured leverage by the expected odds of winning the match, given a particular score line at halftime. The calculations showed that a halftime goal difference (GD) of +1 gives teams almost 70% leverage (defined as the odds of winning the match), while GD of -1 only provides teams with a 10% leverage level. Given that a team's overall (generic) expected frequency of winning any match prior to kickoff is in the 36-37% range, being up a goal doubles the odds of a win, while being down a goal cuts it by a factor of almost 4.
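
Here is the comparison to the pre-kickoff baseline as a quick sketch. It treats the quoted figures as win probabilities and assumes a baseline of 36.5% (the midpoint of the 36-37% range), 70% for a +1 halftime GD, and 10% for a -1 halftime GD.

```python
baseline = 0.365  # assumed midpoint of the 36-37% pre-kickoff win frequency
up_one = 0.70     # approximate win frequency with a +1 halftime GD
down_one = 0.10   # approximate win frequency with a -1 halftime GD

boost = up_one / baseline   # how many times more likely a win is when up a goal
cut = baseline / down_one   # factor by which win chances shrink when down a goal

print(f"Up a goal:   {boost:.2f}x the baseline")
print(f"Down a goal: baseline divided by {cut:.2f}")
```

With these inputs the boost is a little under 2 and the cut a little under 4, in line with the rough "doubles" and "factor of almost 4" readings.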

These numbers are averages calculated across 4 leagues and 5 seasons to provide a baseline to compare teams against. Once we know the baseline, the next question is, of course, which teams have leveraged leverage, so to speak; that is, which teams have outperformed historical averages, and which ones have not? So here are some leverage statistics for the Premier League this season to get a sense of how teams are doing.

The data are for teams after 27 matches. Remember from the earlier analyses that GDs of +1 and -1 are particularly meaningful and are where most of the statistical action is. So here are two scenarios: first, leverage from a +1 GD; second, leverage from a -1 GD. First, here's the +1 GD analysis.

The +1 GD scenario shows huge variation across the league, from 1 in the case of West Brom and Stoke to 0 in the case of Wigan. The league leaders (in this order) were West Brom and Stoke, followed by Manchester United, and then tied for third we had Sunderland, Newcastle, Man City, Chelsea, and Bolton. Considering that the long-term historical leverage in a +1 situation is .69 (so roughly 70%), the underperforming teams include (starting with the worst underperformer of the league) Wigan, Liverpool, Everton, Wolves, Birmingham, Fulham, Aston Villa, and West Ham. There are no real surprises in this group, except perhaps Liverpool.

Sunday, April 3, 2011

Leverage: What Is It And How Much Of It Do Teams Have?

In soccer, a single goal has enormous value. In fact, we can put a number on that value. I’ve written about this before in posts on the most common scores in soccer or posts about the point value of goals. In a game where over 50% of matches involve fewer than 3 goals, and the most common score line is a goal difference of 1 (rather than 3 or 4), being ahead or behind is a big deal. Another way to see this is to remember that teams that are ahead at the half have roughly a 75% chance of winning the match.

Being ahead is vital not just because it makes wins more likely, statistically speaking. It's also interesting for soccer analysts because it does something to the dynamics of the game. It can change a match on a dime, along with how teams are playing, either to protect a lead or get back into the game. At a minimum, having one team ahead creates psychological pressure on the team that’s behind, and it puts the team that’s ahead in control of the match. Put simply, it provides them with leverage. We can also define leverage statistically as “the likelihood of winning, given the score and the time remaining in the game” (I borrowed the definition from the ingenious Gabe Desjardins).

The value of being ahead or the difficulty of being behind can thus be expressed in numbers. Here, I measure leverage by the expected odds of winning the match, given a particular score line at halftime. Because winning is about having a positive goal difference, we can calculate the expected frequency of a win, given different goal differentials. Below I show win frequency as a function of goal differential (GD), for the four best leagues in the world (Bundesliga, EPL, La Liga, and Serie A) from 2005/06 to 2009/10 combined. This large sample of matches should provide a really solid statistical indication of how much leverage top professional teams have over their opponents, depending on where they stand at that point in the match.
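
As a rough illustration of the calculation, the sketch below tallies win frequency by halftime goal difference from a list of match records. The function name `win_frequency_by_gd` and the sample records are hypothetical; the actual analysis uses five seasons of matches from the four leagues.

```python
from collections import defaultdict

def win_frequency_by_gd(records):
    """Return {halftime GD: share of matches won} from (halftime_gd, won) pairs.

    Each record is taken from one team's perspective: its goal difference
    at halftime and whether it went on to win the match.
    """
    games = defaultdict(int)
    wins = defaultdict(int)
    for halftime_gd, won in records:
        games[halftime_gd] += 1
        wins[halftime_gd] += int(won)
    return {gd: wins[gd] / games[gd] for gd in sorted(games)}

# Hypothetical records, invented for illustration only.
sample = [
    (1, True), (1, True), (1, False),
    (0, True), (0, False), (0, False),
    (-1, False), (-1, False), (-1, True), (-1, False),
]

leverage = win_frequency_by_gd(sample)
for gd, freq in leverage.items():
    print(f"GD {gd:+d}: won {freq:.0%} of matches")
```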

A tie at the half gives each team about a thirty percent chance of winning the match - it's wide open. The graph also shows that teams have zero leverage when GDs range from minus 5 to minus 2, while teams have maximum leverage (or close to it) with GDs of plus 3 to plus 5.

This also tells us that the real action in terms of leverage is in the range of minus 1 to plus 1.

Friday, April 1, 2011

Location, Location, Location: Why Some Leagues Import More Players - Wealth, Democracy, Or Good Weather?

Here's a follow-up to an earlier post where I looked at the connection between a league's quality and its level of player imports. There is a positive correlation: the better leagues import a higher percentage of footballers. But there also are some notable outliers, especially Greece, Turkey, and Cyprus. Perhaps their levels of imports are higher than the league quality would warrant because clubs in these leagues are willing to overpay for talent - to get high quality players who also could ply their trade in a better league, clubs need to pay a premium. Or perhaps there's something else that makes them attractive leagues. But what?

The correlation between league quality and imports implies that the best players will seek to play in the best leagues, but the outliers show that some players are apparently willing to make trade-offs (like we all do), maybe to forgo a little bit of league quality for a little more (or a lot more) money (or something else). So the best clubs (and leagues) will have an easier time recruiting talent because they are inherently more attractive places to work for ambitious players, but there are only so many opportunities to play in the Premier League and players have to earn a living somewhere.

Assuming there is a true global market for the best talent, is there anything else about a particular country that helps attract more (of the best) players? I thought I'd take a look at three factors: how wealthy a country is, how democratic it is, and where it is.

Why would these factors matter?