Thursday, March 31, 2011

Costing Out The German Miracle: 600 Million Euros Since 2001

These days, the "German Model" of developing exciting young players is being celebrated, among others, at this week's soccer trade show Soccerex in Manchester. On a panel titled “One Common Goal – the German Model For Success” German National Team Manager Oliver Bierhoff, Deutsche Fußball Liga CEO Christian Seifert, and Director of Sport at FC Bayern Munich Christian Nerlinger are scheduled to discuss what has led to the turnaround in Germany's soccer youth development schemes.

How have the Germans done it? Turns out, there is no such thing as an overnight success. As a story in German papers revealed yesterday, it's been 10 years since the 36 clubs of the 1. and 2. Bundesliga (Germany's top two divisions) decided to require clubs to operate a youth academy if they wanted to be licensed. And so far, the academy system has been a success. Currently, 5,400 young players are playing in the academies, and 19 of 22 players currently in Germany's national team were trained in one of the academies.

So how much has it cost the clubs and the leagues? A cool €600 million. Watching Germany play at the World Cup last year? Priceless.

Tuesday, March 29, 2011

Measuring A Football Team's (In)Consistency: Some Data From the Premier League

There has been lots of talk this year that Manchester United aren't playing the most beautiful or inspiring football, but that they owe their table position instead to playing well week in and week out. Really well, mind you, at least according to some of the metrics I've looked at, but they're not necessarily no.1 in every category.

A picture of consistency
Measuring team performance is tricky business. So what would be a good way to get a handle on whether teams like Man United reliably perform at a high level? It strikes me that you want to see a combination of two things: high levels of team performance, but with low variation around that high level. For example, you want your team to be able to generate, say, 12-14 shots in a match, but you also want them to do that every time, rain or shine, home or away, no matter what. It doesn't take much to imagine that teams that can do these two things are likely to contend for the title. In contrast, teams that have too many ups and downs - a glorious 5-1 victory followed by an agonizing 0-3 - will not be able to win points every time as the league leaders typically do.

Statistically speaking, what we want to do is put a number on the level of performance as well as the variation around that level of performance over some period of time. For starters, imagine a distribution of performance over a couple of months or a season. Offensive production in the form of shots can serve as an example. Here's the distribution of shots in the EPL in the 2009 season by team and match. It shows you how often teams performed in the range indicated.

Average team performance in the 2009/10 season was 12.2 shots per match. But what this number doesn't tell you (but the graph above does) is that there also was considerable variation around that mean. While performances in the center of the distribution were most common (in the 7-13 shot range), there also were a good number of times when teams could hardly shoot straight or when they shot their opponents' lights out.

To see if teams' performance levels were dependably in a high or low range or all over the place, we can calculate a statistic called the standard deviation.
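As a minimal sketch of the idea, assuming a team's shots per match are available as a plain list, the level and the spread can be put side by side like this. The shot counts below are made up for illustration (they are not real EPL data), and the coefficient of variation is an extra summary I'm adding here, not something the analysis above relies on.

```python
from statistics import mean, pstdev

def consistency_profile(shots_per_match):
    """Return (mean, standard deviation, coefficient of variation)."""
    avg = mean(shots_per_match)
    sd = pstdev(shots_per_match)  # population SD: spread around the mean
    cv = sd / avg if avg else float("nan")  # spread relative to the level
    return avg, sd, cv

# Hypothetical shot counts for two teams with the same average level
steady_team = [12, 13, 12, 14, 13, 12, 13, 13]
streaky_team = [5, 21, 8, 19, 6, 20, 7, 16]

for name, shots in [("steady", steady_team), ("streaky", streaky_team)]:
    avg, sd, cv = consistency_profile(shots)
    print(f"{name}: mean={avg:.2f} sd={sd:.2f} cv={cv:.2f}")
```

Both hypothetical teams average the same number of shots, so the mean alone cannot tell them apart; only the standard deviation shows that one of them is all over the place.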

Monday, March 28, 2011

Rogerio Ceni, SBTN's Hero of the Day: 100 Goals for Sao Paulo Goalie

38-year-old FC Sao Paulo goalkeeper Rogerio Ceni is my hero of the day. He scored his 100th goal, which helped Sao Paulo win the local derby against Corinthians 2-1 on Sunday. Here's the goal, in case you haven't seen it.

But what makes this achievement particularly remarkable is that it makes you think differently about goalies or the game more generally. There is no reason the goalie position (or any position for that matter) has to be played the way it is currently played or has traditionally been played. Remember, there always has been innovation in soccer. In fact, innovation is what gives teams an edge, whether it be inverting the pyramid tactically or playing totaal Voetbal. So aside from being a brilliant individual achievement, Ceni's play wins today's innovation award.

Over the years, Ceni has scored 56 of his 100 goals by free kick; the 44 others were penalties. If you're curious to see them all, here they are.

Enjoy and congrats to Ceni (and all innovators in all things soccer)!

Sunday, March 27, 2011

Mayday, Mayday: A Quick USA-Argentina Recap

I wasn't going to write a post on this match, but feel compelled to after seeing it live last night at the Meadowlands. This is one where the box score kinda sorta tells you what happened:

Stats Summary: USA/ARG 
Shots 7/13
Shots on goal 4/6
Saves 5/3
Corner Kicks 2/6
Fouls 12/9

Honestly, I didn't even realize that the U.S. had seven shots - I think that's a generous count. It felt more like 1-15. And I agree with much of what  Grant Wahl had to say in a quick post-game analysis, but I also thought it was sort of beside the point. Whether the U.S. play 4-2-3-1 as they did in the first half or 4-4-2 as they did in the second (when the Argentines started coasting a little), the U.S. just didn't have the personnel on the pitch to really be in the game. Honestly, seeing it up close to the pitch, it was dismaying how easily the U.S. was outplayed by Argentina. Mind you, Argentina did have the world's best player on the field, and they were set up to play through him. But aside from Messi's otherworldly skills, what was most amazing was how technically and tactically unsophisticated the U.S. looked compared to Argentina. And, mind you, the pitch looked slow and seemed to lack any kind of bounce (which should have helped the U.S.).

So what to do? Sure, the U.S. is still a good bet to qualify again for the next World Cup, but playing like this and selecting players like the ones we saw last night will not move U.S. soccer forward. I would plead for something more radical. I'd keep about half the team and try to build a new squad around them with younger players who have the potential to be great.

Who should stay? Howard, Donovan, and Dempsey, perhaps Bocanegra.

Who should definitely go? Last night convinced me that the market for players is pretty efficient. There's a reason most of the U.S. guys play in second tier leagues. I don't think scouts out there are discriminating against American players. Instead, they've taken a look and taken a pass. Several guys just looked out of their depth, including Onyewu (Messi won a header against him!), DeMerit (just doesn't have the skill), Edu (lots of bad first touches), and Altidore (whose primary attribute seems to be his athleticism and size, rather than footballing skill and instinct).

The others? You could make an argument either way (based on lots more data than the one match), but Bradley and Jones seemed solid (they played their positions well, tactically speaking).

Two highlights: Agudelo and Chandler.

The biggest obstacle to improving U.S. soccer? Honestly, I can't believe it's primarily the player pool. Strikes me someone should think about whether it made sense to renew Bob Bradley's contract last year. Or at least, maybe he should give Jogi Loew or Juergen Klopp a call.

Friday, March 25, 2011

He Shoots, (He Shoots Accurately), He Scores: Goals and Shots in the Premier League This Season

I thought I'd provide a quick and ready update on offensive production metrics this season in the EPL. Here are the overall averages per team/match, ordered from highest to lowest scoring teams, as of late February, 2011.

You can see that Man U and Arsenal lead the pack in producing goals (and Chelsea and Man City not too far behind); Birmingham brings up the rear, along with Wigan, Wolves, Fulham, and West Ham. But you can also see that there is lots of variation across clubs in the average number of total and accurate shots taken. Chelsea and Arsenal lead the league in overall shots as well as numbers of accurate shots. Birmingham comes in last on both counts, while Wigan, Fulham, and West Ham perform much better on these dimensions. In terms of overall shots, Man U shoots as often and as accurately as Tottenham, Everton, and West Brom (give or take).

So how do they end up on top?

Thursday, March 24, 2011

Graph of the Day: Goal To Shot Ratios For Teams in European Leagues

This graph allows you to gauge one aspect of teams' offensive production with the help of the Reep ratio (aka the goals to shots ratio). Across the four leagues covered here, the average is currently .108 (or about 1 goal in 9.25 shots). With a median of .091, 50% of teams need at least 11 shots to score 1 goal, while the other 50% need fewer than 11 shots to score.
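For anyone who wants to redo the conversion, here is a small sketch of how a goals-to-shots ratio turns into "shots needed per goal". The function names are my own, chosen for illustration; the only inputs are the two ratios quoted above.

```python
def reep_ratio(goals, shots):
    """Goals per shot taken."""
    if shots <= 0:
        raise ValueError("shots must be positive")
    return goals / shots

def shots_per_goal(ratio):
    """Invert a goals-to-shots ratio to get shots needed per goal."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 / ratio

# The league-wide mean and median ratios quoted in the text
for label, ratio in [("mean", 0.108), ("median", 0.091)]:
    print(f"{label}: {shots_per_goal(ratio):.2f} shots per goal")
```

With the rounded ratios this comes out to a bit more than 9 shots per goal at the mean and right around 11 at the median, in line with the figures above.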

Many teams are where you'd expect them to be. But not all are; here are a few interesting outliers:

Blackburn, Blackpool, and Stuttgart all have higher ratios than one might predict based on the league table. Dortmund is quite a bit behind Leverkusen (who are having a great season, too), and Chelsea is surprisingly mediocre on this metric. Other clubs that are low, considering their league table: Lazio, but also Fiorentina, Bologna, and Genoa. These latter clubs highlight the low performance of Serie A this year compared to the other leagues. My favorite nugget: see if you can find Wolves and Real Madrid.

Obviously, a lot goes into explaining where teams stand on this score and how it relates to matches won and lost, but I thought it'd be fun just to take a look and start wondering why teams are where they are at this point in the season.

Wednesday, March 23, 2011

The Value of Clean Sheets: A Season Update For The English Premier League

Clean sheets are a wonderful thing. As I've noted in a couple of previous posts, they're wonderful because they guarantee a team at least one point from a match and potentially give it three. In previous analyses of data for the EPL from the 2009-10 season, I reported that clean sheets produced about 2.5 points per team and match, on average. So where do we stand on this important marker of defensive performance at this point about two thirds into the season?

First things first: how many clean sheets have teams produced? As of February 23, teams had played a total of 268 matches (and therefore we have 536 observations for individual team outcomes). Of the 536 team performances, 139 ended up with a clean sheet and 397 without. This means that teams were able to keep a clean sheet one quarter (25.93%) of the time, and three quarters (74.07%) of the time they gave up at least one goal. So it's not a rare occurrence, but certainly nowhere near the majority of the time.
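The arithmetic behind those percentages is simple enough to sketch. The snippet below uses only the counts given above; the variable names are mine.

```python
matches_played = 268
clean_sheets = 139

# Every match produces two team-level observations (one per side)
team_observations = 2 * matches_played
without_clean_sheet = team_observations - clean_sheets

clean_sheet_share = clean_sheets / team_observations
conceded_share = without_clean_sheet / team_observations

print(f"observations: {team_observations}")
print(f"clean sheet: {clean_sheet_share:.2%}")
print(f"conceded at least once: {conceded_share:.2%}")
```

Counting team performances rather than matches is the key design choice here: a single match can produce zero, one, or two clean sheets, so the match itself is the wrong unit.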

But, as we know, this average can hide important variation across teams. If you're managing Chelsea or Wolves, you want to know where your team stands on this score, both relatively and absolutely. So without further ado, here is the frequency of clean sheets among Premier League teams as of Feb.23, 2011.

Clearly, there's a huge range in teams' ability to produce clean sheets. Contrast Chelsea and Man City, which earned clean sheets in about 45% of the matches they played (46% and 44%, to be precise) with West Brom, which managed a measly 3.7% (both Chelsea and Man City had managed 12 clean sheets; the difference is in one more match played by City).

The Top 5 clean sheet producers so far this season have been
Man City
Man U, and
(tied for 5th) Liverpool, Fulham, and Arsenal.

The Bottom 5 include (from the bottom):
West Brom
West Ham
Bolton, and
(tied for 5th from bottom) Aston Villa, Blackpool, Tottenham, and Wigan.

So clearly, part of Sunderland's (and to some extent Fulham's) ability to generate points has been their ability to produce good defense, while Aston Villa's trouble seems to lie in the defensive area (and to some extent Tottenham's does, too).

All of this assumes that clean sheets are clean sheets are clean sheets. But they're not; turns out, they're of differential value for different teams.

Sunday, March 20, 2011

Destroying the Soul of Soccer, One Statistic At a Time ...

I've gotten lots of reactions, of many different kinds, to my guest post on the New York Times Goal soccer blog. So I thought I'd say a few things about the issues raised by people who care enough to comment.

First of all, thanks to everyone for reading and going to the trouble to write in, either on the Times comments section or to me personally. I don't agree with everything said (surprise!) - hence this post - but I'm glad people are willing to engage.

In general, there seem to be four groups of issues (and I'm paraphrasing a bit here). I would categorize readers' reactions as follows, from most negative to most positive:

1. You can't quantify soccer.

Though it's an easy criticism to lodge, it's a complex issue. It's also the one I disagree with the most. It basically says that soccer analytics is pointless. To exaggerate (perhaps only a little), it's saying that understanding soccer is a little bit like having a religion; you have to be dipped in the holy water to understand it, or you have to have been raised in "insert European or Latin American country here"; otherwise you just don't "get it."

I disagree, and here's why: there's nothing inherent in soccer as an organized activity that makes it immune to systematic observation and analysis. Soccer is fundamentally a group activity - no less, but also no more. It takes place during a measurable period of time, participants' actions are governed by a set of rules, and it generates human activity that can be observed by others, including those not directly involved in the game. It produces outcomes both at the level of individual team members (players) as well as the group (the teams and the league as a whole) we can investigate. This makes soccer no different from basketball teams, work teams, school classes, fraternities, or any other group activity. In fact, if anything, its transparent nature (you can watch the team at work, so to speak) facilitates an analyst's ability to observe, measure, and examine the events and actions of teams and their members. So clearly, we can quantify soccer, and there is no reason to assume we couldn't find or explain regularities or patterns in how humans behave when they play this game.

Does this mean it's easy to quantify soccer? No, it doesn't. Does it mean that we may well be missing important aspects of the game by analyzing match events in the way we currently do? Sure. May there be things about the game that are best examined without statistics? You bet. The nature of the game - it's dynamic, group-based, and actions are interdependent - makes it incredibly challenging to find useful ways of collecting and analyzing relevant information. But this is the fun of it all; it's an interesting analytical challenge. And it doesn't mean we shouldn't try.

But I suspect that these comments are sometimes about something else entirely.

Tuesday, March 15, 2011

The Best Teams in Europe: Barcelona Is Literally Off The Charts, But Can They Stay There?

A quick follow up on how good Barcelona are this year; in a few days, I'll do a more involved post on where the leagues stand at this point in the season, but I thought I'd share this little tidbit for the Barca aficionados out there - and those of you who love to hate on Barca. Whether you're a fan or a hater, you have to be impressed with how good they have been this year. To me, the only team that comes close is Dortmund who are playing some really exciting football.

So here is a graph of offensive and defensive goal to shot (Reep) ratios for all teams in the Big 4 leagues. A higher offensive ratio means that the team is more efficient on offense - the team scored more goals on fewer shots - while a higher defensive ratio means that the team is less efficient on defense - it allowed more goals on fewer shots against. When plotted together, the best teams should be located in the lower right hand quadrant: here, we will find teams that are offensively and defensively efficient (they score more on fewer tries and their opponents have to take more shots to score against them).
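To make the two ratios concrete, here is a rough sketch of how one might compute them and check whether a team lands in that lower right hand quadrant. The season totals are hypothetical, and the cut points are an assumption on my part: I use the .108 league-wide average from the goals-to-shots post for both axes, which is not necessarily where the graph draws its lines.

```python
def reep_ratios(goals_for, shots_for, goals_against, shots_against):
    """Return (offensive ratio, defensive ratio) as goals per shot."""
    return goals_for / shots_for, goals_against / shots_against

def is_lower_right(offensive, defensive, off_cut, def_cut):
    """True if a team scores efficiently and is hard to score against.

    off_cut and def_cut are assumed cut points (e.g. league averages);
    the quadrant boundaries of the original graph are not stated.
    """
    return offensive > off_cut and defensive < def_cut

# Hypothetical season totals, for illustration only
off, dfn = reep_ratios(goals_for=60, shots_for=400,
                       goals_against=20, shots_against=250)
print(f"offensive={off:.3f} defensive={dfn:.3f}")
print(is_lower_right(off, dfn, off_cut=0.108, def_cut=0.108))
```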

Here's where the teams stand on these two dimensions two thirds through the season.

As you can see, Barcelona is off the charts. Its offensive goal to shot ratio is so much better than anyone else's, they have put considerable distance between themselves and the rest of Europe. Mind you, in the lower right hand corner we do see some really good teams like Real Madrid or Dortmund, but Barca are in a league by themselves. The only team that also kind of stands out (in a positive way) is Dortmund, which has the best defensive goal to shot ratio in Europe - that is, they make it very difficult for opponents to score - but their offense can't rival Barcelona's (perhaps because they're playing in a more balanced league?).

So here's the question of the week: can Barcelona stay off the charts? I think there are two good reasons that speak against it. First, defense. Barcelona is the best offensive team in Europe, but on defense several top teams seem to be just as good or better. So to continue winning, they have to make sure opponents stay away from their goal, and they won't be able to do that each and every time. Second, there's regression to the mean. Barcelona's schedule is relentless; assuming they have been slightly over-performing (with only one league loss this year), they're due for a disappointment or two and on the offensive metrics they're likely to move a little closer to the pack (that is, to the left on the graph above). It's good news for Barca opponents, and for fans it would guarantee a thrilling remainder of the La Liga and Champions League seasons.

PS: It's notable that a good number of the Italian teams do not perform well on these metrics. Another indicator that Serie A is lagging behind?

Sunday, March 13, 2011

MLS By The Numbers: A Soccermetric Look At the League

With the new MLS season about to get under way, I thought it'd be interesting to take a quick look back at the last season to generate some baseline information for putting the league and teams in a little bit of a soccermetric perspective. Without too much ado, here are some basic stats on the league from last season to put you in the mood for soccer made in the U.S. of A.

First, here are average numbers of shots, shots on target, and goals per match and team. On average, teams took 9.3 shots per match, of which 4.1 were on target, yielding 1.22 goals per team and match or about 2.44 per match total. So that's about one shot every five minutes of match play and one goal roughly every 37 minutes of play.
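Here is a quick sketch of how the per-team averages convert into "minutes between events", assuming 90 minutes of regulation play (stoppage time ignored) and counting both teams. Note that the five-minute figure corresponds to all shots; shots on target alone come around much less often. With the rounded averages above, the goal interval works out to roughly 37 minutes.

```python
MATCH_MINUTES = 90  # regulation time; stoppage time is ignored here

def minutes_between_events(events_per_team_match):
    """Average minutes between events, counting both teams in a match."""
    events_per_match = 2 * events_per_team_match
    return MATCH_MINUTES / events_per_match

minutes_per_shot = minutes_between_events(9.3)
minutes_per_shot_on_target = minutes_between_events(4.1)
minutes_per_goal = minutes_between_events(1.22)

print(f"a shot every {minutes_per_shot:.1f} minutes")
print(f"a shot on target every {minutes_per_shot_on_target:.1f} minutes")
print(f"a goal every {minutes_per_goal:.1f} minutes")
```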

Which teams led the league in scoring? Were they the teams that ended up making the playoffs, or did defense win points? To see this, I calculated the average number of goals per match for each team individually. Take a look.

As you can see, several teams clearly stood out, offensively speaking. Real Salt Lake led the league with 1.46 goals per match, but FC Dallas, Colorado, and the LA Galaxy were not far behind. At the other end of the distribution, DC United cut a truly pitiful figure with .76 goals scored per match. In other words, it took DC United - one of the great teams in MLS history - twice as many matches to score an equal number of goals as Real Salt Lake. Obviously, DC United's season was one of the truly forgettable seasons in MLS history.

What about other common metrics, such as the Reep Ratio (or the number of goals per number of shots taken)? Take a look to see if your favorite team is where you think it should be.

Saturday, March 12, 2011

Beating Barcelona: The Limits of Soccermetrics

Lots of talk in the aftermath of Arsenal's defeat at Camp Nou has centered on the Gunners' inability to generate any meaningful chances. Perhaps Arsenal's 0 shots inside Barcelona's 18 are unusual, but guess what? Arsenal aren't alone. Barcelona's performance this year and last has been astonishing, and you don't need to know numbers to understand this; all you need to do is watch Messi & Co in action. It's beautiful and suffocating at the same time.

It's hard to think about how one could use publicly available information to put a number on what it takes to beat Barca. Soccermetrics based on box scores are great for generating insights about events that occur often (when the sample size is relatively large) and about things that everyone can see (like goals, match outcomes, shots, and fouls, for instance). But Barca defeats are such rare events, and the kind of information we have is so limited, that it may be better just to watch and enjoy rather than to apply soccermetrics based on things like box scores.

But I can't help myself, so here is my Saturday morning back of the envelope analysis of Barca defeats.

Let's start with the basics. How rare are Barca defeats? By the end of February, Barca had lost exactly one match in La Liga - or 4.17% of the matches they've played. During the 2008/09 and 2009/10 seasons combined, they lost a total of 6 league matches, or 7.89% of the time. Face it, Barca losing a match is just not something that happens with enough frequency for us to get a solid handle on. In the one match they lost this season, their opponents were able to get 9 shots off. In the two matches they have tied this year, Barca's opponents managed 7 and 11 shots, exactly 1 and 2 of which, respectively, were actually on target. And in the matches they have won (the vast  majority of times), opponents were able to shoot in the direction of Valdés' goal only an average of 6.5 times, with an accurate shot number of 2.76. Those are astonishing numbers.

You won't be surprised to hear that these numbers are, soccermetrically speaking, literally off the charts. So what makes the difference between the matches they have won and the ones they have lost? Because Barcelona losing is such a rare event, it's important to increase the sample size. So I went back to the prior two seasons to see if there are any patterns in Barcelona wins, draws, and defeats. First I thought that, perhaps, you have to be super aggressive to intimidate them physically. So maybe fouls against Barcelona (or Barcelona exacting retribution with fouls) would be correlated with Barca wins and losses. Take a look.

Clearly, that's not what it is. Barca's opponents typically foul more than Barca do (or are called more for fouls), but there is no difference between matches the team lost, drew, or won.

The one other indicator I thought would be interesting to look at is shots because we can think of them as an indicator of midfield and offensive production - you need to move the ball up the field into position in order to get a shot off. So we can think of it as a rough and ready proxy for a team's ability to create something in the opponent's half. So here's the equivalent graph for shots for and against in Barca matches in the past two seasons.

Now we're getting somewhere. Barca's offense is clearly very consistent across types of match outcomes when it comes to generating shots; generally speaking, they manage around 15+ shots on their opponents' goal. But while their offense has performed as well in the matches they have lost as it has in the matches they won, they have allowed their opponents to take twice as many shots in the matches they lost. So Messi, Iniesta, Villa, & Co have consistently produced the same offensive display match after match; but they have been more inconsistent with their backs to their own goal.

So how would we know whether this really matters, once we control for other factors that go into match outcomes? Here's one final piece of evidence: I estimated a logistic regression where the outcome to be explained was a Barcelona loss. I controlled for fouls, home field advantage, and whether Barca is ahead at the half (it matters a lot, by the way), but also included the number of shots for and against. The only two variables that achieved statistical significance were Barca being ahead at halftime - when they're behind they're much more likely to lose - and shots by their opponents. But this doesn't mean you're highly likely to win if you take 15 shots rather than 5. To get a sense of how hard it is to beat them, the regressions tell us that, based on shots alone (and this is a big condition of course), you'd have to take an otherworldly 45 shots to generate a predicted probability of a Barca loss of 50%. If you played at home and were tied at halftime, you'd still have to take 35 shots. And if you played at home and were ahead by one goal at the half, you'd still have to take 26 shots to get to a 51% predicted probability of a Barca loss.
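To show how a "shots needed" number falls out of a logistic regression, here is a minimal sketch with a single predictor (opponent shots). The intercept and slope below are hypothetical values I picked purely so that the 50% point lands near the 45 shots mentioned above; they are not the estimates from my regression, and the other controls (fouls, home field, halftime score) are left out entirely.

```python
import math

def loss_probability(intercept, slope, opponent_shots):
    """Predicted probability of a loss from a one-predictor logit model."""
    z = intercept + slope * opponent_shots
    return 1.0 / (1.0 + math.exp(-z))

def shots_for_probability(intercept, slope, target=0.5):
    """Solve intercept + slope * shots = logit(target) for shots."""
    logit = math.log(target / (1.0 - target))
    return (logit - intercept) / slope

# Hypothetical coefficients, NOT the estimates from the regression above
INTERCEPT = -4.5
SLOPE = 0.1

for shots in (5, 15):
    p = loss_probability(INTERCEPT, SLOPE, shots)
    print(f"{shots} shots: predicted loss probability {p:.3f}")

needed = shots_for_probability(INTERCEPT, SLOPE, target=0.5)
print(f"shots needed for a 50% loss probability: {needed:.1f}")
```

The point of the inversion is that a 50% probability corresponds to a logit of zero, so the break-even number of shots is just the negative intercept divided by the slope. Adding controls such as playing at home shifts the intercept, which is why the required number of shots drops in the other scenarios.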

Of course, these are just fictitious examples based on very rough and incomplete data, but they should give you a sense that beating Barcelona is a herculean task. If there is one lesson from this little soccermetric exercise, it's that you can't sit back to beat them (unless perhaps you're ahead at the half). Instead, you have to find a way to get the ball into Barcelona's half. I suspect the best way to do that is not to try and pass it there; instead, you may want to give Sam Allardyce a call to see how you could best emulate his old Bolton strategy of playing long ball.

But to really figure it out, you need to watch lots of tape and have the kind of detailed data managers and clubs have access to (the kind provided by Prozone, Opta, and the like), so that you can dissect Barca's weaknesses based on lots of observations of smaller events (passes, runs, dead ball situations, etc.). What soccermetricians can do based on the information available to all is similar to observing lots of patients and telling you what ails the average patient; but for rare events like Barca defeats, it takes a really experienced doctor who's willing to spend lots of time with individual patients to diagnose that rare illness.

In the meantime, forget the numbers and just watch the beautiful Barca game unfold. The one thing I do know is that it's an amazing outlier.

Monday, March 7, 2011

The Point Value of Red Cards: Rare Events That Matter?

Red cards are rare events: the vast majority of matches teams play do not involve any reds at all. Over the course of five seasons between 2005/06 and 2009/10, the percentage of matches without a red card ranged from highs of over 91% in the Premier League and the Bundesliga to a somewhat lower 84.7% in Serie A and 82.4% in La Liga.

These rare events may not sound like they're common enough to matter, but if you do the math, you'll see that they translate into 1 red card per team for every 5 matches played in La Liga or, counting both teams, 1 card for every 2.5 matches. In contrast, the frequency of around .08 for the Bundesliga and the EPL translates into 1 red card for every 12.5 matches played by a team, or 1 red for slightly more than every 6 matches. Serie A is in-between; its red card numbers come out to around 1 per team for every 6.25 matches, or 1 in every (slightly more than) 3 matches.
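Here is a small sketch of that conversion. The per-team rates below are the ones implied by the "one card every N matches" figures in this paragraph (about .20, .16, and .08 reds per team and match), not values read directly off the raw data.

```python
def matches_per_red(reds_per_team_match):
    """Return (team matches per red, matches per red counting both teams)."""
    per_team = 1.0 / reds_per_team_match
    per_match = per_team / 2  # two teams can be carded in every match
    return per_team, per_match

# Rates implied by the figures in the text (reds per team and match)
implied_rates = {"La Liga": 0.20, "Serie A": 0.16, "Bundesliga/EPL": 0.08}

for league, rate in implied_rates.items():
    per_team, per_match = matches_per_red(rate)
    print(f"{league}: 1 red per {per_team:.2f} team matches, "
          f"or 1 per {per_match:.2f} matches")
```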

Ok, so far so good, but an important question is this: does it matter? Typically, teams and players try to avoid red cards - it seems intuitive that red cards are costly for teams in different ways; they lose the player (typically a starter) for the match and several more matches after. But does it also impair a team's ability to win points in a match? How costly are red cards really?

If you look at the fair play table for this year's EPL season, for example, you'll see that Arsenal players have seen red 6 times, while Fulham leads the league with 0 reds. So how much can this possibly matter? One way to get a handle on the consequences of actions on the field is to look at their point value for the team. No reason we can't do this for red cards, too. Below is the average point value of red cards - that is, the point value of matches in which teams received one or more reds - for the big leagues over the past five years.

For starters, it may help to know that a team's average point value in the average match across the leagues is 1.37. Avoiding red cards altogether elevates a team's expected point value slightly to about 1.5 across the leagues. Moreover, red cards do turn out to be costly. One red in a match is associated with reducing a team's expected point value by more than .5 points. The penalty appears to be heaviest in the Bundesliga, where teams go from an expected value of 1.42 points without a red to .75 with one red. In contrast, in La Liga, the drop in points is from 1.47 to about .97 - still about a half a point.

But, that's not the end of the story.

Friday, March 4, 2011

Home Field Advantage: What You See Depends On Where You Look (And What You Make Of Draws)

A friend recently recommended I read Scorecasting: The Hidden Influences Behind How Sports Are Played and Games Are Won, a new book on numbers and sports (with a focus on decision making) by Tobias Moskowitz and Jon Wertheim. While there isn’t a lot about soccer in the book (it's mostly aimed at the American sports fan of basketball, baseball, football, and to some extent hockey), it’s a fun read. And what there is about soccer is actually really interesting. I recommend it to anyone interested in sports, numbers, and how the people involved in them - players, coaches, and referees - make decisions.

An important part of the book consists of a couple of chapters on the home field advantage in sports, including basketball, hockey, American football, and, you guessed it, soccer. In these chapters, Moskowitz and Wertheim first document that the home field advantage is ubiquitous across different team sports, as well as across leagues and over time. They also argue that most of the advantage home teams seem to have stems from referee decisions, rather than things like travel, schedule, weather, or any other number of factors that have been proposed over the years.

So far so good, but there was one thing about it all that made me want to say "hmm"...

The book has a table that shows the home field advantage to be particularly pronounced in soccer and, according to the authors, across leagues and years. Since we all seem to know about the home team advantage, as I was reading along, I was more interested in the explanation for why it exists than how big it really is. But what stuck with me was the finding that home teams win well over 60% of their matches, a number I initially didn’t pay a whole lot of attention to.

But then I started wondering how this could be true. It seemed to go against everything I remembered about the typical match outcome. So, just to make sure I wasn't hallucinating, I cranked up ye olde dataset from the past five years to crunch a few numbers. For starters, I wanted to know how many matches end in wins and how many end in draws. Turns out, on average, across the leagues for the past five seasons (2005/06 to 2009/10), 26% of all matches end in a draw, which means that 74% of all matches end in a win for one side or the other. So, the probability of having a winner in a match is .74, or basically three quarters. Could it really be that home teams win over 60% of their matches when only 74% end in a win? In other words, do away teams win only slightly more than 10% of the time?

The simple answer is no. Take a look at this distribution of match outcomes.

It's true that more matches end with the home team winning than any other outcome. The numbers vary slightly across leagues, but they're nowhere near 60+%. Instead, around 45-48% of matches end in a home win, and the remainder are roughly evenly split between wins for the away team and draws. So the home team wins about as often as it does not.

So is Scorecasting simply wrong? Well, it depends on how you slice and think about these numbers. Yes, on its face, the finding that home teams win 60% of their matches is incorrect. They win slightly fewer than 50%. But this still raises the question: Do they have an advantage? The answer to that question is also a simple yes. While you wouldn't bet quite as much on them as you might after reading Moskowitz and Wertheim's entertaining book, you should still bet on them if the question is who will win more of the matches that end in a win. Confused? Let me clear it up.

If we forget about draws - that is, throw away about a quarter of all matches - then the magical 60+% re-appears. If you calculate the proportions of matches with a winner that are won by the home side or the away side, you'll see that the home team wins over 60% of those, while the away team wins less than 40%.
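To make the arithmetic concrete, here is a minimal sketch of the two ways of slicing the numbers. The 26% draw share comes from the data above; the 46% home win share is an illustrative value I picked from within the 45-48% range, not an exact figure from the dataset, and the function name is made up for the example.

```python
def outcome_shares(home_win, draw):
    """Return the away win share and the home/away shares of decisive matches.

    home_win and draw are proportions of ALL matches (they must leave
    room for away wins, i.e. home_win + draw < 1).
    """
    away_win = 1.0 - home_win - draw
    decisive = home_win + away_win  # matches that end with a winner
    return {
        "away_win": away_win,
        "home_share_of_decisive": home_win / decisive,
        "away_share_of_decisive": away_win / decisive,
    }


# Illustrative inputs: 46% home wins (assumed), 26% draws (from the post).
shares = outcome_shares(home_win=0.46, draw=0.26)
for name, value in shares.items():
    print(f"{name}: {value:.3f}")
```

With these inputs the home side wins 46% of all matches but roughly 62% of the matches that have a winner, which is how both numbers can be true at once.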

Does this mean Scorecasting is misleading? I wouldn't go that far, but I would say that it's incomplete. The numbers don't tell the whole story for soccer where, unlike basketball, baseball, or football, we have ties. And ties are important because they generate points. Yes, home teams win more and away teams lose more, but sometimes they come out even. And this means that there is less of a home field advantage in soccer than you might think.

Thursday, March 3, 2011

The Value of Corners: A Quick PS

Here's a quick PS to the post about the goal value of corners. One reader of the blog asked if corners made a difference to match outcomes. It's worth thinking about since corners are correlated with shots (which, in turn, are correlated with points and wins) but they are not correlated with goals (which of course are also correlated with points and wins). So, here are two graphs to satisfy your curiosity about the connection between corners and match outcomes. The first one shows match outcomes (whether a team wins, loses, or draws) as a function of the number of corners a team was awarded in a match (these data are from the Big 4, over the past five seasons).

Clearly, winning teams do not take systematically more corners than teams that lose or manage a draw. Across the leagues, teams get slightly more than 5 corners per match, on average. The only league where there seems to be a little bit of a connection between corners and match outcomes is the Premier League.

So to get a clearer picture of this, I calculated the difference in corners per match between the two teams involved; so a positive number means a team had more corners than the other side, and a negative number means the other team had more corners. Here's a graph for match day differences in corners by wins, ties, and losses in that match. Take a look.
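For anyone who wants to try this on their own data, the calculation can be sketched roughly as follows. The handful of matches here is made up purely to show the mechanics (it is not the Big 4 data), and the tuple layout and function names are my own assumptions.

```python
from collections import defaultdict
from statistics import mean

# Made-up matches for illustration only:
# (home_corners, away_corners, home_goals, away_goals)
matches = [
    (7, 3, 2, 0),
    (4, 6, 1, 1),
    (5, 5, 0, 1),
    (8, 2, 0, 0),
    (3, 6, 3, 1),
]


def result(goals_for, goals_against):
    """Classify a match from one team's point of view."""
    if goals_for > goals_against:
        return "win"
    if goals_for < goals_against:
        return "loss"
    return "draw"


def corner_diff_by_result(matches):
    """Average corner difference (own minus opponent) per match outcome."""
    diffs = defaultdict(list)
    for home_c, away_c, home_g, away_g in matches:
        # Each match contributes two observations, one per team.
        diffs[result(home_g, away_g)].append(home_c - away_c)
        diffs[result(away_g, home_g)].append(away_c - home_c)
    return {outcome: mean(values) for outcome, values in diffs.items()}


averages = corner_diff_by_result(matches)
print(averages)
```

One thing to keep in mind with this setup: because every match is counted once from each side, the average difference for draws is zero by construction, and the averages for wins and losses are mirror images of each other.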

Lo and behold, there is virtually zero connection between taking more corners than the other side and winning, losing, or drawing the match in the Bundesliga, La Liga Primera, or Serie A. But there does seem to be a slight connection in the EPL, where the winning team is awarded half a corner more than the losing team and vice versa. But this seems like a small effect, if you ask me. So all of this is consistent with the story about the goal value of corners: they don't matter to who wins or loses.

Tuesday, March 1, 2011

Importing Footballers: What Role Does League Quality Play?

I recently wrote a short post about the imports and exports of footballers around Europe, based on a report released by the Professional Football Players Observatory (PFPO), an academic research group. In it, I speculated that there may be a connection between league quality and the percentage of foreign-born players, with the best leagues perhaps attracting the highest percentage of the best players from all around the world.

In the report, the authors claim that there is indeed a link between league quality and player imports. The PFPO measures the quality of the league with the help of a coefficient that “is calculated from the theoretical position of leagues on the regression line obtained by correlating the percentage of matches played in European club competitions by the representatives of national associations and the average personnel expenditure of clubs according to championship.” So it's a combination of a league's performance in Europe and how much teams spend on personnel. For example, England is thus given a value of 1.0, Spain 0.93, Italy 0.90, Germany 0.88 and so on, all the way to Estonia (0.09, ranked 36th).

So here's the plot of the percentage of imported players against the PFPO coefficient from the report.
(c) 2011 PFPO
The data show a positive relationship between league quality and percentage of expatriate players: on average, the better the league, the more expats you'll find. But the graph also seems to show that England, Spain, and France have relatively few expatriate players - these countries have fewer expats than the regression would predict. In contrast, Cyprus and Hungary are outliers on the other side of the regression line, with lots more expatriates than league quality would predict.

So far so good, but I was wondering about a couple of things when I read this.