Saturday, April 27, 2013

Avoiding the Drop: To Score Or Not To Be Scored On?

It's almost that time of year again. Clubs at the wrong end of the table are starting to fret; supporters are getting anxious and taking to Twitter. Managers' palms - well, of those managers who haven't been sacked - are sweating a little more ... The specter of the drop starts to loom, as the freshness of the early season is but a distant memory, long ago having given way to the reality of not having accumulated enough points.

We have written about relegation before, so this is mostly a quick follow-up from our data vault. What seems to matter more, scoring goals or not conceding them? If you're running a football club, where should you have put your money in January, offense or defense?


At first glance, the numbers seem to tell us that they both matter, as the data of goals (scored and conceded) and points in the graph above show. You can win more points by scoring and you'll win more points by not conceding. But curiously, this doesn't necessarily mean they're equally valuable strategies for avoiding relegation.

When we compare the offensive and defensive production of clubs that were relegated with those that weren't, a slightly more interesting picture emerges.


The data show that, over the sixteen seasons between 1995/96 and 2010/11, teams that stayed in the Premier League scored about as many goals as they conceded - 52 v. 47, respectively. In stark contrast, relegated clubs conceded many more goals than they scored - 37 v. 66. You may think this is obvious; and it is. Poor performance means not scoring and being scored on.

But the key is that there is an interesting (and maybe an important) asymmetry: the 37 goals that relegated teams scored, on average, is 71% of the 52 goals non-relegated teams scored - a difference of 29%. But the 66 goals relegated teams conceded are 140% of the 47 non-relegated teams let in - a difference of 40%.
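For anyone who wants to check the arithmetic, here is a minimal sketch in Python that uses only the four averages quoted above. The helper name `pct_gap` is ours, made up for illustration.

```python
# Average goals per club and season, 1995/96 to 2010/11, as quoted in the post.
SCORED = {"survived": 52, "relegated": 37}
CONCEDED = {"survived": 47, "relegated": 66}


def pct_gap(relegated: float, survived: float) -> float:
    """Gap between relegated and surviving clubs, as a percentage of the survivors' figure."""
    return abs(relegated - survived) / survived * 100


offense_gap = pct_gap(SCORED["relegated"], SCORED["survived"])
defense_gap = pct_gap(CONCEDED["relegated"], CONCEDED["survived"])

print(f"offense gap: {offense_gap:.0f}%")  # offense gap: 29%
print(f"defense gap: {defense_gap:.0f}%")  # defense gap: 40%
```

Both gaps are measured against the surviving clubs' average, which is what makes the two percentages comparable.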

And even if we compare the clubs ranked 18-20 with those ranked 15-17, we see a difference of 10% on offense and 14% on defense.

The numbers are pretty straightforward; on average, clubs that stay in the league score about as much as they concede. Clubs that go down lose the battle against relegation in front of their own goal; defense is their (statistical) Achilles heel.


So when in doubt and seriously worried about the prospect of relegation: spend your money, time, and effort on upgrading your defense. Your supporters may not appreciate the kind of football that produces in the short run, but they'll be happy when they get to see Chelsea, the Manchester clubs, and Arsenal next year rather than Huddersfield, Peterborough, or Burnley.

Friday, April 5, 2013

The Politics of Analytics: Raptors, Dinosaurs, and Matt Busby

Whether football likes it or not, analytics has been creeping into every aspect of a club’s operations – from scouting to medical to match day analysis. And yet, it is fair to say that analytic approaches have not been welcomed with open arms. Aside from scouts who have greeted the analytics movement with cold-shouldered trepidation, some of the greatest resistance to analytics has come from members of the coaching profession. So it was refreshing to hear Bill Gerrard, one of the leading practitioners of the dark arts of analytics in football (rugby and association), talk about his work with the coaching staff at Saracens F.C., one of the leading teams in English rugby union.

Speaking at the Sports Analytics Innovation Summit in London, Gerrard explained how he, along with the Saracens staff, has developed a system he terms “evidence-based coaching” to assist in the team’s evaluation of the coaches’ game plan, and to use data collected by and with the help of coaches on how well the players have performed relative to the game plan. This information, after careful vetting and analysis, can then also be fed back to players to see how they are performing relative to the coaches’ expectations.

Aside from the obvious and perhaps not so obvious disadvantages of the system (I’ve described some of them here), the tale Gerrard was telling was both refreshing and predictable. Refreshing, because Gerrard explained how well the team of coaches and analysts were working together using both qualitative and quantitative data, and in the spirit of improving the team. Predictable, because his tale highlighted some of the key obstacles to analytics in modern-day football clubs.

Don’t get me wrong; evidence-based coaching is a good idea. Using more information is better than using less; having quantitative and qualitative measures to gauge performance (along with our old friends gut and instinct) helps coaches and players figure out what really happened; and so on. But evidence-based coaching limits analytics because it leaves coaches in charge.

When it comes to analytics, coaches are some of the toughest nuts to crack.

The politics of analytics get in the way of turning Gerrard and others like him into more than mere data assistants. By inclination and training, coaches are much more focused on what they can see with their own eyes – games, videos, etc. – and prefer to rely on their own experiences and instinct, rather than what hard data can tell them. They are like the ultimate old-line scouts in Moneyball. Moreover, they like to have and exercise control.


Mind you, football and rugby are far from unusual. In fact, they're a lot like baseball or basketball.

Wednesday, April 3, 2013

Pearls Of Wisdom Or Perils Of Wisdom? The Problem With Evidence Based Coaching

I recently spent a couple of days at the Sports Analytics Innovation Summit in London. Held at the beautiful Oval cricket ground, it brought together analysts working in a variety of sports, ranging from Olympic sports to cricket and rugby to football. It was a stimulating two days, with presentations by people like Manchester City’s Simon Wilson, Chelsea’s Ben Smith, and Manchester United’s Tony Strudwick. The one talk that still sticks in my mind, but perhaps for all the wrong reasons, was by Bill Gerrard, a professor at the University of Leeds. The topic was “Beyond Moneyball: Evidence-Based Coaching Not Just Statistics-Led Recruitment.”

If you’ve spent any time reading the academic literature on football, you’ll know that Gerrard is a highly respected sports economist. For years, he has worked on analyzing the economics of football; more recently, he also has been working with Saracens F.C., one of the leading teams in English rugby union, on what he calls “evidence based coaching.” The idea is simple, as Gerrard helpfully explains in this short essay on the topic:
“Evidence-based coaching at Saracens involves a five-stage process. It starts with the coaching vision of the ‘perfect game’, a tactical game plan covering the different aspects of play – attack, defence and set-plays. After every game each coach analyses his area of responsibility and evaluates how well the players have performed relative to the game plan. Particular emphasis is put on evaluating the decisions made by players and their skills implementation. The next stage is for the data generated by the coaches to be analysed. This leads on to monitoring team and player KPIs using a dashboard with a traffic-lights system to classify performance levels. These KPIs are not only tracked and interpreted by the coaches but are also made available to the players who can access these KPIs after every game on their iPads and laptops.”
As Gerrard pointed out at the conference, Saracens prefer to use these “small data” rather than “big data” – the kinds of large datasets produced by tracking systems or providers like Opta. It’s not that they don’t use big data, but the key is to focus on “the systematic recording of the coaches’ expert evaluations of player performance. Systematically recording these evaluations creates the data that can then be analysed. Importantly it is only the Saracens coaches who can determine how effectively the players are implementing the Saracens game plan. Third-party commercial data providers can only provide activity data of how much players have done. Only the team coaches are in a position to produce effectiveness data.”

Gerrard is a modest man who is realistic about his own role. As these passages make clear and as he pointed out at the conference, the coaches drive the analysis process. Or as he put it, the coaches give him their “pearls of wisdom” and his job is to use his analysis skills to support them in their work. As he further explained in the essay:
“My contribution as the statistical analyst is to search for patterns in the data, transforming the data into evidence to help inform coaching decisions on player recruitment, tactics, team selection, match preparation and training priorities. At Saracens statistical analysis provides additional oil in an already well-oiled evidence-based engine.”
This all sounds great and well thought out: a well-oiled machine of coaches and analysts working together; cooperation and collaboration between analyst and coaching staff; an openness among the coaching staff to analytic insights; and a team-oriented organization that puts the club’s outcomes front and center.

Using data in this way has obvious advantages. It puts coaches at the center of the analysis enterprise. In an industry where coaches bear primary responsibility for outcomes – and are held accountable for wins and losses – giving them more tools to do their job (an analyst who can assist them) seems like an excellent idea. It also focuses analysis on winning games – never a bad thing, if you ask me – since “effectiveness” is defined in relation to the coaches’ game plan.

To me, this approach is laudable, and yet, it worries me because it takes analysis only part way toward the goal of helping clubs win. So where exactly is the problem?

Tuesday, March 19, 2013

What Exactly Is Football Analytics? What is Football Analytics Not?

There was a moment during the MIT Sloan Sports Analytics Conference a few weeks ago when I leaned over to my partner David Sally and asked: “So what exactly is analytics, then?” I was confused, as perhaps a number of attendees were. One of the participants on one of the American sports panels implied – and I’m paraphrasing – that pretty much anything is analytics; he seemed to be saying that, if you’re a reasonable person who considers different opinions and evidence to make decisions, you’re using “analytics.”

This definition of “analytics” as “reasonable decision making” seemed just a step too far to be useful. You mean, if you use your brain to decide on a trade for a player, for instance, you’re doing analytics? That seems to be stretching things to the point of being useless.

But this moment highlighted a more general problem when discussing analytics in football – what exactly do we mean by it? It’s an important question. If we’re not using the same language or agreeing on the definition of what it means to do football analytics - if we're not talking about the same thing - then it’s hard to have a conversation about where it is and where it’s going.

So here goes, just for clarification: when we speak of it, we really mean the basic Wikipedia definition of “the discovery and communication of meaningful patterns in data.” Of course, this immediately raises the question of “what is data?”

And that’s where things sometimes seem to break down in the chain of communication.

For many, data have to be numerical – things you can put in a spreadsheet or a table. So football analytics is about developing statistical models based on fairly complex data matrices. And because the game is complex and therefore leads us to develop complex models to capture its essence, football analytics is complex.

For others, "data" simply seems to mean “information, systematically collected.” This could be a series of video clips or sketches of formations or a list of the languages a player speaks or which scouts watched him play – you name it – but always things that are verifiable and that consist of “meaningful patterns.” This kind of information can be quantitative, but it doesn’t have to be. Defining analytics in this way means being systematic in your work and the information you use.

Understanding that both types of data and approaches can be used to do analytics is important. After all, no matter how you define data, one thing we all share is the goal to discover patterns, communicate them, and use them to improve some aspect of decision making and ultimately team performance, either through efficiency or innovation. So, no matter our data, we can speak of scouting analytics or match analytics or injury analytics (the list is potentially quite long); and we are looking for patterns in all kinds of football data in order to find better players for less money or produce wins or manage players’ careers or judge a manager’s tenure … you get the point.

So back to the panelist I mentioned at the outset. If what he's doing is making considered decisions, he's simply doing his job. It’s called being a professional. But it’s not analytics.

At the same time, if all you have is lots of data, no matter how extensive and expensive and sophisticated, then you're not involved in analytics either. You need to do something with them using a transparent methodology and with a particular goal in mind. It’s about the hard work of “discovery” and “communication” of “meaningful” patterns. Otherwise it’s not analytics – then it's just data hoarding.

Friday, March 15, 2013

The MIT Sloan Sports Analytics Conference: The Best Revival Meeting Around

A couple of weeks ago, I was a panelist at the MIT Sloan Sports Analytics Conference in Boston. Described by the high priest of sports journalism Bill Simmons as “Dorkapalooza”, the conference has the feel of a revival meeting that conveniently offers a job placement service and an introductory stats lecture to boot.

Don’t get me wrong – I absolutely love the conference and wouldn’t miss it for the world. The panelists are first rate (obviously), I learn something and meet someone new every year, and there is real potential for cross-pollination across the different sports.

The buzz and excitement are palpable, and the optimism infectious. In fact, this is one of the most impressive things about this gathering of the faithful: it reminds participants that there is a community of like-minded spirits who are out there spreading the gospel of sports analytics. That’s powerful stuff and very important – necessary in fact for advancing the field and the quality of the work people do.

To me, the key to appreciating the importance of a meeting like the Sloan conference that brings people together and re-energizes them around analytics is to appreciate the contrast with the day-to-day reality of football analytics “on the ground.” When traversing the hallways, listening to discussions, or looking out at the ever-growing crowd of soccer analytics enthusiasts, I couldn’t help but think of a conversation I had with the chief executive of an English football club a few months ago (and I've had similar ones a good number of times). Sitting in the board room surrounded by images and mementos of past glories, I asked about the club's use of the likes of Prozone, Opta, scouting software – you name it. “You know, of course we subscribe to them,” he confided. “The guys like to have those toys so they can play with them and say they have them, but the manager doesn’t really believe in it. So he’s not using it. So yeah, we have it, but …” It was clear he thought of these tools as an expense that was hard to justify, except for the fact that it kept the technical staff happy.

The chasm between that reality and the belief of the faithful at an event like the Sloan conference is important to recognize. If Michael Lewis wrote the gospel of sports analytics in Moneyball, and Billy Beane is its patron saint, then there are plenty of frustrated apostles out there working hard at convincing the skeptics. This disconnect between the faith of the football analytics community on full display in Boston and daily practice is real. The people filling the seats at the Boston Convention Center trust that analytics is valuable and possibly essential for producing wins. So far, so good.

With the exception of seasoned pros working in clubs or for data companies, they also seem to think that analytics are commonly and routinely practiced without too much argument across the board at the top levels of professional football. As we tried to explain on the panel, this is not the case. Yes, analytics – in its various guises from match analysis to scouting and recruitment or medical science – is growing, but it is hardly unquestioned and often considered a distraction to “normal” football operations.

This means that the Church of Analytics still has relatively few true converts at the highest levels of professional football. There are many reasons for this, and we discussed some of them at the conference, but to make progress on the analytics front requires facing that fact head on, rather than pretending it doesn’t exist. Football analytics still faces a steep uphill climb – one that we certainly and happily embrace – but like any innovative technology, it is a climb to successful adoption.

In many ways, football’s story is a sequel to the Oakland A’s story. But football hasn’t yet had its Moneyball moment, and whether it will is still an open question – basketball or American football or ice hockey haven’t either, if you look closely. As anyone familiar with North American sports and European football will appreciate, whatever barriers there are in, say, basketball are nothing compared to the walls that loom in football. As ESPN’s Albert Larcada pointed out on the Sloan panel, “Even when we do advance we’re still going to be playing catch up.” Tradition is a potent impediment to anyone trying to introduce new ideas to clubs, trying to encourage their employers to play the numbers game.

So having faith in analytics is great; converting nonbelievers quite another thing. That requires a combination of high quality work and a plan. And that's where a lot of the heavy lifting will have to be done in years to come. The good news is that the revival meetings keep coming and they are growing. The last decade has seen sports analytics grow into an industry. Attendance at the Sloan Sports Analytics Conference has grown tenfold over a five-year period from around 200 in 2007 to well over 2,000 in 2012; that’s a whopping compound annual growth rate of 58.5% in case you were curious. Now that’s something to be hopeful about!
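In case you were curious about where the 58.5% comes from, here is a minimal sketch of the compound annual growth rate calculation in Python. It uses the round figures from the text (200 and 2,000 attendees); the function name `cagr` is ours.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10% a year)."""
    return (end / start) ** (1 / years) - 1


# Round figures from the text: about 200 attendees in 2007, about 2,000 in 2012.
rate = cagr(200, 2000, 2012 - 2007)
print(f"{rate:.1%}")  # 58.5%
```

Since attendance in 2012 was "well over" 2,000, the true rate would be a touch higher than this round-number version.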

Thursday, December 6, 2012

Age and the Inverted U: Player Age and Transfer Market Valuations


Timeless beauty?
Some things, like a good wine, get better with age. Footballers do, too, but unfortunately only up to a point. Even in the era of Giggs, Scholes, and Friedel, no matter where athletes do their work, they will eventually see a decline in performance. If that's true, and if we assume that a player's performance is linked to his market value, both should increase during the first half of a player's career and decline thereafter.

But is that really the case? And if it is, where exactly is that turning point – the time at which players stop becoming more valuable and start becoming less so? One way to answer that question is to examine their performance; another way is to see what the market says. It's the latter we focus on here. We wanted to know: what is the connection between a player’s age and the price he can command?

To answer these questions, we collected data from the respected Transfermarkt website for all players currently on Premier League squads and performed a variety of calculations on their transfer values (complete data were available for a total of 502 players; we collected these data in early October).*

As in our previous analyses, we estimated a set of regression models with the aim of accurately predicting a player’s valuation based on a variety of factors, including things like position, nationality, club, contract length, experience in the league, and so on. But critically, we also included two variables to assess the influence of a player’s age on his valuation – the actual age, to capture any linear trend in age, and age squared, to see if the connection between age and valuation is curvilinear.
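To make the age and age-squared idea concrete, here is a minimal sketch in plain Python. It is not our actual model: it leaves out every control variable (position, nationality, club, and so on), the helper `fit_quadratic` is hypothetical, and the data are invented so that the curve peaks at age 26 by construction. The only point is to show how a negative coefficient on age squared produces an inverted U and how the turning point falls out of the two coefficients.

```python
def fit_quadratic(xs, ys):
    """Ordinary least squares for y = b0 + b1*x + b2*x**2 via the normal equations."""
    # Build the 3x3 system (X'X) b = X'y as an augmented matrix.
    rows = [[1.0, x, x * x] for x in xs]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


# Invented illustration data, NOT Transfermarkt values: an exact inverted U
# that peaks at age 26 (value in millions of pounds).
ages = list(range(18, 36))
values = [10.0 - 0.1 * (age - 26) ** 2 for age in ages]

# Centre age on its mean so the age and age-squared terms are less collinear.
mean_age = sum(ages) / len(ages)
centred = [age - mean_age for age in ages]

b0, b1, b2 = fit_quadratic(centred, values)

# b2 < 0 means an inverted U; the curve peaks where the slope b1 + 2*b2*x is zero.
peak_age = mean_age - b1 / (2 * b2)
print(f"age-squared coefficient: {b2:.3f}, peak age: {peak_age:.1f}")
```

With real valuations the fit would not be exact, and the estimated peak would depend on which other factors are held constant.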

Friday, November 30, 2012

The Myth of the English Premium

Every time an Andy Carroll or Alan Shearer gets sold, Liverpool (over-)pay for British-born players, or Arsene Wenger (and lately, Alan Pardew) goes shopping for undervalued talent in France, the idea of an English (or sometimes, British) premium is bandied about.

But is it really true that you have to pay a premium for English players? The underlying idea here is that there is positive discrimination in the English player market, with selling clubs charging a little (or a lot) extra for English or British players.

To find out, first, let's look at how the transfer market currently values English v. non-English players (or players fortunate to hail from the British Isles or not).

Using data from the respected Transfermarkt website on all players currently on Premier League squads, we performed a variety of calculations on their transfer values (complete data were available for a total of 502 players; we collected these data in October).

One thorny problem, of course, is who counts as "English" or "British" - this turns out to be slightly less than obvious. Sure, Ben Foster is English, but there are a number of players whose ancestry or personal history is more than a bit muddled. Some of it owes to the vagaries of modern migration; some of it has to do with which national side someone chooses (or hopes) to play for. So, this is a long way of saying: we did our best to determine a player's nationality, but we probably made a few calls that are debatable. That's our first indication, though, that determining the English premium is less than completely straightforward.

Keeping those caveats in mind, the numbers show that the market appears to value English and British (which includes Scottish, Irish, and Welsh) players less than the average.

Recall from our previous analysis that the average Premier League player this fall is valued at £5.94 million. In contrast, English players are valued almost exactly one million pounds less (£4.96 million) and all British players combined about £1.5 million less (£4.51 million). In fact, given that about half of all players in the league qualify as British in some way, the average of £5.94 million is brought down by the relatively low valuations of native footballers. The average for non-Brits, in fact, stands at £7.39 million.
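The three averages hang together as a weighted mean, which gives a quick consistency check on the "about half" claim. A minimal sketch in Python, using only the figures quoted above (the variable names are ours):

```python
# Average valuations in millions of pounds, as quoted in the post.
OVERALL = 5.94
BRITISH = 4.51
NON_BRITISH = 7.39

# The overall mean is a weighted mean of the two groups:
#   OVERALL = share * BRITISH + (1 - share) * NON_BRITISH
# Solving for the British share of players:
british_share = (NON_BRITISH - OVERALL) / (NON_BRITISH - BRITISH)
print(f"implied British share: {british_share:.1%}")

# Sanity check: rebuild the overall average from the two group averages.
rebuilt = british_share * BRITISH + (1 - british_share) * NON_BRITISH
print(f"rebuilt league average: £{rebuilt:.2f} million")
```

The implied share comes out at just over 50%, in line with roughly half the league qualifying as British.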

On its face, this suggests that English/British players can be had at a bargain, rather than a premium.


But differences in averages can't settle whether English or British players command a premium - by definition, "premium" implies that a club needs to pay more to obtain an English player than they would for an ordinary player; it's a kind of surcharge for the same player they would otherwise buy.

Wednesday, November 28, 2012

Mapping the Market: How Much Do Players in the Premier League Cost?

If you wanted to buy yourself a Premier League footballer, how much would you have to shell out?  As we get closer to the January transfer window, we thought it would be interesting to take a fresh look at how much players go for these days.

To find out, we collected data from the respected Transfermarkt website on all players currently on Premier League squads and performed a variety of calculations on their transfer values (complete data were available for a total of 502 players; we collected these data in October).

The average Premier League player is currently valued at £5.94 million - a tidy sum to be sure, and of course higher than it's ever been, but not stratospheric. The interesting thing, too, is that the values of players aren't normally distributed - in bell curve fashion, where most (average) players would be expected to be located somewhere in the middle, and then a few on the low and the high ends, respectively. Instead, the transfer values of players show a remarkable skew toward the lower end. Take a look.

Here's the overall distribution (we've grouped players in bands to make the graph more intelligible).


Clearly, the majority of players are valued at significantly less than the average. 25.9% are on the market for £1.3 mio. or less, 51.6% are valued at £3.1 mio. or less, and a whopping 70.5% are valued at £5.7 mio. or less. This makes sense - many of the players in the dataset (in fact, by definition about half, in a squad of 25, assuming some injuries over the season) are not regular starters for their clubs. And of course there are a few very special talents who can command much, much more, thus bringing up the average.
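The skew shows up in the gap between the median and the mean. A minimal sketch in Python that reads the median's upper bound straight off the cumulative shares quoted above (the variable names are ours, and the bands are only as fine as the three thresholds given in the text):

```python
# Cumulative shares quoted in the post:
# (upper bound in £ million, % of players valued at or below it).
CUMULATIVE = [(1.3, 25.9), (3.1, 51.6), (5.7, 70.5)]
MEAN_VALUE = 5.94  # league-wide average, £ million

# The median sits at or below the first threshold that covers at least half the players.
median_upper_bound = next(bound for bound, share in CUMULATIVE if share >= 50)

# Share of players valued above the highest quoted threshold, which is still below the mean.
share_above_5_7 = 100 - CUMULATIVE[-1][1]

print(f"median is at most £{median_upper_bound} mio. vs. a mean of £{MEAN_VALUE} mio.")
print(f"{share_above_5_7:.1f}% of players are valued above £5.7 mio.")
```

In other words, the typical (median) player is valued at roughly half the league average or less, and fewer than three in ten players sit above £5.7 mio.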

Finally, we wanted to see how much the market differentiates by position. Here we see that the market values different positions differently - something we've long known, but seldom have put a real number on. Players' values increase significantly as we move up the pitch, with forwards valued highest at an average of £7.6 mio., and net minders at less than half that (£3.3 mio.). The jump in value from keeper to defender is almost £2 mio., midfielders are worth about £1 mio. more than defenders, and strikers about £1.4 mio. more than midfielders.


The market appears to value scoring goals more than it values preventing them.

Of course, these broad averages obscure a lot of differences across individual players and clubs - something we will be writing about in the coming weeks - but they map out the domestic English market as it stands today. 

Wednesday, November 14, 2012

Back To Basics: Who Touches The Ball?

The ball is round, Sepp Herberger, the legendary, World Cup-winning coach of the 1954 West German side, used to say. Moving the leather around the pitch is what football is all about. While that's always been true, how teams have gone about maneuvering the ball into their opponent's net and away from their own has changed considerably since Herberger's days.

In this day and age, it is hard to imagine teams playing a 2-3-5, the most common formation in the early days of the game. Instead, as Jonathan Wilson has described so beautifully in his magisterial history of football tactics, Inverting The Pyramid, football has evolved from a game focused mostly on offense to a game focused on balancing offensive and defensive needs. Today, with the rise of the false 9, Barcelona and Spain are even occasionally playing without a striker, period.

So who touches the ball in the modern game? Is the tactical focus on a balance between offense and defense reflected in the match data?

To see if it is, we calculated which positions actually see more and which see less of the ball with the help of Opta Sports match data for a recent season of Premier League play (2010/11). Our original intuition was that the balance between offense and defense should mean that midfielders would touch the ball the most.

Here is, first, the average number of ball contacts by position and match. Touches on the ball can be anything here - passes, flick-ons, headers, shots on goal, you name it.


These numbers suggest that the average defender touched the ball significantly more than the average forward, with midfielders somewhere in the, well, middle. But what's perhaps most noticeable is that the average 55 ball contacts defenders make tower over the average 30 touches forwards get. Thus, according to these calculations, midfielders and defenders touch the ball many more times than the guys tasked with putting the ball in the other side's net.

Could that be right? Has the pyramid really been completely inverted, with defenders dominating today's game?