Sunday, March 20, 2011

Destroying the Soul of Soccer, One Statistic At a Time ...

I've gotten many different kinds of reactions to my guest post on the New York Times Goal soccer blog, so I thought I'd say a few things about the issues raised by people who care enough to comment.

First of all, thanks to everyone for reading and going to the trouble to write in, either on the Times comments section or to me personally. I don't agree with everything said (surprise!) - hence this post - but I'm glad people are willing to engage.

In general, there seem to be four groups of issues (and I'm paraphrasing a bit here). I would categorize readers' reactions as follows, from most negative to most positive:

1. You can't quantify soccer.

Though it's an easy criticism to lodge, it's a complex issue. It's also the one I disagree with the most. It basically says that soccer analytics is pointless. To exaggerate (perhaps only a little), it's saying that understanding soccer is a little bit like having a religion; you have to be dipped in the holy water to understand it, or you have to have been raised in "insert European or Latin American country here"; otherwise you just don't "get it."

I disagree, and here's why: there's nothing inherent in soccer as an organized activity that makes it immune to systematic observation and analysis. Soccer is fundamentally a group activity - no less, but also no more. It takes place during a measurable period of time, participants' actions are governed by a set of rules, and it generates human activity that can be observed by others, including those not directly involved in the game. It produces outcomes that we can investigate, both at the level of individual team members (players) and at the level of the group (the teams and the league as a whole). This makes soccer no different from what basketball teams, work teams, school classes, fraternities, or any other groups do. In fact, if anything, its transparent nature (you can watch the team at work, so to speak) facilitates an analyst's ability to observe, measure, and examine the events and actions of teams and their members. So clearly, we can quantify soccer, and there is no reason to assume we can't find or explain regularities or patterns in how humans behave when they play this game.

Does this mean it's easy to quantify soccer? No, it doesn't. Does it mean that we may well be missing important aspects of the game by analyzing match events in the way we currently do? Sure. May there be things about the game that are best examined without statistics? You bet. The nature of the game - it's dynamic, group-based, and actions are interdependent - makes it incredibly challenging to find useful ways of collecting and analyzing relevant information. But this is the fun of it all; it's an interesting analytical challenge. And it doesn't mean we shouldn't try.

But I suspect that these comments are sometimes about something else entirely.



The notion that you can't quantify the "beautiful game" reveals a deeper discomfort with soccer analytics. If you think about it, an analytical approach to understanding the game assumes that there is something that we can glean from the data - or that there is something that is potentially knowable about the game - that long-standing observers don't already know.

Why would this be threatening to some? Well, if being an expert on soccer is about being born in the "right" country and having watched and played the game from a young age, then you become a legitimate purveyor of insights about the game because of who you are; it also means that achieving expert knowledge is difficult for those who do not share this background. But if we can learn about soccer by analyzing it with data, then anyone can become an expert, provided they spend enough time with the data and analyze it the right way. In the process, this makes those dipped in the holy water less, well, powerful because their insights become less special and more open to question and analysis. After all, data can be used to prove experts wrong.

By the way, this is exactly the story of Michael Lewis' Moneyball (the book about baseball analytics that has had such a profound influence on sabermetrics and the practice of baseball management). The story of Billy Beane and the Oakland A's is about the struggle between those who thought they just "knew" the game because they had lived it - the traditional scouts and coaches - and those who used evidence to analyze the game. On a smaller scale, the same dispute is reflected in the idea that you just can't quantify the beautiful game.

Obviously, I think you can. By the way, so do many of the biggest clubs in the world - they have invested serious money in building their soccer analytics capacity. But no, folks, none of this means I'm out to destroy the soul of football one statistic at a time. I'm only out to see what I can learn about the game that I didn't already know, or to find out if what I thought I knew is reflected in the data. Evidence helps with that sort of thing. Of course, whether all this analysis generates interesting insights is a different question altogether, and that's where we get to typical comment no. 2.



2. Your analysis is ok, but kind of lame, and your data may be bad.

It may surprise you to hear that I tend to agree with the sentiment behind these kinds of comments, at least in principle. I take them to mean that readers have two big questions about soccer analytics: first, whether the assumptions that go into the analyses are sound; and second, whether the data I use are any good (or good enough).

These are completely fair criticisms because they are fundamental issues in any kind of soccer analytics. Your analyses can't go very far without making some assumptions about how soccer works or what you may expect to find. Your ideas about the game influence what you want the data to speak to. So if you're clueless about soccer or are only looking for the obvious, then you won't find anything interesting, or you'll only confirm what everyone already (thought they) knew (for example, that taking shots is correlated with scoring goals). Now, confirming something everyone assumed all along is better than just continuing to assume it, but it's not the same as generating unique insights that change how we look at the game. But you have to start somewhere. And every once in a while you'll stumble across a nugget of something interesting that'll make people say: "hey, I didn't know that."

As to the question of data quality, it's hugely important, too. As we like to say in the social sciences (or any of the sciences, really): garbage in, garbage out. That is, your conclusions are only as good as the data you use. If the data are biased (in this particular instance, if the data-generating process is biased because it, e.g., systematically over- or under-counts shots in one of the leagues), then the conclusions you draw from the data are suspect (or biased). If you know the source of the bias, you can correct for it; if you don't, your conclusions are flawed. So data reliability issues are big in soccermetrics because we don't all work with the same kind of data or sources. Fair enough.
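To make the point about correcting a known bias concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the numbers are made up (they are not real league data), the function names are hypothetical, and the bias is assumed to be a simple, known multiplicative over-count of shots in one league. Real data problems are rarely this tidy.

```python
def shots_per_goal(shots, goals):
    """Average number of shots a league needs to produce one goal."""
    return shots / goals


def correct_shot_count(recorded_shots, overcount_factor):
    """Undo a KNOWN multiplicative over-count in recorded shots.

    overcount_factor = 1.2 means the data source records 20% more
    shots than actually happened (an assumption for this example).
    """
    return recorded_shots / overcount_factor


# Made-up season totals, for illustration only.
league_a = {"shots": 1000, "goals": 100}
league_b = {"shots": 1200, "goals": 100}  # assumed to be over-counted by 20%

# Naive comparison: league B looks less efficient in front of goal.
naive_a = shots_per_goal(league_a["shots"], league_a["goals"])
naive_b = shots_per_goal(league_b["shots"], league_b["goals"])

# Corrected comparison: once the known over-count is removed,
# the apparent difference between the two leagues disappears.
true_shots_b = correct_shot_count(league_b["shots"], overcount_factor=1.2)
corrected_b = shots_per_goal(true_shots_b, league_b["goals"])

print(f"naive:     A={naive_a:.1f}  B={naive_b:.1f}")
print(f"corrected: A={naive_a:.1f}  B={corrected_b:.1f}")
```

The correction only works because the over-count factor is known. If you don't know that one league's shots are inflated, the naive comparison is all you have, and the conclusion it suggests is simply wrong.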

(In case you're interested, fellow soccermetricians Onfooty.com and soccermetrics.com have a couple of nice pieces discussing these kinds of issues.)

Assuming that you have a decent sense of the game and that the data are not a complete mess, the next set of comments has to do with pushing analysts to be more creative and insightful.


3. Have you thought about doing X ...?

This is probably the most typical comment I get, not just in response to the NYT piece, but ever since I started this blog. I really like these kinds of reactions because they can provide new ideas or push me to try a little harder. Several times over the past few months, readers have made astute suggestions for analyses, some of which I have been able to do (you know who you are, so thanks!). Unfortunately, not every kind of analysis is feasible (and not every kind makes sense), but please keep 'em coming. Otherwise, it's just me spending way too much time thinking about soccer all by myself. And I appreciate that these kinds of comments reflect an openness to thinking about the game more systematically and a willingness to entertain different conclusions about how the game works.

Then finally, there are the following kinds of comments.


4. I love it. Keep it up. This helps me understand soccer.

Naturally, this is my favorite reaction because it's quintessentially open minded (and I feel flattered, of course; so thanks!). These readers don't make assumptions about where I'm from or what I know. They just look at the facts and enjoy them for what they are or for what they may be able to tell us about the game. Openness to new approaches may make it easier for us to find new ways of seeing the game, and perhaps of playing it. Soccer analytics may ultimately fail as an enterprise, but it's been a lot of fun to try and see how far we can push this. All we really need at this point is a curious audience and a few Billy Beanes to let soccer analysts discover the potential magic in our spreadsheets.

One last thing: you can love the game for all the beauty it creates - the unstudied spontaneity, the brilliance of a bicycle kick, or the choreography of zonal defense - and at the same time believe there is beauty in the numbers. But only when they're done right. To paraphrase the late Daniel Patrick Moynihan, everyone is entitled to their own opinion about the beautiful game, but not their own facts.