An old quip states that “Everybody said something was impossible, until someone came along who didn’t know that, and just did it”.
In the late 90’s, in a world of hot dog-eating scouts who trusted only the judgment of their own eyes, Billy Beane brought a strongly analytical (today we would say ‘data-driven’) viewpoint to his club’s strategic decisions. This approach became known as Sabermetrics, a term coined by Bill James (author of another milestone, ‘The Bill James Baseball Abstract’) from SABR, the Society for American Baseball Research, and it raised many eyebrows among scouts.
Moneyball charted a course for another generation of ‘statheads’.
First in line was Nate Silver, with PECOTA, his system for projecting players’ development. Its strong point was the ability to forecast the aging curve, which heavily affects a player’s skill level.
Nate Silver frequently mentions one of his most emblematic successes: Dustin Pedroia. Whereas PECOTA in 2006 ranked Pedroia as the 4th best prospect, the scouting magazine ‘Baseball America’ ranked him just 77th.
The result? In November 2007 Dustin Pedroia was named Rookie of the Year, and the next season he won nothing less than the MVP award as the American League’s best all-around performer.
PECOTA proved able to see farther than the scouts.
The aging curve is a refined piece of information derived from a huge amount of data. Above all, it is something a scout will never be able to perceive, because it requires synthesizing dozens of features into a single pattern, which is far beyond human capabilities.
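To make the idea of an aging curve a little more concrete, here is a deliberately simplified sketch: it fits a quadratic curve to (age, performance) pairs by least squares and reads off the age at which the curve peaks. This is not how PECOTA works (PECOTA builds its projections by comparing a player with similar historical players), and it uses a single feature where real systems combine many; the function names `fit_quadratic` and `peak_age` and the sample data are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_quadratic(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y = a + b*x + c*x**2; returns (a, b, c)."""
    # Normal equations (X^T X) beta = X^T y for the basis [1, x, x^2].
    power_sums = [sum(x ** k for x in xs) for k in range(5)]
    rhs = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m: List[List[float]] = [
        [power_sums[i + j] for j in range(3)] + [rhs[i]] for i in range(3)
    ]
    # Gauss-Jordan elimination with partial pivoting on the 3x4 augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    a, b, c = (m[i][3] / m[i][i] for i in range(3))
    return a, b, c


def peak_age(ages: Sequence[float], performance: Sequence[float]) -> float:
    """Age at which the fitted quadratic aging curve reaches its maximum."""
    # Center the ages first: this keeps the normal equations well conditioned.
    mean_age = sum(ages) / len(ages)
    a, b, c = fit_quadratic([age - mean_age for age in ages], performance)
    if c >= 0:
        raise ValueError("fitted curve opens upward, so it has no peak")
    return mean_age - b / (2 * c)


if __name__ == "__main__":
    # Hypothetical, noise-free data generated from 100 - 0.5 * (age - 27)**2.
    ages = [21, 23, 25, 27, 29, 31, 33, 35]
    performance = [100 - 0.5 * (age - 27) ** 2 for age in ages]
    print(f"estimated peak age: {peak_age(ages, performance):.1f}")  # 27.0
```

With real, noisy data the same fit would give only a rough estimate of the peak, which is exactly why projection systems lean on large samples of comparable players.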
Let’s jump seven years forward.
Matthew Benham is the owner of the English Championship club Brentford, and he is also a millionaire who made a fortune betting on football.
In 2013, in a world of beans-and-sausages-eating scouts who trusted only the judgment of their own eyes, Benham made an announcement that raised eyebrows so high they risked disappearing into people’s scalps: he bought a Danish club named Midtjylland and proclaimed that it would be run purely on a statistical approach.
Rasmus Ankersen, chairman of Midtjylland and Benham’s right-hand man (as well as a successful author on analytical methods applied to football), is convinced that a data-driven approach is still a no man’s land in football, and he will essentially turn his club into a laboratory for a radical and fruitful experiment.
My father used to say that there is always a reason when you find a no man’s land in a business: either you’re the first visionary to foresee its potential, or you’re just another imbecile who cannot see its obstacles.
And football certainly presents its hurdles: a 2010 New York Times article dubbed it “the least statistical of all major sports”.
Furthermore, football managers are generally averse to handing control over to the numbers, for a psychological reason: basically, they don’t trust them. This means the whole club, from top to bottom, has to be fully aligned. But this is not a problem for Midtjylland, because there the boss IS an analyst.
For example, Ankersen stated that the league table alone is an inaccurate measure of a team’s value, because “there’s just way more randomness than people understand”. For this reason, he claimed that the table position will never determine whether a coach is fired; the one and only judge will always be the club’s mathematical models.
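Ankersen’s point about randomness can be illustrated with a small Monte Carlo sketch: simulate a hypothetical 20-team double round-robin in which every team is exactly equally strong, so any gap in the final table comes from luck alone. The function name, the league size and the 26% draw rate are assumptions chosen for illustration; this is not Midtjylland’s model.

```python
import random
from typing import List, Optional


def simulate_season(n_teams: int, p_draw: float = 0.26, seed: Optional[int] = None) -> List[int]:
    """Simulate a double round-robin between n_teams of identical strength.

    Each match is a draw with probability p_draw (an assumed, illustrative rate);
    otherwise home and away sides are equally likely to win.
    Returns each team's final points (3 for a win, 1 for a draw).
    """
    rng = random.Random(seed)
    points = [0] * n_teams
    for home in range(n_teams):
        for away in range(n_teams):
            if home == away:
                continue  # a team never plays itself
            r = rng.random()
            if r < p_draw:
                points[home] += 1
                points[away] += 1
            elif r < p_draw + (1 - p_draw) / 2:
                points[home] += 3
            else:
                points[away] += 3
    return points


if __name__ == "__main__":
    # Average top-to-bottom gap over many seasons of perfectly equal teams.
    seasons = [simulate_season(20, seed=s) for s in range(1000)]
    avg_gap = sum(max(p) - min(p) for p in seasons) / len(seasons)
    print(f"average gap between first and last: {avg_gap:.1f} points")
```

Comparing the spread produced by pure chance with the spread of a real table is one simple way to start asking how much of a team’s position reflects skill and how much reflects luck.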
With such an objective approach to decision-making, we would expect results to be the only thing that tips the balance.
So the question is:
Has this numerical approach demonstrated significantly positive outcomes?
Midtjylland won the Danish Superligaen by 8 points over FC Copenhagen, with by far the best attack in the league (64 goals scored) and the best goal difference (+30).
They also scored almost one goal per match from ‘set pieces’, an impressive figure that testifies to the effectiveness of their ideas.
On the other hand, the list of players selected by Nate Silver’s PECOTA system generated 546 wins for their major league teams through 2011, while the players ranked by ‘Baseball America’ (as we said, a magazine based mainly on scouts’ judgments) did better, producing 630 wins. This roughly 15% better performance was worth an overall $336 million, not exactly peanuts.
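The arithmetic behind that 15% is easy to check. The short snippet below recomputes the gap from the two win totals quoted above. The per-win figure rests on an interpretation rather than on anything stated here: it assumes the $336 million is the value of the 84-win difference, in which case it works out to exactly $4 million per win.

```python
# Figures quoted in the text.
PECOTA_WINS = 546              # wins produced by PECOTA's list through 2011
BASEBALL_AMERICA_WINS = 630    # wins produced by Baseball America's list
QUOTED_VALUE_USD = 336_000_000


def relative_edge(better: float, worse: float) -> float:
    """Relative advantage of `better` over `worse`, as a fraction."""
    return (better - worse) / worse


extra_wins = BASEBALL_AMERICA_WINS - PECOTA_WINS           # 84
edge = relative_edge(BASEBALL_AMERICA_WINS, PECOTA_WINS)  # 84 / 546, about 0.154
# Assumption: the quoted $336 million is the value of the 84-win gap.
implied_value_per_win = QUOTED_VALUE_USD / extra_wins     # 4000000.0

print(f"{extra_wins} extra wins, a {edge:.1%} edge, about ${implied_value_per_win:,.0f} per win")
```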
In conclusion, obtaining effective results is not only a matter of being a visionary or an imbecile (even if that may help). It takes the three great talents of a good data scientist:
1) To separate the signal from the noise
2) To account for context in statistics
3) To tell skill apart from mere luck