Germany will win and England will bow out to Brazil: Analytics firm predicts every result in the 2018 World Cup
“I’m not a great lover of all these stats,” says former Arsenal and England midfielder Paul Merson, with a dismissive wave. That’s unfortunate, because he’s currently sat on stage at an event hosted by Alteryx, a firm promising the audience stats- and data-driven insight into who will win the 2018 World Cup.
“We had Opta when I played and when I was at Aston Villa, the data said that [goalkeeper] David James ran more than David Ginola… but Ginola probably played ten world-class passes in that game.”
Merson’s point is clear: stats often don’t tell the whole story. But do they tell enough of one to give fans inside knowledge about who will crash and burn in Moscow?
Self-service analytics firm Alteryx is willing to put its reputation on the line for these predictions, and it’s off to a promising start: “We took the 16-team 2014 round data and applied our model to it – we actually only got two results wrong,” explains Nick Cavey, a solutions engineer at the company. The two missteps would have tripped up most gamblers too: the model had Uruguay down to beat Colombia (Uruguay lost 2-0) and Brazil to win the third-place play-off, where a shell-shocked Brazil team, fresh off the back of losing 7-1 to Germany, limply lost 3-0 to the Netherlands.
Before I get into Alteryx’s predictions for 2018, it’s worth examining how the Alteryx system works – or at least the bits that the company was willing to reveal. At its core, it’s a Pythagorean expectation model, something that has been applied successfully to basketball and baseball but rarely to football. “We’ve taken this model and tried to apply additional weightings, averages and percentages to the teams,” explains solutions architect Shaan Mistry.
Rather than taking the entire history of the World Cup (that would favour Uruguay, which won in 1930 and 1950), the system is a little more nuanced. We weren’t party to every factor that goes into its decision making, but it does factor in ‘fearlessness’: “Winning a match on penalties is one thing, winning a match from coming from behind is a whole different thing altogether,” says Mistry. Essentially, teams with the guts to progress in difficult circumstances earn a higher rating.
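Alteryx didn’t publish its formula or weightings, so what follows is only a minimal sketch of the textbook Pythagorean expectation applied to goals rather than runs: a team’s expected share of wins is goals scored raised to some exponent, divided by that same term plus goals conceded raised to the exponent. The exponent of 2 is Bill James’s original baseball value and is an assumption here, not the CavMist setting. To turn two teams’ ratings into a head-to-head prediction, the sketch swaps in Bill James’s log5 method, which is one common choice and not necessarily what Alteryx used. The function names and goal totals are hypothetical.

```python
def pythagorean_expectation(goals_for: float, goals_against: float,
                            exponent: float = 2.0) -> float:
    """Expected share of matches won, from goals scored and conceded.

    ASSUMPTION: exponent=2.0 is Bill James's baseball value; the right
    exponent for football (and Alteryx's own choice) is not known here.
    """
    if goals_for < 0 or goals_against < 0:
        raise ValueError("goal totals must be non-negative")
    if goals_for == 0 and goals_against == 0:
        return 0.5  # no evidence either way: treat the team as average
    scored = goals_for ** exponent
    conceded = goals_against ** exponent
    return scored / (scored + conceded)


def log5(p_a: float, p_b: float) -> float:
    """Probability that team A beats team B, given each team's win rating.

    ASSUMPTION: log5 is a swapped-in technique for illustration. It has no
    notion of draws, which matter a great deal in football group games.
    """
    denominator = p_a + p_b - 2 * p_a * p_b
    if denominator == 0:
        return 0.5  # both teams rated 0 or both rated 1: no way to separate them
    return (p_a - p_a * p_b) / denominator


# Hypothetical goal totals for two unnamed teams over the same period.
team_a = pythagorean_expectation(20, 10)  # 400 / (400 + 100) = 0.8
team_b = pythagorean_expectation(12, 12)  # evenly matched record = 0.5
print(f"A beats B with probability {log5(team_a, team_b):.2f}")
```

With these made-up numbers, a team rated 0.8 facing an average (0.5) side keeps its 0.8 edge, which is a useful sanity check on log5. The “additional weightings” Mistry describes, such as the fearlessness factor, would sit on top of a base rating like this, and their details were not disclosed.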
So what does the CavMist (Cavey/Mistry) model predict when applied to the whole tournament?
- Group A: Uruguay (9 points); Russia (6 points)
- Group B: Spain (9 points); Portugal (6 points)
- Group C: France (9 points); Peru (4 points)
- Group D: Argentina (9 points); Croatia (6 points)
- Group E: Brazil (9 points); Switzerland (6 points)
- Group F: Germany (9 points); Mexico (6 points)
- Group G: England (7 points); Belgium (7 points)
- Group H: Colombia (9 points); Senegal (6 points)
No South Korea and no Poland. You read it here first.
From there, the CavMist system maps out the rest of the tournament, match by match.
There were a few close calls in this mix. “We’re edging Belgium by a marginal percentage over Colombia,” says Cavey. “With Spain vs Argentina, I’m not going to say we tossed a coin, but based on the analysis of the squads and weightings and how they’ve gone in recent years, we’re edging Argentina. But it might go to penalties.”
There are then three teams head and shoulders above the rest. “When we first did the stats, Argentina, Germany and Brazil were heavily weighted,” explains Cavey. “We actually think Brazil are going to breeze through France,” he adds – tactfully omitting the fact they would have breezed through England in the process.
And as for an overall winner: it’s Germany. 100% Germany. “We ran it through a few times, changed a few weightings and whatever way we spin it, we still come out with Germany.”
This is where the former pro Merson bumps up against the data hounds: “I don’t expect Germany to get to the final. I don’t think they’re that good,” he says. “I think we could beat Germany – they’re not the team they were.”
For him, it’s all about Brazil: “I think Brazil will walk it. I think they’re head and shoulders above all the other teams.” For what it’s worth, the data heads at Opta agree with Merson: their own estimate of each team’s chances of lifting the trophy also puts Brazil in front.
We shall see who’s lifting the trophy on 15 July.
What about England?
The model has England going out in the quarter-finals: a series of words which just seem to go together somehow. So where are England going wrong this time? That depends on who you ask.
For Merson, the midfield selection looks weak: “That scares me, that midfield. There’s not a pass in there,” he laments. “There’s not a player in there that can put a ball through the eye of the needle,” he adds, name-checking Wilshere, Lallana and Shelvey as ones that got away. “We ain’t going to pass anyone to death in this competition.”
This is backed up by the data: among the top ten passers of all 736 players at the World Cup, there is not a single England shirt (though Dele Alli and Kyle Walker sneak in at 12th and 15th respectively).
Cavey and Mistry present another slightly damning graph. This is the top ten goalscorers in the last two years for Argentina and England. Not only have England scored considerably fewer, but two defenders make the list: Maguire and Cahill. In other words, we’re extremely reliant on Harry Kane, and as the 2010 World Cup showed, relying on one striker for the goals seldom ends well. “Sterling hasn’t scored since 2015 in international football – I think it’s 18 caps,” chips in Merson. “I’m 50, I should be able to score in that time.”
So, if we’re not going to see an England player as top scorer, what do the stats say? “We did have a simulator for the Golden Boot, and it largely depends on how far the teams go,” says Cavey. “It’s likely to be a team that at least gets to the semi-final. My money would be on Griezmann, because I think France are going to win their matches quite heavily before they tank out to Brazil.”
Merson offers a similar take. “I would go Neymar, but I’ll go Sterling on the off-chance he hits a streak. I know he can’t hit a barn door at the moment, but you never know, he might just click, like Schillaci and Roger Milla [in 1990].”
Alright on the night?
Of course, stats can only describe past form and the predictions built on it. Nothing can prepare you for the emotions and luck on the day. To best demonstrate that, Merson talks about his experience at the 1998 World Cup quarter-final against Argentina. Beckham has been sent off, and it’s going to penalties. “We never practised penalties, and Hoddle went ‘Incey, Batty, Shearer, Merse, Michael – penalties,'” he recalls.
“Standing on the halfway line preparing for that penalty against Argentina at the World Cup, and all that’s going through my head at the time is three months before this I had a penalty at Sheffield United on a Tuesday night, and as I stepped up, my confidence dropped.” That night in Sheffield, the goal seemed to shrink and he knew he was going to miss. Sure enough, he did, smacking it “40 rows over the bar,” and his side lost the game.
“If I can’t score at Sheffield United on a Tuesday night, what chance have I got here?” he recalls thinking. Seeing him visibly distressed, Hoddle made a beeline for him, providing a much-needed intervention. “He put his hand on my chest. He looked me in the eye and said ‘you will not miss’ – and I thought ‘thank fuck for that!’
“This is where the penalties are taken: the walk from the halfway line is where the penalty is scored or missed. I had the best feeling in the world, I knew I was going to score.”
There’s a point to this: the data cannot possibly see what’s going on in a player’s mind as the circumstances change, even if that state of mind is ultimately reflected in the stats that inform future predictions. A good man-manager can make or break the moment, as Hoddle did in 1998 when Merson’s penalty hit the back of the net.
Or as Merson puts it: “I look back at my career and think ‘if Ince and Batty were as scatty as me, we could have won the World Cup.'”