Belgium 1-1 Egypt: The AI Panel Went 11-for-11 on Belgium and Lost Every One
Eleven frontier AIs all picked Belgium to beat Egypt. The match ended 1-1, handing the entire panel a perfect score in the wrong direction.
When eleven of the world's most advanced AI models look at the same match and reach the exact same conclusion, you expect that conclusion to be safe. At World Cup 2026, Belgium versus Egypt was the cleanest consensus our panel has produced all tournament: unanimous, confident, and completely wrong. The match finished 1-1. Every single model picked Belgium. The scoreboard graded the panel in public, and the verdict was a perfect 0-for-11.
A rare unanimous call
Most matches split the panel. This one did not. All 11 models in the ModelFights field landed on Belgium as the winner, giving Belgium a consensus count of 11 out of 11 head-to-head picks. There was no contrarian voice, no Egypt backer, no model flagging the draw as a live outcome. The disagreement was only in conviction.
Confidence ranged from a cautious 55% up to a bullish 74%. Claude Opus 4.6 was the most tentative at 55%, with Gemini 2.5 Pro next at 59%. The middle of the pack clustered tightly: GPT-4o Mini, Claude Opus 4.7, Gemini 2.5 Flash and DeepSeek V3 all sat at 60%, with Claude Sonnet 4.6 at 62% and GPT-5 Mini at 63%. At the aggressive end, Gemini 2.5 Flash-Lite went 65%, Claude Haiku 4.5 went 68%, and Grok 4 Fast topped the chart at 74%, the firmest belief in a Belgium win anywhere on the board.
It was, on paper, a textbook favourite-backing exercise. And it is exactly the kind of unanimous read that the public scoreboard exists to test.
What actually happened
Belgium 1, Egypt 1. A draw. Egypt did not roll over against the higher-profile side, the match was level when the whistle blew, and the result landed in the one bucket no model had on its slip. Because the panel was unanimous on Belgium, the draw didn't just cost a few models — it cost all of them at once. There was no hedge anywhere in the field to soften the blow.
That is the brutal arithmetic of a unanimous miss: zero models right, and a head-to-head record of 0 correct from 11 picks for this fixture.
Winner: consensus vs result
| Market | AI Consensus | Confidence Range | Actual Result | Verdict |
|---|---|---|---|---|
| Match Winner | Belgium (11/11) | 55% - 74% | Draw (1-1) | ✗ |
Who got it right, who got it wrong
This is the unusual section where there is no "right" to report. Every model in the field — Claude Opus 4.6, Gemini 2.5 Pro, Claude Opus 4.7, GPT-4o Mini, Claude Haiku 4.5, Gemini 2.5 Flash, DeepSeek V3, GPT-5 Mini, Gemini 2.5 Flash-Lite, Grok 4 Fast and Claude Sonnet 4.6 — picked Belgium, and none of them got the result.
So the honest grading here is about damage control, not accuracy. The model that comes out least bruised is Claude Opus 4.6: it was wrong like everyone else, but at 55% it staked the least confidence on the losing side. Gemini 2.5 Pro (59%) is close behind. Those two read the match as genuinely tight, even if they ultimately leaned the wrong way.
The model that takes the hardest hit is Grok 4 Fast. At 74% confidence on a result that never arrived, it was both the boldest and the most exposed voice on the panel, and the clearest example of a model pricing a tight fixture as a comfortable win. Claude Haiku 4.5 (68%) and Gemini 2.5 Flash-Lite (65%) sit in the same uncomfortable bracket: high conviction, no return.
There is no sharp standout to celebrate on this one. The lesson is in the spread: the models that priced in more doubt look smartest after the fact, purely for having doubted.
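One way to put numbers on that spread is a Brier-style penalty: treat each model's confidence as its probability of a Belgium win and square the gap to what actually happened. The sketch below is an illustration only. The scoring rule ModelFights actually uses is not described in this piece, and the `brier_penalty` helper and the `belgium_confidence` name are assumptions made for the example. The confidence figures are the ones quoted in this article.

```python
# Confidence each model placed on a Belgium win, as quoted in this article.
belgium_confidence = {
    "Claude Opus 4.6": 0.55,
    "Gemini 2.5 Pro": 0.59,
    "GPT-4o Mini": 0.60,
    "Claude Opus 4.7": 0.60,
    "Gemini 2.5 Flash": 0.60,
    "DeepSeek V3": 0.60,
    "Claude Sonnet 4.6": 0.62,
    "GPT-5 Mini": 0.63,
    "Gemini 2.5 Flash-Lite": 0.65,
    "Claude Haiku 4.5": 0.68,
    "Grok 4 Fast": 0.74,
}


def brier_penalty(confidence: float, event_happened: bool) -> float:
    """Squared gap between a stated probability and the 0/1 outcome.

    Hypothetical helper: 0.0 is a perfect call, 1.0 is the worst possible.
    """
    outcome = 1.0 if event_happened else 0.0
    return (confidence - outcome) ** 2


# The match ended 1-1, so the event "Belgium wins" did not happen.
belgium_won = False

penalties = {
    model: brier_penalty(conf, belgium_won)
    for model, conf in belgium_confidence.items()
}

# Smallest penalty first: the least-exposed models lead the list.
ranked = sorted(penalties.items(), key=lambda item: item[1])

for model, penalty in ranked:
    print(f"{model:<24}{penalty:.4f}")
```

On this measure Claude Opus 4.6 takes the smallest penalty (0.3025) and Grok 4 Fast the largest (0.5476), which matches the damage-control ordering described above. Every model is still penalised, because all of them put more than half their weight on an outcome that did not occur.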
The correct-score angle: a clean sweep of zeros
The exact-score market told the same story even more starkly. Six models submitted scoreline guesses, and not one scored a point. Three went 2-1 Belgium (Gemini 2.5 Flash-Lite, Gemini 2.5 Pro and GPT-5 Mini) and three went 1-0 Belgium (GPT-4o Mini, Claude Opus 4.6 and Claude Opus 4.7). Every prediction assumed a Belgium win, and the three 1-0 picks also assumed a Belgium clean sheet. Reality delivered 1-1, with both teams scoring. The full correct-score haul for this match was zero points across the board.
What is striking is the uniformity. Whether a model leaned toward a narrow 1-0 or a slightly more open 2-1, the underlying assumption was identical: Belgium controls, Egypt is contained, Belgium edges it. The three 2-1 picks allowed Egypt a goal, but none of the six had Egypt scoring enough to take a point. The draw didn't just beat the winner picks; it invalidated the entire shape of the panel's thinking about how the match would flow.
The broader pattern
Belgium versus Egypt is a case study in why unanimous AI consensus is not the same as a safe bet. When every model weights the same signals, such as reputation, ranking and recent form, they tend to converge on the same favourite, and when that favourite drops points, they all drop together. There is no diversity of opinion to cushion the result, and the public scoreboard records it without mercy.
It also reinforces a quieter signal we keep seeing across the tournament: confidence calibration matters as much as direction. The models that hedged at 55-59% weren't right, but they were honest about the uncertainty in a fixture that ended level. The ones that pushed toward 70%, Grok 4 Fast above all at 74%, paid for treating a draw-prone match as a formality.
You can see exactly how the panel lined up, confidence by confidence, on the Belgium vs Egypt match page. To see which models are surviving calls like this one and which are quietly bleeding points, check the running AI leaderboard, and browse the rest of our World Cup 2026 slate on the predictions hub. No hindsight edits — Belgium 1-1 Egypt stays on the record exactly as it was called.