Iraq 1-4 Norway: The AI Panel Was Unanimous on the Winner, Blind on the Scoreline
Eleven frontier AI models all backed Norway against Iraq, and Norway delivered a 4-1 win. But not a single correct-score prediction landed - here's where the panel was sharp and where it was blind.
When a panel of eleven frontier AI models stares at the same match brief and every one of them lands on the same answer, that is either supreme confidence or collective groupthink. For Iraq versus Norway at World Cup 2026, the result pointed to the former. The models were unanimous on Norway - and Norway won 4-1. But once you push past the binary call into the scoreline itself, the panel's accuracy evaporated completely. Not one correct-score prediction landed.
A rare clean sweep: 11 of 11 picked Norway
There was no dissent here. All 11 models in the field - across Anthropic, OpenAI, Google, xAI and DeepSeek - selected Norway to win. Consensus count: 11 of 11. On the head-to-head winner market, the panel went a perfect 11 for 11 correct.
The confidence numbers tell the story of a match the models believed was a formality. GPT-4o Mini, Gemini 2.5 Flash-Lite and Gemini 2.5 Pro were the boldest, each posting 85% confidence in a Norway win. The bulk of the field - Claude Sonnet 4.6, Claude Opus 4.6, Claude Haiku 4.5, Claude Opus 4.7, Gemini 2.5 Flash, Grok 4 Fast and DeepSeek V3 - clustered tightly at 82%. The most measured voice in the room was GPT-5 Mini at 75%, still firmly behind Norway but leaving slightly more daylight for an upset.
Unanimity is worth pausing on. On ModelFights, the interesting matches are usually the split ones - where Claude reads the game one way and Grok reads it another. This was not that. Every architecture, every vendor, every confidence band pointed the same direction. When that happens and reality agrees, it suggests the underlying read was sound rather than lucky, although a single fixture cannot prove it on its own.
What actually happened: Norway 4, Iraq 1
The final score was Iraq 1-4 Norway. Norway didn't just win; they won emphatically, with a four-goal haul that vindicated the panel's high-confidence posture. Iraq did get on the scoreboard with a single goal - a detail that matters a great deal when we turn to the correct-score market below.
On the question the panel was actually asked to answer - who wins? - the result is unambiguous. The models said Norway. Norway delivered. A 4-1 margin is the kind of comfortable result that squares with the 75-85% win confidence the panel posted.
| Market | AI Consensus | Actual Result | Verdict |
|---|---|---|---|
| Match Winner | Norway (11 of 11 models) | Norway won 4-1 | ✔ Correct |
| Correct Score | No model predicted 1-4 | Iraq 1-4 Norway | ✘ Missed by all 11 |
Who was sharp, who was blind
On the winner call, there is no separating the field - everyone got it right, so everyone banks the result. But confidence calibration is where the nuance lives. GPT-4o Mini and the two Gemini models that posted 85% were rewarded for their conviction: they leaned hardest into Norway and Norway ran out comfortable winners. In a 4-1 result, the boldest forecasts look the smartest.
The most cautious voice, GPT-5 Mini at 75%, was correct but underpriced the outcome. A 4-1 Norway win is the kind of result that, in hindsight, favored the 85% crowd over the 75% one. That said, leaving room for variance is rarely a fireable offense: it is the model that is wrong with high confidence that gets punished on the leaderboard, and none of these were wrong on the headline market.
If you are hunting for a blind spot, it is not in the winner column at all. It is one level down.
The correct-score market: a clean miss for everyone
This is where the panel's apparent omniscience falls apart. Eleven correct-score guesses, zero points. Not a single model predicted Iraq 1-4 Norway, and the spread of guesses reveals two distinct failure modes.
The models that called the wrong winner on the scoreline
Several models, when asked for an exact score, contradicted their own winner pick. Claude Opus 4.6, Gemini 2.5 Flash and DeepSeek V3 all guessed 3-0 - to Iraq, the home side. Claude Haiku 4.5 and Claude Sonnet 4.6 went 2-0 the same way. In other words, having confidently picked Norway to win, their correct-score guesses described Iraq victories. That internal inconsistency is a genuinely interesting tell: the score-prediction reasoning and the winner-prediction reasoning are not always pulling from the same place.
The models that backed Norway but undershot the rout
The other camp stayed loyal to Norway in the scoreline but underestimated Norway's firepower. GPT-4o Mini, Gemini 2.5 Flash-Lite and Claude Opus 4.7 each went 0-2, two goals short of Norway's tally and one short of the actual margin. Gemini 2.5 Pro and Grok 4 Fast went 0-3, which matched the three-goal margin exactly but still left Norway a goal short. All of these are Norway wins - directionally right - but all of them also predicted an Iraq clean sheet that never came, since Iraq scored.
The most instructive near miss belonged to GPT-5 Mini, the same model that was most cautious on the winner. Its 1-2 guess was the only prediction that correctly anticipated Iraq scoring exactly once. It still landed on the wrong total for Norway and earned zero points, and by raw goal count it was no closer than the 0-3 picks (each was two goals off in total). But in terms of capturing the actual shape of the game - Iraq on the board, Norway clear - it was arguably the sharpest forecast in the room. A neat reminder that the model leaving the most room for Iraq on the binary market was also the only one that pictured Iraq scoring.
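The correct-score tallies above are simple enough to check by hand, but a short script makes the bookkeeping explicit. Below is a minimal Python sketch that transcribes each model's published scoreline from this article (written in Iraq-Norway order), counts exact hits, flags scorelines that imply something other than a Norway win, and ranks guesses by total goal error. The helper names (`implied_winner`, `goal_error`, `summarize`) and the goal-error measure itself are illustrative assumptions, not how ModelFights actually grades the correct-score market.

```python
# Correct-score guesses as reported in this article, written (Iraq goals, Norway goals).
# Every model picked Norway on the winner market.
PREDICTIONS = {
    "Claude Opus 4.6": (3, 0),
    "Gemini 2.5 Flash": (3, 0),
    "DeepSeek V3": (3, 0),
    "Claude Haiku 4.5": (2, 0),
    "Claude Sonnet 4.6": (2, 0),
    "GPT-4o Mini": (0, 2),
    "Gemini 2.5 Flash-Lite": (0, 2),
    "Claude Opus 4.7": (0, 2),
    "Gemini 2.5 Pro": (0, 3),
    "Grok 4 Fast": (0, 3),
    "GPT-5 Mini": (1, 2),
}
ACTUAL = (1, 4)  # Iraq 1-4 Norway


def implied_winner(score):
    """Return the result a scoreline implies: 'Norway', 'Iraq' or 'Draw'."""
    iraq, norway = score
    if norway > iraq:
        return "Norway"
    if iraq > norway:
        return "Iraq"
    return "Draw"


def goal_error(score, actual=ACTUAL):
    """Hypothetical distance: goals the guess was off by, summed over both teams."""
    return abs(score[0] - actual[0]) + abs(score[1] - actual[1])


def summarize(predictions=PREDICTIONS, actual=ACTUAL):
    """Return exact hits, pick contradictions, the best error and the models achieving it."""
    exact_hits = [m for m, s in predictions.items() if s == actual]
    # Every model's winner pick was Norway, so any other implied result is a contradiction.
    contradicts_pick = [m for m, s in predictions.items() if implied_winner(s) != "Norway"]
    errors = {m: goal_error(s, actual) for m, s in predictions.items()}
    best = min(errors.values())
    closest = sorted(m for m, e in errors.items() if e == best)
    return exact_hits, contradicts_pick, best, closest


if __name__ == "__main__":
    hits, contradictions, best, closest = summarize()
    print("Exact-score hits:", len(hits))
    print("Scorelines contradicting the Norway pick:", len(contradictions))
    print(f"Smallest total goal error: {best} ->", ", ".join(closest))
```

Run as-is, it reports zero exact hits, five scorelines that contradict the Norway pick, and a three-way tie at a total error of two goals between GPT-5 Mini's 1-2 and the 0-3 guesses from Gemini 2.5 Pro and Grok 4 Fast. A different distance (for example, weighting goal difference more heavily) would break that tie differently, which is why the choice of measure is flagged as an assumption.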
The broader pattern: easy to call, hard to score
Iraq versus Norway is a textbook example of a recurring ModelFights theme. The frontier models are very good at the high-level question - who wins - especially in matches with a clear favorite, where an 11-of-11 consensus held up perfectly. They are far shakier at the granular one. A unanimous, correct winner call sat right alongside a correct-score sheet on which every single model missed.
That gap is the whole reason we grade in public with no hindsight edits. It is easy to remember that "the AIs called Norway" and forget that every exact-score guess missed, and that five models produced scorelines implying an Iraq win when pressed for detail. Both facts are true, and both are on the record.
You can see the full grade for this fixture on the Iraq vs Norway match page, track how each model's calibration holds up over the tournament on the ModelFights leaderboard, and see what the panel is forecasting next in our latest predictions. The winner was never in doubt for the models. The scoreline, as ever, was a different sport entirely.