We Graded 11 Frontier AIs on the World Cup's Latest Results. They Aced the Blowouts and Whiffed on Every Draw.
The favorites? The AI panel was near-perfect. The draws? All eleven models struck out at once. Here's the graded scorecard from the World Cup's latest slate — USA, Scotland, Mexico, Belgium, Saudi Arabia and more.
The World Cup is the cruelest test we run. There is no “model” for a 90-minute football match between two national teams who meet once every few years — only form, line-ups, and the cold market line. So every time the group stage hands us a fresh batch of results, we do the same thing: settle every pick, no edits, and see how the frontier models actually did.
The latest slate split the AI panel cleanly in two. On the lopsided matches, the models were close to flawless. On the draws, every single one of them struck out at the same time. Here is the graded scorecard.
The blowouts: the AIs were near-perfect
When the talent gap is obvious, frontier AIs read it like a chart. Three of the newly graded matches were decisive, and every model that made a call on them was right.
- USA 4–1 Paraguay: all 11 of 11 models picked the USA, and the hosts duly ran out three-goal winners. A clean sweep.
- Haiti 0–1 Scotland: another 11 of 11 — every model took Scotland, and Scotland won it 1–0 away from home.
- Mexico 2–0 South Africa: 4 of 4 models that called it took Mexico. 2–0, no debate.
That is the part of the job AI does well. Given the same brief every model receives — teams, recent form, the market price — the panel separates a genuine favorite from an over-matched opponent with almost no disagreement. On clear games, the “wisdom of the lineup” is real.
Then the World Cup served up a draw festival
And then the tournament reminded everyone why international football is the graveyard of prediction. Three matches on the slate ended level — and on all three, the entire AI panel was wrong at once.
- Belgium 1–1 Egypt: 0 of 11 models called it. Every one took a side; Egypt held Belgium level.
- Saudi Arabia 1–1 Uruguay: another 0 of 11. The panel leaned to the favorite and watched Saudi Arabia dig out a draw.
- Iran 2–2 New Zealand: 0 of 11 again, in a four-goal stalemate nobody on the panel saw coming.
Three matches, thirty-three picks, zero hits. That is not bad luck — it is a structural blind spot, and it is the most interesting thing in the data.
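To put the two halves side by side, here is a minimal tally of the six matches listed above, using only the pick counts reported in this scorecard. The variable names are ours for illustration. South Korea vs Czech Republic is left out because the fourth model's pick is not stated.

```python
# (match, correct picks, total picks) as reported in the scorecard above.
scorecard = [
    ("USA 4-1 Paraguay", 11, 11),
    ("Haiti 0-1 Scotland", 11, 11),
    ("Mexico 2-0 South Africa", 4, 4),
    ("Belgium 1-1 Egypt", 0, 11),
    ("Saudi Arabia 1-1 Uruguay", 0, 11),
    ("Iran 2-2 New Zealand", 0, 11),
]

hits = sum(correct for _, correct, _ in scorecard)
total = sum(picks for _, _, picks in scorecard)
hit_rate = hits / total

# Split the tally by whether any model got the match right.
decisive_hits = sum(c for _, c, _ in scorecard if c > 0)
draw_picks = sum(n for _, c, n in scorecard if c == 0)

print(f"{hits} of {total} picks correct ({hit_rate:.1%})")
print(f"decisive matches: {decisive_hits} hits, draws: 0 of {draw_picks}")
```

On these six matches the panel goes 26 of 59, or about 44%, even though it was perfect on every decisive game it called.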
Why draws break AI prediction
The draw is the hardest call in football, and frontier models almost never make it. Ask a model to pick a winner and it will, sensibly, name the stronger side. But a draw is rarely the most likely single outcome — it is just the outcome that becomes very common once two evenly matched international sides cancel each other out. Across a full slate, “always back a team” is a losing strategy precisely when the games tighten up.
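A short sketch makes the mechanism concrete. It assumes each model carries some three-way probability estimate and simply names the most likely outcome. The probabilities below are hypothetical, chosen for illustration, and are not taken from the graded slate.

```python
# Hypothetical (home win, draw, away win) probabilities, illustrative only.
matches = {
    "clear favorite": (0.70, 0.18, 0.12),
    "slight favorite": (0.42, 0.30, 0.28),
    "near toss-up": (0.36, 0.31, 0.33),
}

OUTCOMES = ("home", "draw", "away")


def most_likely(probs):
    """Return the label of the single most probable outcome."""
    best_index = max(range(len(probs)), key=lambda i: probs[i])
    return OUTCOMES[best_index]


picks = {name: most_likely(p) for name, p in matches.items()}
draw_picks = sum(pick == "draw" for pick in picks.values())
expected_draws = sum(p[1] for p in matches.values())

for name, pick in picks.items():
    print(f"{name}: pick {pick}")
print(f"draw picks: {draw_picks}, expected draws: {expected_draws:.2f}")
```

Under these assumed numbers the draw is never the top outcome in any single match, so it is never picked, yet the three draw probabilities add up to 0.79 expected draws across the slate.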
You can see both halves of the blind spot in one match. In South Korea 2–1 Czech Republic, three of the four models actually did hedge to a draw — at a nervy 37% average confidence — and the game refused to cooperate, finishing 2–1. So the panel misses real draws by backing a side, and occasionally talks itself into a draw that never arrives. Either way, the level scoreline is the one result AI has not learned to respect.
The favorites? Near-perfect. The draws? Zero for three. That gap — not the headline win rate — is where the real story of AI sports prediction lives.
What it means for the leaderboard
This is exactly why our public leaderboard doesn’t lead with raw win rate. A model can ace every blowout and still land near a coin flip over a tournament, because the draws drag everyone back to the pack. The honest signal isn’t “who picked the most winners” — it’s calibration and closing-line value over hundreds of graded picks, which is what we actually rank on.
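As a rough illustration of the difference, the sketch below uses two common textbook definitions: the multi-class Brier score for calibration, and closing-line value as the ratio of the decimal odds taken to the closing decimal odds. These formulas and all the numbers are assumptions for the example, not a statement of how the leaderboard is actually computed.

```python
def brier_score(probs, outcome_index):
    """Multi-class Brier score for one match: 0.0 is perfect, 2.0 is worst."""
    return sum(
        (p - (1.0 if i == outcome_index else 0.0)) ** 2
        for i, p in enumerate(probs)
    )


def closing_line_value(odds_taken, closing_odds):
    """Fractional edge of the decimal odds taken over the closing decimal odds."""
    return odds_taken / closing_odds - 1.0


# Probabilities are ordered (home win, draw, away win); index 1 is the draw.
# Both hypothetical models back the home side, and the match ends level.
confident_miss = brier_score((0.70, 0.20, 0.10), outcome_index=1)
hedged_miss = brier_score((0.40, 0.35, 0.25), outcome_index=1)

# Hypothetical prices: the pick was taken at 2.10 and the market closed at 2.00.
clv = closing_line_value(odds_taken=2.10, closing_odds=2.00)

print(f"confident miss: {confident_miss:.3f}")
print(f"hedged miss:    {hedged_miss:.3f}")
print(f"closing-line value: {clv:+.1%}")
```

Both models lose the pick on raw win rate, but the hedged one scores roughly 0.645 against 1.14 for the confident one, which is the kind of gap a win-rate column cannot show.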
Every pick above was timestamped before kickoff, hashed, and graded automatically the moment the result landed — no hindsight, no quiet edits. You can read exactly how we do it on the methodology page, or watch the next batch settle live on the predictions slate.
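For readers curious what timestamp-and-hash can look like in practice, here is a minimal sketch using SHA-256 over a canonical JSON record. The record fields, the choice of hash, and the function name commit_pick are all assumptions for illustration, and the actual pipeline may differ.

```python
import hashlib
import json


def commit_pick(record):
    """Return a SHA-256 hex digest of a pick record in canonical JSON form."""
    # Sorted keys and fixed separators make the serialization deterministic.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical record; every field name and value here is illustrative.
pick = {
    "match": "Team A vs Team B",
    "model": "model-a",
    "pick": "Team A",
    "confidence": 0.58,
    "timestamp_utc": "2000-01-01T12:00:00Z",
}

digest = commit_pick(pick)

# Any later change to the record produces a different digest.
edited = dict(pick, pick="draw")
print(digest)
print(commit_pick(edited) == digest)
```

Publishing the digest before kickoff lets anyone check afterwards that the stored record still hashes to the same value, so a quiet edit would be detectable.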
The blowouts will keep coming, and the AIs will keep nailing them. The draws are the test. So far, the World Cup is winning.