Canada 1-1 Bosnia & Herzegovina: The AI Panel Backed the Hosts and Got Burned by a Draw
The AI panel was nearly unanimous on Canada, but the match ended 1-1. Only Gemini 2.5 Pro called the draw, and it did so as the panel's most cautious voice.
Eleven frontier AIs read the same brief on Canada versus Bosnia & Herzegovina at the 2026 World Cup. Ten of them backed Canada. The match finished 1-1. This is the kind of result that exposes the difference between a confident model and a correct one, and on this night the panel's loudest conviction was its biggest miss.
The consensus: a near-unanimous Canada call
There was no debate worth speaking of. Of the eleven models on the panel, ten predicted a Canada win, a consensus count that left exactly one dissenter. The agreement spanned vendors and weight classes alike: Grok 4 Fast, GPT-4o Mini, GPT-5 Mini, Claude Haiku 4.5, Claude Sonnet 4.6, Claude Opus 4.7, Claude Opus 4.6, Gemini 2.5 Flash-Lite, Gemini 2.5 Flash and DeepSeek V3 all landed on the hosts.
What is telling is not just that they agreed, but how tepidly they agreed. Confidence on the Canada side ranged from a high of 62 percent (Claude Haiku 4.5) down to a strikingly soft 48 percent from Claude Opus 4.6 and 50 percent from Claude Opus 4.7. A 48 percent confidence vote for a win is a model practically shrugging while it fills in the box. The mid-table sat around 55 percent, with GPT-5 Mini at 58 and Gemini 2.5 Flash-Lite at 57. Read collectively, this was a panel that leaned Canada without conviction, the kind of soft majority that should have been a flashing warning rather than a green light.
The lone dissenter
One model broke ranks. Gemini 2.5 Pro called a draw, and it did so at just 40 percent confidence, the lowest number anywhere on the board. That detail matters. The model that ultimately got the result right was also the one least sure of itself, refusing to commit to a winner where ten peers had reached for one. On a night defined by hedged conviction, the only correct answer came wrapped in the most caution of all.
What actually happened
Canada 1, Bosnia & Herzegovina 1. A draw. The hosts could not put the game away, the visitors would not be beaten, and the scoreline that ten of eleven AIs had quietly dismissed was the one reality delivered. There is no asterisk here, no late drama to relitigate, no hindsight edit available to anyone. The brief went out, the picks were locked, and the final whistle graded them in public.
For Canada, a point against a battle-hardened European side is hardly a disaster on paper. For the AI panel, though, a draw is the worst-case outcome of a lopsided winner pick: it doesn't just defeat one side of the wager, it defeats almost the entire room at once.
Who got it right, who got it wrong
The honest accounting is brutal. Ten models were wrong. One was right. Gemini 2.5 Pro is the sole name in the correct column, and it earns the standout-sharp tag not because it predicted a Canadian collapse but because it priced in uncertainty where everyone else priced in a favorite.
The blind spots are worth naming plainly. Claude Haiku 4.5 carried the highest conviction on the panel at 62 percent and was wrong. The two heavyweight Claudes, Opus 4.7 and Opus 4.6, hedged down to 50 and 48 percent respectively and were still wrong, which is the uncomfortable middle ground of being unsure and incorrect at the same time. The OpenAI and Grok entries, GPT-4o Mini, GPT-5 Mini and Grok 4 Fast, clustered in the 55 to 58 percent band and missed alongside them. Claude Sonnet 4.6, DeepSeek V3 and both lighter Geminis, Flash and Flash-Lite, completed the wrong side of the ledger.
| Market | AI Consensus | Actual Result | Verdict |
|---|---|---|---|
| Match Winner | Canada (10 of 11 models) | Draw, 1-1 | ✗ Miss |
| Lone Dissent | Draw (Gemini 2.5 Pro, 40%) | Draw, 1-1 | ✓ Hit |
The tally reinforces the point. On this fixture the match-winner calls landed 1 of 11 correct, and that single hit is the Gemini 2.5 Pro draw call. When ten models agree and one disagrees, the crowd is usually right. This was one of the times it wasn't, and the cost of groupthink showed up in the scoreline.
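For readers who want to check the arithmetic, the tally can be reproduced in a few lines. This is a minimal sketch, not the site's grading code: the picks are copied from this article, and the names `PICKS`, `ACTUAL` and `summarize` are hypothetical labels chosen for the illustration.

```python
from collections import Counter

# Picks as reported in this article. "Draw" stands for the 1-1 final.
PICKS = {
    "Grok 4 Fast": "Canada",
    "GPT-4o Mini": "Canada",
    "GPT-5 Mini": "Canada",
    "Claude Haiku 4.5": "Canada",
    "Claude Sonnet 4.6": "Canada",
    "Claude Opus 4.7": "Canada",
    "Claude Opus 4.6": "Canada",
    "Gemini 2.5 Flash-Lite": "Canada",
    "Gemini 2.5 Flash": "Canada",
    "DeepSeek V3": "Canada",
    "Gemini 2.5 Pro": "Draw",
}
ACTUAL = "Draw"


def summarize(picks, actual):
    """Return the consensus pick, its vote count, the correct models and the hit rate."""
    counts = Counter(picks.values())
    consensus, consensus_votes = counts.most_common(1)[0]
    correct = sorted(model for model, pick in picks.items() if pick == actual)
    return {
        "consensus": consensus,
        "consensus_votes": consensus_votes,
        "correct": correct,
        "hit_rate": len(correct) / len(picks),
    }


summary = summarize(PICKS, ACTUAL)
print(f"Consensus: {summary['consensus']} ({summary['consensus_votes']} of {len(PICKS)})")
print(f"Correct: {', '.join(summary['correct'])}")
print(f"Hit rate: {len(summary['correct'])} of {len(PICKS)}")
```

The consensus and the hit rate are computed separately on purpose: a ten-vote majority says nothing about whether the majority was right.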
The correct-score angle
Here the data is its own commentary: not a single model on the panel submitted a correct scoreline for this match. Zero exact-score guesses were logged, which means even Gemini 2.5 Pro, right about the draw, did not put a number on it. That gap is instructive. Predicting a draw is one level of insight; predicting 1-1 specifically is another entirely, and on this fixture no model reached it. It is a clean reminder that calling the shape of a result and calling its exact margin are different sports: the panel cleared the first bar by a single vote and missed the second outright.
The broader pattern
Canada versus Bosnia & Herzegovina is a textbook case of why we run every model on the same brief and grade them where everyone can see. A 10-1 consensus feels like certainty until a draw arrives and turns it into a single survivor. The lesson is not that the crowd is dumb; it is that confidence and correctness are separate quantities, and the model that wins is often the one humble enough to say it doesn't know.
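One way to put a number on the gap between confidence and correctness is a Brier-style score, the squared distance between a model's stated confidence and whether its pick landed. The sketch below is an illustration under stated assumptions, not the scoring rule used on the leaderboard: it covers only the six confidence figures quoted in this article, treats each pick as a single yes-or-no event, and uses the hypothetical names `DISCLOSED`, `pick_brier` and `score_all`.

```python
# Confidence figures quoted in this article, as probabilities.
# Models whose numbers are not quoted are left out rather than guessed.
DISCLOSED = {
    "Claude Haiku 4.5": ("Canada", 0.62),
    "GPT-5 Mini": ("Canada", 0.58),
    "Gemini 2.5 Flash-Lite": ("Canada", 0.57),
    "Claude Opus 4.7": ("Canada", 0.50),
    "Claude Opus 4.6": ("Canada", 0.48),
    "Gemini 2.5 Pro": ("Draw", 0.40),
}


def pick_brier(confidence, was_correct):
    """Squared error between stated confidence and the 0/1 outcome of the pick.

    Lower is better: 0.0 is a certain, correct call; 1.0 is a certain, wrong one.
    """
    outcome = 1.0 if was_correct else 0.0
    return (confidence - outcome) ** 2


def score_all(disclosed, actual):
    """Score every disclosed pick against the actual result."""
    return {
        model: round(pick_brier(conf, pick == actual), 4)
        for model, (pick, conf) in disclosed.items()
    }


scores = score_all(DISCLOSED, "Draw")
for model, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{model}: {score:.4f}")
```

Under this particular rule the most confident miss, Claude Haiku 4.5 at 62 percent, takes the heaviest penalty, while a hedged miss at 48 percent is penalized less than a correct call made at only 40 percent. Being right and being well calibrated are graded on different axes, which is the point.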
You can see the full breakdown for this fixture, including every model's pick and confidence, on the Canada vs Bosnia & Herzegovina match page. To see how this draw reshapes the standings, the ModelFights leaderboard tracks every model's hit rate across the tournament, and you can follow the rest of the slate on our live predictions hub. No hindsight edits, no quiet corrections, just the picks and the scoreboard that judges them.