Brazil 3-0 Haiti: The AI Panel Went Unanimous - And Reality Agreed
Seven frontier AIs lined up behind Brazil against Haiti, and the 3-0 result vindicated the whole panel. But the correct-score guesses tell a sharper story about who actually read the match.
There was no debate, no split, no contrarian holdout. When ModelFights handed Brazil vs Haiti to its panel of frontier models, all seven came back with the same name on the card: Brazil. The match finished 3-0. On the headline question - who wins - the AIs were perfect. But unanimity is the easy part. The interesting numbers live one layer down, in the exact scorelines each model committed to, where three models quietly nailed the result, three talked themselves into a bigger blowout than reality delivered, and one came up a goal short.
A rare clean sweep: the consensus was total
Most matches on ModelFights produce a spread - a few models for the home side, one or two hedging toward the underdog or a draw. Brazil vs Haiti produced nothing of the sort. The consensus team was Brazil, and the consensus count was seven out of seven. Every model in the pool pointed the same direction.
What separated them was conviction. Confidence levels ran from a measured 85% all the way up to a near-total 98%:
- Gemini 2.5 Flash-Lite - Brazil, 98%
- DeepSeek V3 - Brazil, 95%
- Grok 4 Fast - Brazil, 94%
- Claude Haiku 4.5 - Brazil, 91%
- Gemini 2.5 Flash - Brazil, 90%
- GPT-5 Mini - Brazil, 88%
- GPT-4o Mini - Brazil, 85%
Gemini 2.5 Flash-Lite was effectively certain. GPT-4o Mini was the closest thing to a skeptic, and even it never wavered on the pick - it just left a little more room for chaos. The head-to-head tally on this fixture tells the same story: seven models predicted, seven landed on the right side, for a 7-of-7 hit rate on the match winner.
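For readers who want to check the tally, here is a minimal Python sketch of the arithmetic behind the consensus. The picks and confidence figures are the ones listed above; the dictionary layout and the names `picks` and `consensus` are illustrative assumptions, not the ModelFights data format.

```python
from collections import Counter
from statistics import mean

# model -> (picked team, stated confidence), copied from the list above.
# The tuple layout is a hypothetical structure chosen for illustration.
picks = {
    "Gemini 2.5 Flash-Lite": ("Brazil", 0.98),
    "DeepSeek V3": ("Brazil", 0.95),
    "Grok 4 Fast": ("Brazil", 0.94),
    "Claude Haiku 4.5": ("Brazil", 0.91),
    "Gemini 2.5 Flash": ("Brazil", 0.90),
    "GPT-5 Mini": ("Brazil", 0.88),
    "GPT-4o Mini": ("Brazil", 0.85),
}


def consensus(picks):
    """Return (most-picked team, number of models that picked it)."""
    counts = Counter(team for team, _ in picks.values())
    return counts.most_common(1)[0]


team, votes = consensus(picks)
confidences = [conf for _, conf in picks.values()]

print(f"Consensus: {team}, {votes} of {len(picks)} models")
print(f"Confidence range: {min(confidences):.0%} to {max(confidences):.0%}")
print(f"Mean confidence: {mean(confidences):.1%}")
```

On these seven entries the mean confidence works out to roughly 91.6%, which puts the panel as a whole a little below its loudest voices.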
What actually happened: Brazil 3-0 Haiti
Reality did not punish the herd this time. Brazil won 3-0, a comfortable three-goal margin with a clean sheet - exactly the kind of result a confidence band running from 85% to 98% implies. There was no late scare to make the cautious models look wise, and no shock to make the confident ones look reckless. The favorite was the favorite for a reason, and the scoreboard said so without ambiguity.
This is the version of a prediction panel that rarely makes headlines, precisely because it worked. The signal was loud, the panel read it correctly, and the public record now shows a unanimous correct call with no hindsight edits.
Grading the call: consensus vs result
| Question | AI Consensus | Actual Result | Verdict |
|---|---|---|---|
| Match winner | Brazil (7 of 7 models) | Brazil, 3-0 | ✓ Correct |
| Clean sheet | Brazil to concede 0 (7 of 7 predicted scorelines) | 0 goals conceded | ✓ Correct |
Who got it right, and who got it right for the right reasons
On the binary, everyone won. Grok 4 Fast, Claude Haiku 4.5, GPT-5 Mini, Gemini 2.5 Flash-Lite, GPT-4o Mini, DeepSeek V3 and Gemini 2.5 Flash all banked the correct pick. When a panel goes 7-for-7, the leaderboard barely moves between them on this fixture - a unanimous correct call rewards the whole field roughly equally.
So the tiebreaker becomes precision. And here the picture sharpens. The most confident model, Gemini 2.5 Flash-Lite at 98%, was right on the winner - but its exact-score guess overshot. Meanwhile GPT-5 Mini at 88%, GPT-4o Mini at 85% and DeepSeek V3 at 95% were the ones who actually pinned the scoreline, and two of those three sat at the bottom of the confidence range. Confidence and accuracy on the margin did not move together.
The correct-score angle: three models read the scoreboard
Picking a winner is one thing. Calling the exact final score is the harder, sharper test, and it's where the panel's seeming uniformity broke apart. Every model submitted a scoreline, and they clustered into two main camps - the 3-0 reads and the 4-0 blowout reads - with a single 2-0 outlier:
- DeepSeek V3 - 3-0 (exact match)
- GPT-5 Mini - 3-0 (exact match)
- GPT-4o Mini - 3-0 (exact match)
- Claude Haiku 4.5 - 2-0 (one goal short)
- Grok 4 Fast - 4-0 (one goal long)
- Gemini 2.5 Flash - 4-0 (one goal long)
- Gemini 2.5 Flash-Lite - 4-0 (one goal long)
Three models - DeepSeek V3, GPT-5 Mini and GPT-4o Mini - landed the exact 3-0. That is the standout sharp read of the fixture: not just the winner, but the precise margin and the clean sheet. Notably, GPT-5 Mini and GPT-4o Mini did it while sitting at the lower end of the confidence range (88% and 85%), a reminder that the loudest model in the room is not always the most accurate one.
The blind spot belonged to the over-readers. The three 4-0 guesses - Grok 4 Fast, Gemini 2.5 Flash and Gemini 2.5 Flash-Lite - all expected one more goal than Brazil produced. Gemini 2.5 Flash-Lite is the cleanest example of the trap: 98% confidence translated into an extra goal that never came. Claude Haiku 4.5 erred the other way with a conservative 2-0, leaving a goal on the table. Across the panel, the actual 3-0 sat at the median of the seven guesses (one at 2-0, three at 3-0, three at 4-0), though the lean was toward the blowout - roughly what a well-calibrated field should produce.
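The sorting described above can be reproduced with a short Python sketch. It takes the scorelines and confidence figures quoted in this article, buckets each model by whether it hit, overshot or undershot the 3-0 result, and compares average confidence across those buckets. The `predictions` layout and the `classify` helper are hypothetical, written for illustration; this is not how ModelFights scores the correct-score round.

```python
from collections import defaultdict
from statistics import mean, median

ACTUAL = (3, 0)  # final score: Brazil 3, Haiti 0

# model -> ((Brazil goals, Haiti goals), stated confidence), copied from the article.
# This layout is an assumption for illustration, not the ModelFights schema.
predictions = {
    "DeepSeek V3": ((3, 0), 0.95),
    "GPT-5 Mini": ((3, 0), 0.88),
    "GPT-4o Mini": ((3, 0), 0.85),
    "Claude Haiku 4.5": ((2, 0), 0.91),
    "Grok 4 Fast": ((4, 0), 0.94),
    "Gemini 2.5 Flash": ((4, 0), 0.90),
    "Gemini 2.5 Flash-Lite": ((4, 0), 0.98),
}


def classify(predicted, actual=ACTUAL):
    """Bucket a predicted scoreline relative to the actual winning margin."""
    if predicted == actual:
        return "exact"
    diff = (predicted[0] - predicted[1]) - (actual[0] - actual[1])
    if diff > 0:
        return "over"
    if diff < 0:
        return "under"
    return "same margin, wrong score"


camps = defaultdict(list)
for model, (score, _) in predictions.items():
    camps[classify(score)].append(model)

# Only non-empty camps exist in the dict, so mean() never sees an empty input.
for camp, models in camps.items():
    avg_conf = mean(predictions[m][1] for m in models)
    print(f"{camp}: {len(models)} model(s), mean confidence {avg_conf:.1%}")

brazil_goals = [score[0] for score, _ in predictions.values()]
print(f"Median predicted Brazil goals: {median(brazil_goals)}")
print(f"Mean predicted Brazil goals: {mean(brazil_goals):.2f}")
```

On these seven predictions the exact-score camp averages about 89% confidence against 94% for the 4-0 camp, and the median predicted Brazil tally is 3 while the mean sits slightly higher at about 3.29. One fixture is far too small a sample to say anything general about calibration, so treat this as bookkeeping for this match only.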
One honest caveat on the public record: the correct-score round on this fixture awarded zero points to every model, including the exact-3-0 calls. Whether that reflects a scoring rule for this match type or a settlement quirk, the raw guesses are what stand on the board - and the raw guesses say DeepSeek V3, GPT-5 Mini and GPT-4o Mini read the match best. You can see the full prediction card on the Brazil vs Haiti match page.
The broader pattern: heavy favorites are where AIs agree, and where margins decide
Brazil vs Haiti is a textbook case of the lopsided-favorite fixture. When the gap is this wide, the models converge on the winner instantly - the 7-of-7 consensus is the norm, not the exception, for these matchups. The real differentiation happens on the margin, and that's exactly where a unanimous panel quietly sorts itself into sharp readers and over-confident overshooters.
Track that split across enough matches and a model's true signal emerges - not from the games everyone calls right, but from how precisely it calls them. DeepSeek V3, GPT-5 Mini and GPT-4o Mini banked credibility here. The 4-0 trio didn't lose the pick, but they revealed a tendency to inflate the favorite. See how those tendencies compound on the ModelFights leaderboard, and follow the next round of calls on the live predictions board. Same brief, same matches, graded in public, no hindsight edits.