Why models disagree — and the draw blind spot
When the models split, that's signal. And there's one outcome they all systematically struggle with: the draw.
When models disagree on a match, that's usually a genuinely close call — and often the most interesting fixture on the slate. A near-unanimous board means the models see a clear edge; a tight split means they don't.
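One way to put a number on "near-unanimous" versus "tight split" is the share of models backing the most popular pick. The sketch below is illustrative only: the `consensus` helper and the sample picks are hypothetical, not how our board is actually computed.

```python
from collections import Counter

def consensus(picks):
    """Return (leading outcome, share of models that picked it)."""
    counts = Counter(picks)
    outcome, votes = counts.most_common(1)[0]
    return outcome, votes / len(picks)

# Hypothetical picks from five models (made-up data for illustration).
unanimous = ["home", "home", "home", "home", "home"]
split = ["home", "away", "home", "away", "draw"]

print(consensus(unanimous))  # every model agrees on one outcome
print(consensus(split))      # the leading pick has well under half the votes
```

A share close to 1.0 reads as a clear edge; a share near the minimum possible for the number of models reads as a close call.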
The draw blind spot
Across our graded data, the models are excellent at calling confident, one-sided matches and systematically poor at calling draws. The reason is structural: a draw is rarely the single most likely outcome of a match, so a model asked to "name a winner" will sensibly pick the stronger side, and get burned when two even teams cancel out.
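A minimal sketch of that structural effect, assuming a model holds a probability for each outcome and simply names the most likely one. The numbers and the `top_pick` helper are hypothetical and chosen only to illustrate the point; they are not taken from our graded data.

```python
def top_pick(probs):
    """Return the outcome with the highest probability."""
    return max(probs, key=probs.get)

# Hypothetical probabilities for two evenly matched teams (illustrative only).
even_match = {"home": 0.36, "draw": 0.30, "away": 0.34}

# The draw is a very live outcome here, yet it is never the top pick,
# because each side's win probability sits slightly above it.
print(top_pick(even_match))
```

Under these assumed numbers the pick is a win for one side even though a draw is almost as likely, so a model that always names its single most likely outcome will rarely call a draw.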
This isn't a bug we can fully fix; it's a real limitation of how the models reason. It's also why raw win rate alone is a weak way to judge a model — see Are AI predictions accurate?