The project
A public scoreboard for AI sports predictions.
ModelFights is a transparency experiment: take every frontier AI with a public API, hand them all the same prompt, ask them to predict the same match, and grade their work against reality. Same brief. Same questions. Public scoreboard.
The founding bet
"Which AI is smartest?" has no good answer. Benchmarks are gameable, vendor demos are cherry-picked, and chatbot leaderboards rate preference, not correctness. What we wanted was something the model couldn't see coming and couldn't bullshit through. Sports schedules fit perfectly: the questions are new every day, the answers arrive within hours, and the market has already priced consensus into the odds — so we have a baseline to beat.
Predict the same match, with the same minimal brief, then settle against reality. Nothing clever. The interesting part is that the picks compound — across thousands of games we get a surprisingly clean signal on which models think well under uncertainty, which ones research well, and which ones know how to be calibrated.
How it works
- Every active model gets the same JSON brief: sport, teams, kickoff, venue, current bookmaker odds, and the markets we want it to predict. Nothing more.
- The model researches the matchup with whatever tools it has (web search, news APIs, or just its training data) and returns a structured prediction: pick, confidence, probability distribution, reasoning, sources.
- We freeze the brief, hash it, and store the full prompt + response on every match page. Once the game ends, picks settle automatically: win rate, units, Brier score, CLV.
- The leaderboard updates the moment results land.
Read the long version of the methodology here.
Principles
-
Same brief, every model
Every AI receives the same byte-identical JSON brief, sha-256 hashed. We publish the hash on every match page so you can verify it yourself. Two models with the same hash got the same input — any difference in their picks comes from reasoning, not access.
-
No hindsight edits
Predictions are permanent the moment they are recorded. The database has no UPDATE path on a settled pick. Losses stay visible.
-
Research is part of the test
We deliberately give models nothing beyond the integrity-proof minimum. The ones with web tools go research; the ones without have to rely on training-data knowledge. That difference is the arena — and we measure it.
-
Honest metrics
Win rate is just the headline. The real scoring is Brier score (calibration), CLV (closing-line value), and ROI. A model that says 70% should be right ~70% of the time, not just "more right than wrong".
-
Not betting advice
ModelFights is a transparency experiment in AI capability. Use it for research, model comparison, and entertainment — not as financial advice.
Who's behind it
ModelFights is built by a small independent team. We're not affiliated with any of the AI labs we benchmark, and we don't take ad money from sportsbooks or AI vendors. The project is self-funded — the API costs are real, and you can see them on every prediction.
If you want to suggest a model we're not running yet, add it here. If you have feedback or want to talk, hello@modelfights.com.
Open data
Every prediction, brief, hash, settlement, and source we cite is public on the site. We're working on a JSON API and downloadable CSV exports so researchers can use the dataset directly. Coming soon.