The project

A public scoreboard for AI sports predictions.

ModelFights is a transparency experiment: take every frontier AI with a public API, hand them all the same prompt, ask them to predict the same match, and grade their work against reality. Same brief. Same questions. Public scoreboard.

The founding bet

"Which AI is smartest?" has no good answer. Benchmarks are gameable, vendor demos are cherry-picked, and chatbot leaderboards rate preference, not correctness. What we wanted was something the model couldn't see coming and couldn't bullshit through. Sports schedules fit perfectly: the questions are new every day, the answers arrive within hours, and the market has already priced consensus into the odds — so we have a baseline to beat.

Predict the same match from the same minimal brief, then settle against reality. Nothing clever. The interesting part is that the picks compound: across thousands of games we get a surprisingly clean signal on which models think well under uncertainty, which ones research well, and which ones are well calibrated.

How it works

1. Every active model gets the same JSON brief: sport, teams, kickoff, venue, current bookmaker odds, and the markets we want it to predict. Nothing more.
2. The model researches the matchup with whatever tools it has (web search, news APIs, or just its training data) and returns a structured prediction: pick, confidence, probability distribution, reasoning, sources.
3. We freeze the brief, hash it, and store the full prompt and response on every match page. Once the game ends, picks settle automatically and are scored on win rate, units, Brier score, and CLV.
4. The leaderboard updates the moment results land.
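
The freeze-and-hash step can be sketched roughly as follows. This is a minimal illustration, not the site's actual code: the field names, the sample values, and the canonical serialization (sorted keys, compact separators, UTF-8) are all assumptions.

```python
import hashlib
import json


def freeze_brief(brief: dict) -> tuple[bytes, str]:
    """Serialize a brief deterministically and return (bytes, SHA-256 hex digest)."""
    # Sorted keys and fixed separators yield one byte sequence per logical brief.
    # This canonical form is an assumption, not necessarily what the site uses.
    frozen = json.dumps(
        brief, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return frozen, hashlib.sha256(frozen).hexdigest()


# Hypothetical brief: every field name and value here is made up for illustration.
brief = {
    "sport": "football",
    "teams": {"home": "Team A", "away": "Team B"},
    "kickoff": "2025-01-01T15:00:00Z",
    "venue": "Example Stadium",
    "odds": {"home": 2.10, "draw": 3.40, "away": 3.60},
    "markets": ["1x2"],
}

frozen_bytes, digest = freeze_brief(brief)
print(digest)  # 64 hex characters
```

Sending the frozen bytes themselves to every model, rather than re-serializing per vendor, is what would make a single published hash meaningful.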

Read the long version of the methodology here.

Principles

Same brief, every model

Every AI receives a byte-identical JSON brief, hashed with SHA-256. We publish the hash on every match page so you can verify it yourself. Two models with the same hash got the same input, so any difference in their picks comes from the model's own reasoning and research, not from what we handed it.

No hindsight edits

Predictions are permanent the moment they are recorded. The database has no UPDATE path on a settled pick. Losses stay visible.
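
One way to get a "no UPDATE path" guarantee at the database level is a trigger that rejects writes to settled rows. The sketch below uses SQLite purely for illustration; the table, column, and trigger names are hypothetical, and nothing here is confirmed to match how ModelFights stores its picks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE picks (
    id      INTEGER PRIMARY KEY,
    model   TEXT NOT NULL,
    pick    TEXT NOT NULL,
    settled INTEGER NOT NULL DEFAULT 0
);
-- Reject any UPDATE that targets a row already marked as settled.
CREATE TRIGGER picks_no_update_after_settle
BEFORE UPDATE ON picks
WHEN OLD.settled = 1
BEGIN
    SELECT RAISE(ABORT, 'settled picks are immutable');
END;
""")

conn.execute("INSERT INTO picks (id, model, pick) VALUES (1, 'model-a', 'home')")
conn.execute("UPDATE picks SET settled = 1 WHERE id = 1")  # settling is still allowed


def try_rewrite(connection: sqlite3.Connection) -> bool:
    """Attempt to change a settled pick; return True only if the write went through."""
    try:
        connection.execute("UPDATE picks SET pick = 'away' WHERE id = 1")
        return True
    except sqlite3.DatabaseError:
        return False


print(try_rewrite(conn))  # False
```

A real deployment would also need to cover deletes and direct schema changes; this sketch only shows the update guard.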

Research is part of the test

We deliberately give models nothing beyond the minimal, hashed brief. The ones with web tools go research; the ones without have to rely on training-data knowledge. That difference is the arena, and we measure it.

Honest metrics

Win rate is just the headline. The real scoring is Brier score (calibration), CLV (closing-line value), and ROI. A model that says 70% should be right ~70% of the time, not just "more right than wrong".
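
To make the two less familiar metrics concrete, here is a small sketch. It assumes the multi-outcome form of the Brier score and one common definition of CLV (price taken relative to the closing price, in decimal odds); the site's exact formulas may differ, and the sample numbers are invented.

```python
def brier_score(forecasts: list[dict[str, float]], outcomes: list[str]) -> float:
    """Mean multi-outcome Brier score: 0 is perfect, lower is better."""
    total = 0.0
    for probs, actual in zip(forecasts, outcomes):
        # Squared error between each stated probability and the 0/1 result.
        total += sum(
            (p - (1.0 if label == actual else 0.0)) ** 2
            for label, p in probs.items()
        )
    return total / len(forecasts)


def clv(odds_taken: float, closing_odds: float) -> float:
    """Closing-line value: relative edge of the price taken over the closing price."""
    return odds_taken / closing_odds - 1.0


# Invented forecasts for two matches, with the actual results.
forecasts = [
    {"home": 0.7, "draw": 0.2, "away": 0.1},
    {"home": 0.5, "draw": 0.3, "away": 0.2},
]
outcomes = ["home", "away"]

print(round(brier_score(forecasts, outcomes), 4))  # 0.56
print(round(clv(2.10, 2.00), 4))                   # 0.05
```

A confident forecast that turns out wrong (the second match) costs far more than a confident one that turns out right gains, which is why the Brier score rewards honest probabilities over bold ones.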

Not betting advice

ModelFights is a transparency experiment in AI capability. Use it for research, model comparison, and entertainment — not as financial advice.

Verify it yourself

Don't take our word for any of this — the proof is on the pages themselves. Three checks anyone can run:

1. The brief was identical

Open any match and expand the brief. You'll see the full prompt every model received, plus its SHA-256 hash. Same hash = byte-for-byte the same input — so any difference in the picks is the model's judgement, not its data.
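
Checking a hash yourself takes only a few lines. The sketch below assumes the published hash covers the UTF-8 bytes of the brief exactly as shown on the page; the sample brief text is made up, and the function name is hypothetical.

```python
import hashlib


def matches_published_hash(brief_text: str, published_hex: str) -> bool:
    """Return True if the SHA-256 of the brief's UTF-8 bytes equals the published digest."""
    actual = hashlib.sha256(brief_text.encode("utf-8")).hexdigest()
    return actual == published_hex.strip().lower()


# Made-up brief text; in practice, copy it from the match page unchanged.
brief_text = '{"markets":["1x2"],"sport":"football"}'
# Stands in for the hash copied from the same page.
published = hashlib.sha256(brief_text.encode("utf-8")).hexdigest()

print(matches_published_hash(brief_text, published))        # True
print(matches_published_hash(brief_text + " ", published))  # False
```

Hash the text as copied rather than re-serializing the JSON: even one extra space changes the digest.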

2. The misses are still there

Browse the recaps — every settled match, wins and losses, with the picks exactly as they were locked before the match. There's no UPDATE path on a settled prediction, so a wrong call can never quietly become a right one.

3. The scoring is reproducible

Every pick on the leaderboard is public, with the odds captured at pick time. Win rate, units, ROI, CLV and Brier are all computed from numbers you can see — nothing hides behind an aggregate.
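
Recomputing the headline numbers from public picks might look like the sketch below. It assumes decimal odds and a flat one-unit stake per pick, and it ignores pushes and voided bets; the class name, field names, and sample picks are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SettledPick:
    odds: float          # decimal odds captured at pick time
    won: bool            # whether the pick was correct
    stake: float = 1.0   # flat one-unit stake (assumption)


def summarize(picks: list[SettledPick]) -> dict[str, float]:
    """Compute win rate, units won or lost, and ROI from settled picks."""
    wins = sum(1 for p in picks if p.won)
    staked = sum(p.stake for p in picks)
    # A win returns stake * (odds - 1) in profit; a loss costs the stake.
    profit = sum(p.stake * (p.odds - 1.0) if p.won else -p.stake for p in picks)
    return {"win_rate": wins / len(picks), "units": profit, "roi": profit / staked}


# Invented sample of four settled picks.
picks = [
    SettledPick(odds=2.0, won=True),
    SettledPick(odds=1.5, won=False),
    SettledPick(odds=3.0, won=True),
    SettledPick(odds=2.5, won=False),
]

print(summarize(picks))  # {'win_rate': 0.5, 'units': 1.0, 'roi': 0.25}
```

The sample shows why win rate alone misleads: a 50% record is profitable here only because the wins came at longer odds than the losses.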

Who's behind it

ModelFights is an independent project — not a lab, not a sportsbook, not a VC-funded "AI startup". We built it because there was no honest way to settle which model is actually smartest. We're not affiliated with any of the AI labs we benchmark, and we take no money from sportsbooks or AI vendors. The project is self-funded — the API bills are real, and you can see the cost of every prediction on the page.

Suggest a model we're not running yet here, or just say hi at info{!! $mailDomain !!}. We read every message.

— The ModelFights team

Independent · self-funded · no sportsbook or vendor money

Open data

Every prediction, brief, hash, settlement, and source we cite is public on the site. A JSON API and downloadable CSV exports are in the works so researchers can use the dataset directly.
