⚠️ A research & education experiment — NOT gambling and NOT an encouragement to gamble. We hold no positions and invest no real capital. Every figure is a paper simulation of market structure, not investment returns, advice, or a solicitation. Prediction-market platforms like Polymarket are restricted or banned in some jurisdictions (e.g. Singapore) — know your local laws.
Beat the market at the World Cup using zero football knowledge.
We never forecast a match, a player, or a team. We take the market's own prices and ask a narrower, more honest question: are those prices internally consistent and unbiased — and if not, can we harvest the inconsistency? The entire edge, if any, comes from the structure of the prices, not from knowing anything about football.
Then we keep score in public. Because these markets resolve, every call is timestamped and graded out-of-sample against what actually happened. The point isn't to look clever — it's to find out, on the record, whether a purely structural strategy can beat a liquid crowd.
A "bounded" market is one that settles to a known value — here, 0 or 1 (a team advances, or it doesn't). That boundedness is exactly what makes zero-knowledge strategies viable, in four ways an ordinary stock doesn't offer:
Our single market data source is Polymarket. We read its live, real-money World Cup prices through the public Gamma API — no private feeds, no broker. Those prices are the "world" / market side of every comparison on the board; the crowd's money is our benchmark, and we link straight out to the underlying Polymarket market on every team. (We are not affiliated with Polymarket, and — see the banner — it is restricted in some jurisdictions; the links are reference-only.) Because everything keys off one transparent, reproducible price source, there is no separate "data source" tab to maintain: it is Polymarket, end to end.
Polymarket lists the World Cup as several separately-traded events. Per team they form a nested ladder of "how far do you go," each level with a known number of slots:
| Round | Slots (prices must sum to) |
|---|---|
| Advance to knockouts | 32 |
| Reach quarter-final | 8 |
| Reach semi-final | 4 |
| Reach final | 2 |
| Win the Cup | 1 |
≈240 markets vs 48 winner-only — 5× the breadth, and where the cleanest structural tells live.
Three independent structural sources, none of which require football knowledge:
\(p \to p^{\text{power}}\) (renormalized). On a planted-bias test this beats the de-vigged market by ~0.7% Brier. It's small, but it's robust and directional: it buys favorites and fades longshots, the historically profitable side of the bias. Why it could work: the bias is behavioral (longshot lottery-love + margin structure) and has persisted for decades; we exploit the distribution shape, which is far more stable than any single price.

We measure all of this against the market's own price as the baseline: skill-vs-market, the only comparison that matters. Beating a coin flip is meaningless; beating the de-vigged crowd is the whole game.
A fixed $1,000 paper bankroll per book, conviction-weighted (stake ∝ |edge|, capped per trade), dollar-neutral, net of the half-spread. Each ticket holds signed YES-shares: unrealized = shares × (price − entry), realized = shares × (outcome − entry). Every book is frozen at day 0 into the ledger (ledger/wc_*.jsonl) and marked to live, so each shows real running Entry → Now → PnL — not a re-sized proposal.
Two trading styles, run identically for both models so the comparison is symmetric:
- Buy & Hold (wc-core): entered once at day 0, every ticket held to its market's resolution. The clean test of "was the pre-tournament view right?" PnL settles in waves (advance → QF → SF → final → win).
- Active (wc-live): rebalanced each matchday. We settle what resolved, close the rest at the current price, and redeploy the compounding capital into the freshest edges. The test of "does acting on the model's updates beat just holding?" Its matchday changelog is the timeline.

Crossed with the two models that produce the edges (the zero-knowledge structural model and the informed Elo model, §5b), that is four books, each its own tab with its own PnL: Buy & Hold, Active, Elo · Buy & Hold, Elo · Active. Comparing across the grid is itself the finding: if Active doesn't beat Buy & Hold net of churn, the daily updates are noise; if the Elo books don't beat the zero-knowledge ones, football knowledge didn't help. We'd report either honestly.
The structural book above is derived from the market, so by construction it can only disagree with the crowd a little. To get a genuinely independent view we run the engine's own simulation — Elo + Poisson goals over the verified 2026 bracket — on real per-team World Football Elo ratings (eloratings.net, dated, reproducible), not seeded from Polymarket.
Four disclosed choices make it honest (all priors from football, not tuned to the market):
What it finds (live): a clean results-vs-reputation split — the Elo model rates Spain, Argentina and the in-form South-Americans (Colombia, Ecuador, Uruguay) higher than the market, and fades the reputation teams (France, England, Portugal, Germany) whose results have lagged their squad value. That's a real, defensible disagreement.
How the Elo model sizes and places its bets. The Elo book is sized exactly like the structural book — conviction-weighted and dollar-neutral: long the markets the model thinks are under-priced, short the over-priced, each side getting half a fixed $1,000 paper bankroll, stake ∝ |edge| and capped per trade so no single bet dominates. The crucial difference is where the edge comes from:
Which of its bets to trust: the win-level disagreements (results-vs-reputation) are the defensible part; the advance-level bets are larger but less reliable — the market usually knows a team's specific group context better than a single backward-looking Elo number.
The honest caveat, stated on the board itself: a model built on public ratings is usually less sharp than a liquid, heavily-traded market — so a big gap is more likely the model being cruder than the crowd being wrong. Big disagreement ≠ edge. Both books are scored by the same live scorecard; that's what adjudicates. (src/worldcup_fundamental.py.)
The most-likely outcome map. The same 20k simulations also drive a projection on the board: the projected group stage (each team's most-likely finishing rank + advance %, from the group finishing-position distribution) and a knockout pyramid (the most-likely Quarter-finalists → Semi-finalists → Finalists → Champion, read off the reach-round probabilities). These are the model's probabilities, shown in its violet colour — a picture of what the informed model expects, not a market consensus.
We are deliberately loud about the limitations — the credibility is in the caveats.
What's missing / weak today
- capacity.py analysis puts the break-even capacity near \$0: the spread alone can exceed the gross daily edge. We now net a flat ~1¢ half-spread off every book edge (sub-spread gaps are sized to zero), so the paper books are no longer purely gross. This is still a flat approximation (real per-market spreads vary and a full slippage/impact model isn't wired in), but it removes the worst of the gross-edge illusion.
- IR ≈ IC·√K (which assumes K independent signals) would badly overstate the information ratio. We present the rows as one correlated position, not as breadth.

How we'd improve it
- bounded.py::fit_recalibration) per round, and weight by time-to-resolution (IC rises as τ→0).
- ledger/wc_results.json; pre-tournament, with no results, it is the day-0 forecast). The Dixon–Coles goals correction is also in. The remaining bracket gap is the exact official slot table (incl. best-third contingencies); we currently use a balanced seeded approximation.
- src/worldcup_fundamental.py, §5b), shown as a separate, clearly-labelled section so readers see structure-only vs Elo side by side. The shrink is now knockout-only (the group stage trusts raw Elo). Next for it: real form/injury adjustments to the ratings (so its advance-level disagreements are better-founded).

A compact, reproducible reference. Prices are probabilities \(p \in (0,1)\); each round has \(S\) slots (advance 32, QF 8, SF 4, final 2, win 1). Everything is plain arithmetic over a handful of disclosed priors, none fit to the market: the favorite–longshot \(\alpha\), the Elo shrinks \(\kappa\), the host bonus, the rating-uncertainty \(\sigma\), the Dixon–Coles \(\rho\), and the cost half-spread \(c\).
De-vig & overround. Raw prices sum to more than the slots (the bookmaker's margin); we rescale each level proportionally to its slot count:
$$q_i = \frac{p_i}{\sum_j p_j}\,S, \qquad \text{overround} = \frac{\sum_j p_j}{S} - 1 .$$
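A minimal sketch of the de-vig step, assuming prices for one level arrive as a plain dict of team → price. The function name `devig` and the toy prices are illustrative, not taken from `src/consistency.py`.

```python
def devig(prices, slots):
    """Rescale one level's raw prices so they sum to its slot count.

    Returns (q, overround), following q_i = p_i / sum(p) * S and
    overround = sum(p) / S - 1.
    """
    total = sum(prices.values())
    if total <= 0:
        raise ValueError("prices must sum to a positive number")
    q = {team: p / total * slots for team, p in prices.items()}
    overround = total / slots - 1.0
    return q, overround


# Hypothetical "reach final" level (2 slots) with four made-up teams.
raw = {"A": 0.90, "B": 0.70, "C": 0.40, "D": 0.10}
q, overround = devig(raw, slots=2)
print(q, overround)
```

Here the raw prices sum to 2.10 against 2 slots, so the overround is 5% and every price is scaled down by the same factor.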
Zero-knowledge model — favorite–longshot correction. A power transform of the shape, renormalised back to the slots, with \(\alpha = 1.15\):
$$\text{model}_i = \frac{p_i^{\alpha}}{\sum_j p_j^{\alpha}}\,S, \qquad \text{edge}_i = \text{model}_i - q_i .$$
\(\alpha > 1\) lifts favorites and trims longshots — the historically profitable direction of the bias.
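The transform and the resulting gross edge can be sketched as follows. This is an illustration of the two formulas above under assumed names (`favorite_longshot`, `gross_edges`) and made-up winner-level prices; it is not the project's own code.

```python
def devig(prices, slots):
    """q_i = p_i / sum(p) * S."""
    total = sum(prices.values())
    return {t: p / total * slots for t, p in prices.items()}


def favorite_longshot(prices, slots, alpha=1.15):
    """model_i = p_i**alpha / sum(p**alpha) * S."""
    powered = {t: p ** alpha for t, p in prices.items()}
    total = sum(powered.values())
    return {t: v / total * slots for t, v in powered.items()}


def gross_edges(prices, slots, alpha=1.15):
    """edge_i = model_i - q_i, before any cost netting."""
    q = devig(prices, slots)
    model = favorite_longshot(prices, slots, alpha)
    return {t: model[t] - q[t] for t in prices}


# Hypothetical three-team winner market (1 slot), prices carry a small vig.
raw = {"Fav": 0.52, "Mid": 0.31, "Long": 0.21}
edge = gross_edges(raw, slots=1)
print(edge)
```

Because both the model and the de-vigged prices sum to the same slot count, the edges sum to zero: whatever is added to the favorite is taken from the longshots.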
No-arbitrage (nested). For each team the deeper rounds can't exceed the shallower ones:
$$P(\text{win}) \le P(\text{final}) \le P(\text{SF}) \le P(\text{QF}) \le P(\text{advance}) .$$
Any \(P(\text{deeper}) > P(\text{shallower}) + \text{tol}\) is a riskless mispricing.
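A sketch of the nesting check for a single team. The level keys, the function name `ladder_violations`, and the tolerance of 0.01 are assumptions for illustration; the source does not state the tolerance it uses. Only adjacent levels are compared here.

```python
# Ladder levels ordered from shallowest to deepest (key names are hypothetical).
LADDER = ["advance", "qf", "sf", "final", "win"]


def ladder_violations(team_prices, tol=0.01):
    """Return (deeper, shallower, gap) for each adjacent pair where
    P(deeper) exceeds P(shallower) by more than tol."""
    found = []
    for shallow, deep in zip(LADDER, LADDER[1:]):
        gap = team_prices[deep] - team_prices[shallow]
        if gap > tol:
            found.append((deep, shallow, gap))
    return found


consistent = {"advance": 0.90, "qf": 0.50, "sf": 0.30, "final": 0.15, "win": 0.08}
broken = {"advance": 0.90, "qf": 0.50, "sf": 0.30, "final": 0.35, "win": 0.08}

print(ladder_violations(consistent))
print(ladder_violations(broken))
```

In the second ladder "reach final" is priced above "reach semi-final", which is impossible for any real outcome, so the pair is flagged.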
Cost (half-spread) netting. Every edge shown in a book is net of a flat half-spread \(c = 0.01\) (≈ 1 cent, a typical Polymarket tick):
$$\text{edge}_{\text{net}} = \operatorname{sign}(\text{edge})\cdot\max\!\big(|\text{edge}| - c,\ 0\big).$$
A gap that doesn't clear the cost to trade it is sized to zero — we only act on edges that survive the spread. (The model-vs-market comparison table still shows the gross disagreement; netting applies where money is at stake.)
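The netting rule is a soft threshold on the edge. A minimal sketch, with `net_edge` as an assumed name:

```python
import math


def net_edge(edge, half_spread=0.01):
    """Shrink |edge| by the half-spread and keep its sign.

    Edges smaller than the half-spread net to zero, so they are never sized.
    """
    return math.copysign(max(abs(edge) - half_spread, 0.0), edge)


print(net_edge(0.03))    # long edge that survives the spread
print(net_edge(-0.03))   # short edge that survives the spread
print(net_edge(0.004))   # sub-spread gap, sized to zero
```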
Sizing — the paper book. Conviction-weighted and dollar-neutral, on the net edge. Each side (long/short) gets half a bankroll \(B\), allocated by \(|\text{edge}|\) and capped per trade:
$$\text{stake}_i = \min\!\left(\frac{B}{2}\cdot\frac{|\text{edge}_i|}{\sum_j|\text{edge}_j|},\ \ \text{cap}\right), \quad \text{cap} = \tfrac{B}{2}\cdot 0.15 ,$$
$$\text{shares}_i = \pm\,\frac{\text{stake}_i}{\max(\text{cost}_i,\ \varepsilon)}, \quad \text{cost}_i = p_i\ (\text{long})\ \text{ or }\ 1-p_i\ (\text{short}).$$
Mark-to-market \(\text{PnL} = \text{shares}\cdot(\text{price}_{\text{now}} - \text{entry})\); realised PnL replaces \(\text{price}_{\text{now}}\) with the settled outcome \(o \in \{0,1\}\). A binary bet can't lose more than its stake: the worst case is \(-\,\text{stake}\) and the best case is \(|\text{shares}| - \text{stake}\).
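A sketch of the sizing and PnL arithmetic, under three assumptions that the text does not settle: the \(|\text{edge}|\) sum is taken within each side (long or short), stake trimmed by the cap is left idle instead of redistributed, and \(\varepsilon = 10^{-6}\). The names `size_book` and `pnl` and the dict layout are illustrative.

```python
def size_book(net_edges, prices, bankroll=1000.0, cap_frac=0.15, eps=1e-6):
    """Conviction-weighted, dollar-neutral tickets from net edges.

    Each side gets bankroll / 2, split in proportion to |edge| within that
    side (assumption) and capped at cap_frac of the side's capital.
    """
    half = bankroll / 2.0
    cap = half * cap_frac
    tickets = {}
    for sign in (+1, -1):
        side = {t: abs(e) for t, e in net_edges.items() if e * sign > 0}
        total = sum(side.values())
        for team, magnitude in side.items():
            stake = min(half * magnitude / total, cap)
            # Per-share cost: the YES price when long, 1 - price when short.
            cost = prices[team] if sign > 0 else 1.0 - prices[team]
            tickets[team] = {
                "stake": stake,
                "shares": sign * stake / max(cost, eps),  # signed YES-shares
                "entry": prices[team],
            }
    return tickets


def pnl(ticket, value):
    """value is the live price (mark-to-market) or the outcome 0/1 (realised)."""
    return ticket["shares"] * (value - ticket["entry"])


# Hypothetical net edges and entry prices for four teams.
net_edges = {"A": 0.04, "B": 0.01, "C": -0.03, "D": -0.02}
prices = {"A": 0.50, "B": 0.20, "C": 0.25, "D": 0.10}
book = size_book(net_edges, prices)
print(book["A"], pnl(book["A"], 1), pnl(book["A"], 0))
print(book["C"], pnl(book["C"], 1), pnl(book["C"], 0))
```

With only two names per side, every stake hits the \$75 cap, which shows why the cap needs at least seven names per side before the full half-bankroll is deployed. The long ticket on A settles at +\$75 or −\$75; the short on C at −\$75 or +\$25, matching the loss and gain bounds stated above.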
The informed (Elo) model. Match win probability (the Elo expected score \(E\)) with a home bonus \(h\), and the rating update from a result, where \(S \in \{1,\tfrac12,0\}\) is the match score for a win, draw, or loss (not the slot count used above) and \(K = 24\):
$$E = \frac{1}{1 + 10^{-(R_A - R_B + h)/400}}, \qquad R \leftarrow R + K\,\ln\!\big(|\Delta\text{goals}| + 1\big)\,(S - E).$$
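The two Elo formulas translate directly into code. A minimal sketch with assumed function names; the ratings are made up.

```python
import math


def expected_score(r_a, r_b, home=0.0):
    """Elo expected score of team A against team B, with a home bonus for A."""
    return 1.0 / (1.0 + 10.0 ** (-(r_a - r_b + home) / 400.0))


def updated_rating(rating, expected, score, goal_diff, k=24.0):
    """R <- R + K * ln(|goal_diff| + 1) * (S - E), exactly as written above.

    Note: with this multiplier a draw (goal_diff = 0) gives ln(1) = 0,
    so the rating does not move.
    """
    return rating + k * math.log(abs(goal_diff) + 1) * (score - expected)


e = expected_score(1900, 1500)
print(e)                                   # 400-point gap
print(updated_rating(1500, 0.5, 1, 1))     # even match, 1-goal win
print(updated_rating(1500, 0.5, 0.5, 0))   # even match, draw
```

A 400-point gap gives the stronger side an expected score of 10/11, and a one-goal win in an even match is worth \(24 \cdot \ln 2 \cdot 0.5 \approx 8.3\) points.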
Group-stage scoreline — a Dixon–Coles-corrected double Poisson (the \(\rho\) term lifts the low-score draws that independent Poisson under-produces, giving a realistic \(\sim 27\%\) draw rate):
$$g_A \sim \text{Poisson}(\lambda_A),\quad \lambda_A = \mu\,e^{+b(R_A - R_B + h)},\quad \lambda_B = \mu\,e^{-b(R_A - R_B + h)},$$
with \(\mu = 1.35\), \(b = 0.0028\), \(\rho = -0.05\); the DC factor \(\tau_\rho\) reweights the \((0\text{-}0,\ 1\text{-}0,\ 0\text{-}1,\ 1\text{-}1)\) cells.
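A sketch of the scoreline grid, assuming the standard Dixon–Coles form of \(\tau_\rho\) (the source names the four cells but does not spell the factor out) and truncating at 10 goals per side. Function names are illustrative and this is not the code in `src/worldcup_fundamental.py`.

```python
import math


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)


def dc_tau(x, y, lam_a, lam_b, rho):
    """Standard Dixon-Coles low-score adjustment (assumed form)."""
    if x == 0 and y == 0:
        return 1.0 - lam_a * lam_b * rho
    if x == 0 and y == 1:
        return 1.0 + lam_a * rho
    if x == 1 and y == 0:
        return 1.0 + lam_b * rho
    if x == 1 and y == 1:
        return 1.0 - rho
    return 1.0


def scoreline_probs(r_a, r_b, home=0.0, mu=1.35, b=0.0028, rho=-0.05, max_goals=10):
    """Probability of each (goals_A, goals_B) scoreline, renormalised after truncation."""
    diff = r_a - r_b + home
    lam_a = mu * math.exp(+b * diff)
    lam_b = mu * math.exp(-b * diff)
    grid = {}
    for x in range(max_goals + 1):
        for y in range(max_goals + 1):
            grid[(x, y)] = (dc_tau(x, y, lam_a, lam_b, rho)
                            * poisson_pmf(x, lam_a) * poisson_pmf(y, lam_b))
    total = sum(grid.values())
    return {cell: p / total for cell, p in grid.items()}


def outcome_probs(grid):
    """Collapse the grid into (A wins, draw, B wins)."""
    win = sum(p for (x, y), p in grid.items() if x > y)
    draw = sum(p for (x, y), p in grid.items() if x == y)
    loss = sum(p for (x, y), p in grid.items() if x < y)
    return win, draw, loss


even = outcome_probs(scoreline_probs(1800, 1800))
print(even)
```

For two equally rated teams the draw probability comes out near 27%, against roughly 26% for independent Poisson with the same \(\mu\); the negative \(\rho\) moves mass from 1-0 and 0-1 into 0-0 and 1-1.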
Disclosed priors on the ratings: a host bonus of \(+60\) Elo for the three 2026 co-hosts (added before the shrink), and a knockout-only shrink toward the field mean \(\bar{R}\):
$$R' = \bar{R} + \kappa\,(R - \bar{R}), \qquad \kappa_{\text{group}} = 1.0,\quad \kappa_{\text{KO}} = 0.6.$$
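A sketch of the two rating adjustments in the order stated (host bonus first, then the shrink). The function name, the `stage` labels, and passing the field mean in as an argument are assumptions; how \(\bar{R}\) is computed is not specified here.

```python
# Shrink factors per stage, as disclosed above.
KAPPA = {"group": 1.0, "ko": 0.6}


def adjusted_rating(rating, field_mean, stage, is_host=False, host_bonus=60.0):
    """Add the host bonus, then shrink toward the field mean for the given stage."""
    boosted = rating + (host_bonus if is_host else 0.0)
    return field_mean + KAPPA[stage] * (boosted - field_mean)


# Hypothetical ratings around a field mean of 1800.
print(adjusted_rating(2000, 1800, "group"))               # unchanged
print(adjusted_rating(2000, 1800, "ko"))                  # pulled toward the mean
print(adjusted_rating(1700, 1800, "ko", is_host=True))    # bonus, then shrink
```

The knockout shrink keeps 60% of each team's distance from the field mean, so a 200-point favorite plays the knockouts as a 120-point favorite.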
Rating uncertainty. Each of the \(N\) simulations perturbs every team's rating by \(\varepsilon \sim \mathcal{N}(0,\sigma^2)\), \(\sigma = 70\) Elo (the same \(\varepsilon\) for a team's group and knockout ratings), integrating over our uncertainty about true strength — so a favorite reads \(\sim 99\%\), not a false 100%, to advance.
Bracket. Qualifiers are seeded into a balanced bracket (winners take the protected top seeds, then runners-up, then best-thirds, by rating within each tier), so winners are kept apart and same-group teams don't meet in the Round of 32 — a structural approximation, not the exact official slot table.
Monte-Carlo reach probabilities. Over \(N = 20{,}000\) sims, \(P(\text{level}) = \text{count}/N\). The Monte-Carlo error — how precisely the sim estimates its own answer — is
$$\mathrm{SE} \approx \sqrt{\frac{p(1-p)}{N}} \approx 0.2\text{–}0.3\%.$$
That is not the forecast's error bar: the real uncertainty is dominated by the inputs (the Elo ratings, the host bonus, the bracket path, and everything the model can't see — form, injuries, motivation), which is far larger. So a big model-vs-market gap is not "signal because it clears the MC noise"; treat it as a disagreement the scorecard adjudicates, not a proven edge. The rating-uncertainty band is a first, partial attempt to reflect this.
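For scale, the Monte-Carlo standard error at \(N = 20{,}000\) can be evaluated at a few probabilities. This is only the binomial formula above; `mc_standard_error` is an illustrative name.

```python
import math


def mc_standard_error(p, n=20_000):
    """Binomial standard error of a simulated probability p from n runs."""
    return math.sqrt(p * (1.0 - p) / n)


for p in (0.05, 0.10, 0.25, 0.50):
    print(f"p = {p:.2f}  SE = {mc_standard_error(p):.4%}")
```

The error peaks at \(p = 0.5\), at about 0.35%, and falls toward zero at the extremes, so the simulation noise is small next to any input uncertainty measured in tens of Elo points.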
Scoring (the public scorecard). Out-of-sample, always vs the market — lower Brier is better, positive skill means the model beat the crowd:
$$\text{Brier} = \frac{1}{n}\sum_i (p_i - o_i)^2, \qquad \text{skill} = \text{Brier}_{\text{market}} - \text{Brier}_{\text{model}} .$$
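The scorecard arithmetic in a few lines, on made-up probabilities and outcomes. Function names are assumptions.

```python
def brier(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def skill(market_probs, model_probs, outcomes):
    """Positive when the model's Brier score is lower than the market's."""
    return brier(market_probs, outcomes) - brier(model_probs, outcomes)


# Hypothetical three-market example: the first market resolved YES.
outcomes = [1, 0, 0]
market = [0.6, 0.3, 0.1]
model = [0.7, 0.2, 0.1]
print(brier(market, outcomes), brier(model, outcomes), skill(market, model, outcomes))
```

In this toy case the model leaned further toward what happened, so its Brier score is lower and the skill is positive.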
| Piece | Module |
|---|---|
| De-vig / favorite–longshot / no-arb detectors | src/consistency.py |
| Full nested ladder + per-level overround + arb scan | src/worldcup_markets.py |
| Paper positions, mark-to-live, resolve PnL | src/worldcup_positions.py |
| Conviction-weighted sizing + the HTML board | src/worldcup_board.py |
| Two books — Buy & Hold vs Active Trading | src/worldcup_books.py |
| Bounded-market calibration / time-to-resolution signals | src/bounded.py |
| Cost / capacity analysis | src/capacity.py |
| Pre-registered, timestamped hypothesis | docs/preregistration-worldcup.md |