# Prediction Market Accuracy: Backtest Polymarket vs Sportsbooks with Free Data
Everyone says prediction markets are more accurate than sportsbooks. Polymarket predicted the 2024 election better than polls. Kalshi beat the consensus on rate cuts. But is this true for sports? Are Polymarket’s odds actually sharper than Pinnacle’s?
You can’t answer that with anecdotes. You need hundreds of events, closing prices, actual outcomes, and a calibration curve. In this tutorial, you’ll build exactly that — a prediction market accuracy backtest using free historical data from OddsPapi.
## Why Historical Data Matters
Accuracy claims without data are marketing. To actually test whether Polymarket or Pinnacle is better calibrated, you need:
- Closing odds from both sources on the same events
- Actual outcomes — who won, what the score was
- Enough events — at least 100 for a meaningful calibration curve
Getting this data is the hard part. Polymarket’s historical prices live on-chain (slow to extract). Pinnacle’s API is closed to the public. Most odds APIs charge extra for historical data.
OddsPapi gives you historical odds for free on every tier — including prediction market exchanges.
## Getting Historical Data: The Options
| Source | Historical Data | Cost | Speed |
|---|---|---|---|
| Polymarket on-chain | Needs blockchain indexing | Free (gas costs) | Slow (GraphQL + parsing) |
| The Odds API | Available on paid plans only | $79+/month | Fast |
| Scraping sportsbooks | Terms of service violation | Free (until you get banned) | Fragile |
| OddsPapi | ✅ Free on all tiers | Free | Fast (REST API) |
## What We’re Building
A Python script that:
- Fetches completed fixtures from OddsPapi
- Gets historical closing odds from Polymarket and Pinnacle
- Gets actual results via the scores endpoint
- Builds a calibration dataset (predicted probability vs actual outcome)
- Calculates Brier scores for each source
- Plots a calibration curve showing which source is better calibrated
## Step 1: Fetch Completed Fixtures
The fixtures endpoint supports date filtering. We’ll fetch recently completed soccer fixtures.
```python
import requests
from datetime import datetime, timedelta

API_KEY = "YOUR_API_KEY"
BASE = "https://api.oddspapi.io/v4"

def fetch_completed_fixtures(sport_id=10, days_back=7):
    """Fetch completed fixtures from the last N days."""
    end = datetime.utcnow()
    start = end - timedelta(days=days_back)
    params = {
        "apiKey": API_KEY,
        "sportId": sport_id,
        "from": start.strftime("%Y-%m-%dT00:00:00Z"),
        "to": end.strftime("%Y-%m-%dT23:59:59Z"),
        "limit": 300
    }
    r = requests.get(f"{BASE}/fixtures", params=params, timeout=15)
    r.raise_for_status()
    # Filter to completed fixtures only (statusId 2 = completed)
    return [f for f in r.json() if f.get("statusId") == 2]

fixtures = fetch_completed_fixtures(days_back=7)
print(f"Found {len(fixtures)} completed fixtures in last 7 days")
```
## Step 2: Get Historical Odds (Closing Prices)
The /historical-odds endpoint returns tick-by-tick odds history for up to 3 bookmakers per call. The closing price is the last recorded price before the match started.
```python
def get_closing_odds(fixture_id, bookmakers=["polymarket", "pinnacle"]):
    """Get closing (last pre-match) odds from the historical endpoint."""
    params = {
        "apiKey": API_KEY,
        "fixtureId": fixture_id,
        "bookmakers": ",".join(bookmakers)
    }
    r = requests.get(f"{BASE}/historical-odds", params=params, timeout=15)
    r.raise_for_status()
    return r.json()

def extract_closing_price(hist_data, slug, market_id, outcome_id):
    """Extract the closing price from historical odds data."""
    try:
        book = hist_data["bookmakerOdds"][slug]
        market = book["markets"][market_id]
        outcome = market["outcomes"][outcome_id]
        # Historical data is sorted newest-first, so the closing price
        # is the first entry (the most recent tick before the match).
        snapshots = outcome["players"]["0"]
        if snapshots:
            return snapshots[0].get("price")
    except (KeyError, IndexError):
        pass
    return None
```
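Because the nested response shape is easy to get wrong, here is a quick check of `extract_closing_price` against a hand-built response fragment. The fragment and its sample prices are assumptions that mirror the access pattern above, not a verbatim API response; the function definition is repeated so the snippet runs standalone.

```python
def extract_closing_price(hist_data, slug, market_id, outcome_id):
    # Same definition as above, repeated so this snippet runs on its own.
    try:
        snapshots = (hist_data["bookmakerOdds"][slug]["markets"][market_id]
                     ["outcomes"][outcome_id]["players"]["0"])
        if snapshots:
            return snapshots[0].get("price")
    except (KeyError, IndexError):
        pass
    return None

# Hand-built fragment mirroring the shape the function navigates.
mock_hist = {"bookmakerOdds": {"pinnacle": {"markets": {"101": {"outcomes": {
    "101": {"players": {"0": [
        {"price": 1.85},  # newest tick = closing price
        {"price": 1.90},
    ]}}}}}}}}

print(extract_closing_price(mock_hist, "pinnacle", "101", "101"))    # 1.85
print(extract_closing_price(mock_hist, "polymarket", "101", "101"))  # None (book missing)
```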
## Step 3: Get Match Results
The /scores endpoint returns period-by-period scores for completed fixtures.
```python
def get_result(fixture_id):
    """Get match result (full-time score)."""
    r = requests.get(f"{BASE}/scores",
                     params={"apiKey": API_KEY, "fixtureId": fixture_id},
                     timeout=15)
    r.raise_for_status()
    data = r.json()
    periods = data.get("scores", {}).get("periods", {})
    ft = periods.get("fulltime", periods.get("result", {}))
    return {
        "home_score": ft.get("participant1Score"),
        "away_score": ft.get("participant2Score")
    }

def determine_1x2_outcome(result):
    """Map a final score to a 1X2 outcome ID."""
    h, a = result["home_score"], result["away_score"]
    if h is None or a is None:
        return None
    if h > a:
        return "101"  # Home win
    elif h == a:
        return "102"  # Draw
    else:
        return "103"  # Away win
```
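The mapping is simple enough to verify by hand. A few sanity checks, with the function repeated so the snippet runs standalone:

```python
def determine_1x2_outcome(result):
    # Same mapping as above, repeated so this snippet runs on its own.
    h, a = result["home_score"], result["away_score"]
    if h is None or a is None:
        return None
    if h > a:
        return "101"  # Home win
    elif h == a:
        return "102"  # Draw
    return "103"      # Away win

assert determine_1x2_outcome({"home_score": 2, "away_score": 1}) == "101"
assert determine_1x2_outcome({"home_score": 1, "away_score": 1}) == "102"
assert determine_1x2_outcome({"home_score": 0, "away_score": 3}) == "103"
assert determine_1x2_outcome({"home_score": None, "away_score": 2}) is None
print("1X2 mapping OK")
```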
## Step 4: Build the Calibration Dataset
Now we loop through completed fixtures, get closing prices from both Polymarket and Pinnacle, and record whether each predicted probability matched reality.
```python
import pandas as pd

def build_calibration_data(fixtures, max_fixtures=200):
    """Build dataset: closing probability vs actual outcome."""
    records = []
    checked = 0
    for fx in fixtures[:max_fixtures]:
        fid = fx["fixtureId"]
        name = f"{fx['participant1Name']} vs {fx['participant2Name']}"
        # Get historical odds
        try:
            hist = get_closing_odds(fid, ["polymarket", "pinnacle"])
        except Exception:
            continue
        # Skip fixtures without Polymarket data
        if "polymarket" not in hist.get("bookmakerOdds", {}):
            continue
        # Get the result
        try:
            result = get_result(fid)
            actual_outcome = determine_1x2_outcome(result)
        except Exception:
            continue
        if actual_outcome is None:
            continue
        checked += 1
        # Extract closing prices for each outcome ("101" is the 1X2 market ID)
        for oid, label in [("101", "Home"), ("102", "Draw"), ("103", "Away")]:
            poly_price = extract_closing_price(hist, "polymarket", "101", oid)
            pin_price = extract_closing_price(hist, "pinnacle", "101", oid)
            if poly_price and pin_price:
                poly_prob = 1 / poly_price
                pin_prob = 1 / pin_price
                occurred = 1 if oid == actual_outcome else 0
                records.append({
                    "match": name,
                    "outcome": label,
                    "poly_prob": round(poly_prob, 4),
                    "pin_prob": round(pin_prob, 4),
                    "occurred": occurred
                })
    print(f"Checked {checked} fixtures with Polymarket data")
    return pd.DataFrame(records)

df = build_calibration_data(fixtures)
print(f"\n{len(df)} data points collected")
print(df.head(10))
```
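One refinement worth knowing about: `1 / price` is the raw implied probability, and at a bookmaker the three 1X2 probabilities sum to slightly more than 1 because of the margin (the overround). The backtest above compares raw implied probabilities, which slightly penalizes the source with the larger margin. A minimal sketch of proportional normalization, using illustrative odds:

```python
# Decimal odds for one match's 1X2 market (illustrative numbers, not real data).
odds = {"Home": 2.10, "Draw": 3.40, "Away": 3.60}

# Raw implied probabilities include the bookmaker's margin,
# so they sum to slightly more than 1.
raw = {k: 1 / v for k, v in odds.items()}
overround = sum(raw.values())
print(f"Overround: {overround:.4f}")  # > 1.0 at a bookmaker

# Proportional normalization removes the margin so the three
# probabilities sum to exactly 1.
fair = {k: p / overround for k, p in raw.items()}
print({k: round(p, 4) for k, p in fair.items()})
```

Applying this to both sources before scoring makes the comparison about information, not margin size.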
## Step 5: Calculate Brier Scores
The Brier score measures prediction accuracy as the mean squared error between predicted probability and outcome. Lower is better: a perfect predictor scores 0.0, and a constant 50% guess on binary outcomes scores 0.25.
```python
import numpy as np

def brier_score(probs, outcomes):
    """Brier score: mean squared error of probabilistic predictions."""
    return np.mean((np.array(probs) - np.array(outcomes)) ** 2)

if len(df) > 0:
    poly_brier = brier_score(df["poly_prob"], df["occurred"])
    pin_brier = brier_score(df["pin_prob"], df["occurred"])
    print("\nBrier Scores (lower = better):")
    print(f"  Polymarket: {poly_brier:.4f}")
    print(f"  Pinnacle:   {pin_brier:.4f}")
    if poly_brier < pin_brier:
        print(f"\n  Polymarket is better calibrated by {(pin_brier - poly_brier):.4f}")
    else:
        print(f"\n  Pinnacle is better calibrated by {(poly_brier - pin_brier):.4f}")
```
For context on what these numbers mean:
| Brier Score | Interpretation |
|---|---|
| 0.00 | Perfect prediction |
| 0.10 | Excellent — professional forecaster level |
| 0.15 | Good — sharp bookmaker level |
| 0.20 | Fair — typical soft bookmaker |
| 0.25 | Random guessing (coin flip) |
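To make the table concrete, here is the Brier arithmetic on three toy predictions:

```python
import numpy as np

# Three predictions and what actually happened:
probs    = np.array([0.80, 0.30, 0.50])  # predicted probabilities
outcomes = np.array([1,    0,    1])     # 1 = happened, 0 = didn't

# Squared errors: (0.8-1)^2 = 0.04, (0.3-0)^2 = 0.09, (0.5-1)^2 = 0.25
brier = np.mean((probs - outcomes) ** 2)
print(f"{brier:.4f}")  # (0.04 + 0.09 + 0.25) / 3 ≈ 0.1267
```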
## Step 6: Plot the Calibration Curve
A calibration curve shows how well-calibrated a predictor is. On a perfectly calibrated source, events predicted at 70% should happen 70% of the time. The diagonal line represents perfection.
```python
import plotly.graph_objects as go

def calibration_curve(probs, outcomes, n_bins=10):
    """Bin predictions and compute the actual hit rate per bin."""
    bins = np.linspace(0, 1, n_bins + 1)
    bin_centers = []
    bin_actuals = []
    for i in range(n_bins):
        mask = (probs >= bins[i]) & (probs < bins[i + 1])
        if mask.sum() >= 3:  # need at least 3 events per bin
            bin_centers.append(np.mean(probs[mask]))
            bin_actuals.append(np.mean(outcomes[mask]))
    return bin_centers, bin_actuals

if len(df) > 0:
    poly_x, poly_y = calibration_curve(
        df["poly_prob"].values, df["occurred"].values)
    pin_x, pin_y = calibration_curve(
        df["pin_prob"].values, df["occurred"].values)

    fig = go.Figure()
    # Perfect calibration line
    fig.add_trace(go.Scatter(
        x=[0, 1], y=[0, 1], mode="lines",
        line=dict(dash="dash", color="gray"),
        name="Perfect Calibration"))
    # Polymarket calibration
    fig.add_trace(go.Scatter(
        x=poly_x, y=poly_y, mode="lines+markers",
        name=f"Polymarket (Brier: {poly_brier:.4f})",
        line=dict(color="#6366f1", width=3),
        marker=dict(size=10)))
    # Pinnacle calibration
    fig.add_trace(go.Scatter(
        x=pin_x, y=pin_y, mode="lines+markers",
        name=f"Pinnacle (Brier: {pin_brier:.4f})",
        line=dict(color="#f59e0b", width=3),
        marker=dict(size=10)))
    fig.update_layout(
        title="Calibration Curve: Polymarket vs Pinnacle",
        xaxis_title="Predicted Probability",
        yaxis_title="Actual Frequency",
        template="plotly_dark",
        height=500,
        xaxis=dict(range=[0, 1]),
        yaxis=dict(range=[0, 1]))
    fig.show()  # opens in browser
    # Or save: fig.write_html("calibration.html")
```
## How to Interpret the Results
When you run this analysis, you'll see one of three things:
| Result | What It Means | Implication |
|---|---|---|
| Polymarket closer to diagonal | Crowd is better calibrated than sharps | Prediction markets add information beyond what bookmakers capture |
| Pinnacle closer to diagonal | Sharp bookmaker is better calibrated | Professional pricing beats crowd wisdom on sports events |
| Both similar | No significant accuracy difference | Use whichever has better liquidity for your use case |
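"Similar" is doing a lot of work in that last row: with a few hundred data points, a small Brier gap can easily be sampling noise. One way to check is a bootstrap over the dataset built in Step 4, sketched below. The function name and interface are mine, not part of the tutorial's core code.

```python
import numpy as np

def bootstrap_brier_diff(poly_probs, pin_probs, outcomes, n_boot=2000, seed=42):
    """Bootstrap the Brier-score difference (Pinnacle minus Polymarket).

    Positive values favor Polymarket. If the 95% interval excludes 0,
    the calibration gap is unlikely to be sampling noise.
    """
    poly = np.asarray(poly_probs, dtype=float)
    pin = np.asarray(pin_probs, dtype=float)
    out = np.asarray(outcomes, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(out)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample events with replacement
        diffs[b] = (np.mean((pin[idx] - out[idx]) ** 2)
                    - np.mean((poly[idx] - out[idx]) ** 2))
    lo, hi = np.percentile(diffs, [2.5, 97.5])
    return lo, hi

# Usage with the Step 4 DataFrame:
# lo, hi = bootstrap_brier_diff(df["poly_prob"], df["pin_prob"], df["occurred"])
```

Resampling whole events (rather than individual outcome rows) would be stricter still, since the three rows per match are correlated.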
Based on early data, Pinnacle tends to be better calibrated on sports events — which makes sense. Sports betting markets have decades of professional pricing infrastructure. Polymarket is newer and trades thinner on sports, which leads to wider spreads and more noise in the prices.
But the gaps between them — those are trading opportunities. If Pinnacle is better calibrated and Polymarket disagrees, the Polymarket price is potentially mispriced. That's exactly what the dashboard and terminal monitor from our other tutorials are designed to catch.
## Extending the Analysis
- More data: Expand beyond 7 days. Run the script weekly and aggregate results into a growing dataset. With 500+ events, the calibration curve becomes much smoother.
- More bookmakers: Compare Kalshi, DraftKings, and Bet365 alongside Polymarket and Pinnacle. The historical-odds endpoint supports 3 bookmakers per call.
- Sport-specific analysis: Polymarket might be better calibrated on NBA (simpler binary outcome) vs soccer (three-way market). Test each sport separately.
- Time decay: Compare closing odds vs odds 24 hours before the match. Do prediction markets improve faster or slower than sportsbooks as kickoff approaches?
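For the first extension, here is a minimal sketch of accumulating weekly runs into one growing dataset. The filename and the `collected_at` column are assumptions; adapt them to your setup.

```python
import os
import pandas as pd

def append_run(df, path="calibration_history.csv"):
    """Append one run's data points to a growing CSV dataset."""
    out = df.copy()
    out["collected_at"] = pd.Timestamp.now(tz="UTC").isoformat()
    header = not os.path.exists(path)  # write the header only on the first run
    out.to_csv(path, mode="a", header=header, index=False)
    return pd.read_csv(path)  # the full accumulated dataset

# Run weekly (e.g. from cron); after a few months the dataset passes the
# 500+ events where the calibration curve stabilizes.
```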
## The Complete Backtest Script
Here's everything combined into one runnable script:
```python
import requests, numpy as np, pandas as pd
import plotly.graph_objects as go
from datetime import datetime, timedelta

API_KEY = "YOUR_API_KEY"
BASE = "https://api.oddspapi.io/v4"

def fetch_completed(sport_id=10, days=7):
    end = datetime.utcnow()
    start = end - timedelta(days=days)
    params = {"apiKey": API_KEY, "sportId": sport_id,
              "from": start.strftime("%Y-%m-%dT00:00:00Z"),
              "to": end.strftime("%Y-%m-%dT23:59:59Z"), "limit": 300}
    r = requests.get(f"{BASE}/fixtures", params=params, timeout=15)
    r.raise_for_status()
    return [f for f in r.json() if f.get("statusId") == 2]

def get_hist(fid, books=["polymarket", "pinnacle"]):
    r = requests.get(f"{BASE}/historical-odds",
                     params={"apiKey": API_KEY, "fixtureId": fid,
                             "bookmakers": ",".join(books)}, timeout=15)
    r.raise_for_status()
    return r.json()

def get_score(fid):
    r = requests.get(f"{BASE}/scores",
                     params={"apiKey": API_KEY, "fixtureId": fid}, timeout=15)
    r.raise_for_status()
    p = r.json().get("scores", {}).get("periods", {})
    ft = p.get("fulltime", p.get("result", {}))
    h, a = ft.get("participant1Score"), ft.get("participant2Score")
    if h is None or a is None:
        return None
    return "101" if h > a else ("102" if h == a else "103")

def closing_price(hist, slug, oid):
    try:
        snaps = hist["bookmakerOdds"][slug]["markets"]["101"]["outcomes"][oid]["players"]["0"]
        return snaps[0]["price"] if snaps else None
    except (KeyError, IndexError):
        return None

# Build dataset
fixtures = fetch_completed(days=7)
print(f"Found {len(fixtures)} completed fixtures")

records = []
for fx in fixtures:
    try:
        hist = get_hist(fx["fixtureId"])
    except Exception:
        continue
    if "polymarket" not in hist.get("bookmakerOdds", {}):
        continue
    outcome = get_score(fx["fixtureId"])
    if not outcome:
        continue
    for oid, lbl in [("101", "Home"), ("102", "Draw"), ("103", "Away")]:
        pp = closing_price(hist, "polymarket", oid)
        pn = closing_price(hist, "pinnacle", oid)
        if pp and pn and pp > 0 and pn > 0:
            records.append({"poly": 1 / pp, "pin": 1 / pn,
                            "hit": 1 if oid == outcome else 0})

df = pd.DataFrame(records)
print(f"Collected {len(df)} data points")
if len(df) == 0:
    raise SystemExit("No data — try increasing days or using a different sport")

# Brier scores
poly_brier = np.mean((df["poly"] - df["hit"]) ** 2)
pin_brier = np.mean((df["pin"] - df["hit"]) ** 2)
print("\nBrier Scores:")
print(f"  Polymarket: {poly_brier:.4f}")
print(f"  Pinnacle:   {pin_brier:.4f}")
winner = "Polymarket" if poly_brier < pin_brier else "Pinnacle"
print(f"  Winner: {winner}")

# Calibration curve
def cal_curve(probs, hits, bins=10):
    edges = np.linspace(0, 1, bins + 1)
    cx, cy = [], []
    for i in range(bins):
        m = (probs >= edges[i]) & (probs < edges[i + 1])
        if m.sum() >= 3:
            cx.append(float(np.mean(probs[m])))
            cy.append(float(np.mean(hits[m])))
    return cx, cy

px, py = cal_curve(df["poly"].values, df["hit"].values)
nx, ny = cal_curve(df["pin"].values, df["hit"].values)

fig = go.Figure()
fig.add_trace(go.Scatter(x=[0, 1], y=[0, 1], mode="lines",
                         line=dict(dash="dash", color="gray"), name="Perfect"))
fig.add_trace(go.Scatter(x=px, y=py, mode="lines+markers",
                         name=f"Polymarket ({poly_brier:.4f})",
                         line=dict(color="#6366f1", width=3), marker=dict(size=10)))
fig.add_trace(go.Scatter(x=nx, y=ny, mode="lines+markers",
                         name=f"Pinnacle ({pin_brier:.4f})",
                         line=dict(color="#f59e0b", width=3), marker=dict(size=10)))
fig.update_layout(title="Calibration: Polymarket vs Pinnacle",
                  xaxis_title="Predicted Probability",
                  yaxis_title="Actual Frequency",
                  template="plotly_dark", height=500,
                  xaxis=dict(range=[0, 1]), yaxis=dict(range=[0, 1]))
fig.write_html("calibration.html")
print("\nCalibration chart saved to calibration.html")
```
## Why Free Historical Data Changes Everything
This entire analysis runs on OddsPapi's free tier. No credit card, no enterprise contract, no blockchain indexing. Just an API key and a Python script.
Most competitors charge $79+/month for historical data. OddsPapi gives it away because historical data drives adoption — developers who backtest models end up building live trading systems that need real-time feeds.
The free tier covers 350+ bookmakers including prediction market exchanges (Polymarket, Kalshi, ProphetX) and sharp books (Pinnacle, Singbet, SBOBet). That's the same data hedge funds pay five figures for, accessible to anyone with a Python script.
## Run This Analysis Yourself
You just learned how to quantify the question "are prediction markets more accurate than sportsbooks?" with real data. The answer varies by sport, by time period, and by market type — which is exactly why you need to run the analysis yourself, not trust someone else's claims.
Get your free API key and run the backtest. If you want to monitor live divergences instead of backtesting historical ones, check out our prediction market dashboard or CLI terminal monitor.
For more on how Polymarket and Kalshi data works within OddsPapi, see our Polymarket & Kalshi API guide.