# Prediction Market Accuracy: Backtest Polymarket vs Sportsbooks with Free Data
Everyone says prediction markets are more accurate than sportsbooks. Polymarket predicted the 2024 election better than polls. Kalshi beat the consensus on rate cuts. But is this true for sports? Are Polymarket’s odds actually sharper than Pinnacle’s?
You can’t answer that with anecdotes. You need hundreds of events, closing prices, actual outcomes, and a calibration curve. In this tutorial, you’ll build exactly that — a prediction market accuracy backtest using free historical data from OddsPapi.
## Why Historical Data Matters
Accuracy claims without data are marketing. To actually test whether Polymarket or Pinnacle is better calibrated, you need:
- Closing odds from both sources on the same events
- Actual outcomes — who won, what the score was
- Enough events — at least 100 for a meaningful calibration curve
Getting this data is the hard part. Polymarket’s historical prices live on-chain (slow to extract). Pinnacle’s API is closed to the public. Most odds APIs charge extra for historical data.
OddsPapi gives you historical odds for free on every tier — including prediction market exchanges.
## Getting Historical Data: The Options
| Source | Historical Data | Cost | Speed |
|---|---|---|---|
| Polymarket on-chain | Needs blockchain indexing | Free (gas costs) | Slow (GraphQL + parsing) |
| The Odds API | Available on paid plans only | $79+/month | Fast |
| Scraping sportsbooks | Terms of service violation | Free (until you get banned) | Fragile |
| OddsPapi | ✅ Free on all tiers | Free | Fast (REST API) |
## What We’re Building
A Python script that:
- Fetches completed fixtures from OddsPapi
- Gets historical closing odds from Polymarket and Pinnacle
- Gets actual results via the scores endpoint
- Builds a calibration dataset (predicted probability vs actual outcome)
- Calculates Brier scores for each source
- Plots a calibration curve showing which source is better calibrated
## Step 1: Fetch Completed Fixtures
The fixtures endpoint supports date filtering. We’ll fetch recently completed soccer fixtures.
```python
import requests
from datetime import datetime, timedelta

API_KEY = "YOUR_API_KEY"
BASE = "https://api.oddspapi.io/v4"

def fetch_completed_fixtures(sport_id=10, days_back=7):
    """Fetch completed fixtures from the last N days."""
    end = datetime.utcnow()
    start = end - timedelta(days=days_back)
    params = {
        "apiKey": API_KEY,
        "sportId": sport_id,
        "from": start.strftime("%Y-%m-%dT00:00:00Z"),
        "to": end.strftime("%Y-%m-%dT23:59:59Z"),
        "limit": 300
    }
    r = requests.get(f"{BASE}/fixtures", params=params, timeout=15)
    r.raise_for_status()
    # Filter to completed fixtures only (statusId 2 = completed)
    return [f for f in r.json() if f.get("statusId") == 2]

fixtures = fetch_completed_fixtures(days_back=7)
print(f"Found {len(fixtures)} completed fixtures in last 7 days")
```
## Step 2: Get Historical Odds (Closing Prices)
The /historical-odds endpoint returns tick-by-tick odds history for up to 3 bookmakers per call. The closing price is the last recorded price before the match started.
```python
def get_closing_odds(fixture_id, bookmakers=["polymarket", "pinnacle"]):
    """Get closing (last pre-match) odds from the historical endpoint."""
    params = {
        "apiKey": API_KEY,
        "fixtureId": fixture_id,
        "bookmakers": ",".join(bookmakers)
    }
    r = requests.get(f"{BASE}/historical-odds", params=params, timeout=15)
    r.raise_for_status()
    return r.json()

def extract_closing_price(hist_data, slug, market_id, outcome_id):
    """Extract the closing price from historical odds data."""
    try:
        book = hist_data["bookmakerOdds"][slug]
        market = book["markets"][market_id]
        outcome = market["outcomes"][outcome_id]
        # Historical data is sorted newest-first, so the closing price
        # is the first entry (the most recent tick before the match).
        snapshots = outcome["players"]["0"]
        if snapshots:
            return snapshots[0].get("price")
    except (KeyError, IndexError):
        pass
    return None
```
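Because the nested response shape is easy to get wrong, here is a quick check of `extract_closing_price` against a hand-built response fragment. The fragment and its sample prices are assumptions that mirror the access pattern above, not a verbatim API response; the function definition is repeated so the snippet runs standalone.

```python
def extract_closing_price(hist_data, slug, market_id, outcome_id):
    # Same definition as above, repeated so this snippet runs on its own.
    try:
        snapshots = (hist_data["bookmakerOdds"][slug]["markets"][market_id]
                     ["outcomes"][outcome_id]["players"]["0"])
        if snapshots:
            return snapshots[0].get("price")
    except (KeyError, IndexError):
        pass
    return None

# Hand-built fragment mirroring the shape the function navigates.
mock_hist = {"bookmakerOdds": {"pinnacle": {"markets": {"101": {"outcomes": {
    "101": {"players": {"0": [
        {"price": 1.85},  # newest tick = closing price
        {"price": 1.90},
    ]}}}}}}}}

print(extract_closing_price(mock_hist, "pinnacle", "101", "101"))    # 1.85
print(extract_closing_price(mock_hist, "polymarket", "101", "101"))  # None (book missing)
```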
## Step 3: Get Match Results
The /scores endpoint returns period-by-period scores for completed fixtures.
```python
def get_result(fixture_id):
    """Get match result (full-time score)."""
    r = requests.get(f"{BASE}/scores",
                     params={"apiKey": API_KEY, "fixtureId": fixture_id},
                     timeout=15)
    r.raise_for_status()
    data = r.json()
    periods = data.get("scores", {}).get("periods", {})
    ft = periods.get("fulltime", periods.get("result", {}))
    return {
        "home_score": ft.get("participant1Score"),
        "away_score": ft.get("participant2Score")
    }

def determine_1x2_outcome(result):
    """Map a final score to a 1X2 outcome ID."""
    h, a = result["home_score"], result["away_score"]
    if h is None or a is None:
        return None
    if h > a:
        return "101"  # Home win
    elif h == a:
        return "102"  # Draw
    else:
        return "103"  # Away win
```
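The mapping is simple enough to verify by hand. A few sanity checks, with the function repeated so the snippet runs standalone:

```python
def determine_1x2_outcome(result):
    # Same mapping as above, repeated so this snippet runs on its own.
    h, a = result["home_score"], result["away_score"]
    if h is None or a is None:
        return None
    if h > a:
        return "101"  # Home win
    elif h == a:
        return "102"  # Draw
    return "103"      # Away win

assert determine_1x2_outcome({"home_score": 2, "away_score": 1}) == "101"
assert determine_1x2_outcome({"home_score": 1, "away_score": 1}) == "102"
assert determine_1x2_outcome({"home_score": 0, "away_score": 3}) == "103"
assert determine_1x2_outcome({"home_score": None, "away_score": 2}) is None
print("1X2 mapping OK")
```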
## Step 4: Build the Calibration Dataset
Now we loop through completed fixtures, get closing prices from both Polymarket and Pinnacle, and record whether each predicted probability matched reality.
```python
import pandas as pd

def build_calibration_data(fixtures, max_fixtures=200):
    """Build dataset: closing probability vs actual outcome."""
    records = []
    checked = 0
    for fx in fixtures[:max_fixtures]:
        fid = fx["fixtureId"]
        name = f"{fx['participant1Name']} vs {fx['participant2Name']}"
        # Get historical odds
        try:
            hist = get_closing_odds(fid, ["polymarket", "pinnacle"])
        except Exception:
            continue
        # Skip fixtures without Polymarket data
        if "polymarket" not in hist.get("bookmakerOdds", {}):
            continue
        # Get the result
        try:
            result = get_result(fid)
            actual_outcome = determine_1x2_outcome(result)
        except Exception:
            continue
        if actual_outcome is None:
            continue
        checked += 1
        # Extract closing prices for each outcome ("101" is the 1X2 market ID)
        for oid, label in [("101", "Home"), ("102", "Draw"), ("103", "Away")]:
            poly_price = extract_closing_price(hist, "polymarket", "101", oid)
            pin_price = extract_closing_price(hist, "pinnacle", "101", oid)
            if poly_price and pin_price:
                poly_prob = 1 / poly_price
                pin_prob = 1 / pin_price
                occurred = 1 if oid == actual_outcome else 0
                records.append({
                    "match": name,
                    "outcome": label,
                    "poly_prob": round(poly_prob, 4),
                    "pin_prob": round(pin_prob, 4),
                    "occurred": occurred
                })
    print(f"Checked {checked} fixtures with Polymarket data")
    return pd.DataFrame(records)

df = build_calibration_data(fixtures)
print(f"\n{len(df)} data points collected")
print(df.head(10))
```
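One refinement worth knowing about: `1 / price` is the raw implied probability, and at a bookmaker the three 1X2 probabilities sum to slightly more than 1 because of the margin (the overround). The backtest above compares raw implied probabilities, which slightly penalizes the source with the larger margin. A minimal sketch of proportional normalization, using illustrative odds:

```python
# Decimal odds for one match's 1X2 market (illustrative numbers, not real data).
odds = {"Home": 2.10, "Draw": 3.40, "Away": 3.60}

# Raw implied probabilities include the bookmaker's margin,
# so they sum to slightly more than 1.
raw = {k: 1 / v for k, v in odds.items()}
overround = sum(raw.values())
print(f"Overround: {overround:.4f}")  # > 1.0 at a bookmaker

# Proportional normalization removes the margin so the three
# probabilities sum to exactly 1.
fair = {k: p / overround for k, p in raw.items()}
print({k: round(p, 4) for k, p in fair.items()})
```

Applying this to both sources before scoring makes the comparison about information, not margin size.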
## Step 5: Calculate Brier Scores
The Brier score measures prediction accuracy as the mean squared error between predicted probability and outcome. Lower is better: a perfect predictor scores 0.0, and a constant 50% guess on binary outcomes scores 0.25.
```python
import numpy as np

def brier_score(probs, outcomes):
    """Brier score: mean squared error of probabilistic predictions."""
    return np.mean((np.array(probs) - np.array(outcomes)) ** 2)

if len(df) > 0:
    poly_brier = brier_score(df["poly_prob"], df["occurred"])
    pin_brier = brier_score(df["pin_prob"], df["occurred"])
    print("\nBrier Scores (lower = better):")
    print(f"  Polymarket: {poly_brier:.4f}")
    print(f"  Pinnacle:   {pin_brier:.4f}")
    if poly_brier < pin_brier:
        print(f"\n  Polymarket is better calibrated by {(pin_brier - poly_brier):.4f}")
    else:
        print(f"\n  Pinnacle is better calibrated by {(poly_brier - pin_brier):.4f}")
```
For context on what these numbers mean:
| Brier Score | Interpretation |
|---|---|
| 0.00 | Perfect prediction |
| 0.10 | Excellent — professional forecaster level |
| 0.15 | Good — sharp bookmaker level |
| 0.20 | Fair — typical soft bookmaker |
| 0.25 | Random guessing (coin flip) |
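To make the table concrete, here is the Brier arithmetic on three toy predictions:

```python
import numpy as np

# Three predictions and what actually happened:
probs    = np.array([0.80, 0.30, 0.50])  # predicted probabilities
outcomes = np.array([1,    0,    1])     # 1 = happened, 0 = didn't

# Squared errors: (0.8-1)^2 = 0.04, (0.3-0)^2 = 0.09, (0.5-1)^2 = 0.25
brier = np.mean((probs - outcomes) ** 2)
print(f"{brier:.4f}")  # (0.04 + 0.09 + 0.25) / 3 ≈ 0.1267
```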
## Step 6: Plot the Calibration Curve
A calibration curve shows how well-calibrated a predictor is. On a perfectly calibrated source, events predicted at 70% should happen 70% of the time. The diagonal line represents perfection.
```python
import plotly.graph_objects as go

def calibration_curve(probs, outcomes, n_bins=10):
    """Bin predictions and compute the actual hit rate per bin."""
    bins = np.linspace(0, 1, n_bins + 1)
    bin_centers = []
    bin_actuals = []
    for i in range(n_bins):
        mask = (probs >= bins[i]) & (probs < bins[i + 1])
        if mask.sum() >= 3:  # need at least 3 events per bin
            bin_centers.append(np.mean(probs[mask]))
            bin_actuals.append(np.mean(outcomes[mask]))
    return bin_centers, bin_actuals

if len(df) > 0:
    poly_x, poly_y = calibration_curve(
        df["poly_prob"].values, df["occurred"].values)
    pin_x, pin_y = calibration_curve(
        df["pin_prob"].values, df["occurred"].values)

    fig = go.Figure()
    # Perfect calibration line
    fig.add_trace(go.Scatter(
        x=[0, 1], y=[0, 1], mode="lines",
        line=dict(dash="dash", color="gray"),
        name="Perfect Calibration"))
    # Polymarket calibration
    fig.add_trace(go.Scatter(
        x=poly_x, y=poly_y, mode="lines+markers",
        name=f"Polymarket (Brier: {poly_brier:.4f})",
        line=dict(color="#6366f1", width=3),
        marker=dict(size=10)))
    # Pinnacle calibration
    fig.add_trace(go.Scatter(
        x=pin_x, y=pin_y, mode="lines+markers",
        name=f"Pinnacle (Brier: {pin_brier:.4f})",
        line=dict(color="#f59e0b", width=3),
        marker=dict(size=10)))
    fig.update_layout(
        title="Calibration Curve: Polymarket vs Pinnacle",
        xaxis_title="Predicted Probability",
        yaxis_title="Actual Frequency",
        template="plotly_dark",
        height=500,
        xaxis=dict(range=[0, 1]),
        yaxis=dict(range=[0, 1]))
    fig.show()  # opens in browser
    # Or save: fig.write_html("calibration.html")
```
## How to Interpret the Results
When you run this analysis, you'll see one of three things:
| Result | What It Means | Implication |
|---|---|---|
| Polymarket closer to diagonal | Crowd is better calibrated than sharps | Prediction markets add information beyond what bookmakers capture |
| Pinnacle closer to diagonal | Sharp bookmaker is better calibrated | Professional pricing beats crowd wisdom on sports events |
| Both similar | No significant accuracy difference | Use whichever has better liquidity for your use case |
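"Similar" is doing a lot of work in that last row: with a few hundred data points, a small Brier gap can easily be sampling noise. One way to check is a bootstrap over the dataset built in Step 4, sketched below. The function name and interface are mine, not part of the tutorial's core code.

```python
import numpy as np

def bootstrap_brier_diff(poly_probs, pin_probs, outcomes, n_boot=2000, seed=42):
    """Bootstrap the Brier-score difference (Pinnacle minus Polymarket).

    Positive values favor Polymarket. If the 95% interval excludes 0,
    the calibration gap is unlikely to be sampling noise.
    """
    poly = np.asarray(poly_probs, dtype=float)
    pin = np.asarray(pin_probs, dtype=float)
    out = np.asarray(outcomes, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(out)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample events with replacement
        diffs[b] = (np.mean((pin[idx] - out[idx]) ** 2)
                    - np.mean((poly[idx] - out[idx]) ** 2))
    lo, hi = np.percentile(diffs, [2.5, 97.5])
    return lo, hi

# Usage with the Step 4 DataFrame:
# lo, hi = bootstrap_brier_diff(df["poly_prob"], df["pin_prob"], df["occurred"])
```

Resampling whole events (rather than individual outcome rows) would be stricter still, since the three rows per match are correlated.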
Based on early data, Pinnacle tends to be better calibrated on sports events — which makes sense. Sports betting markets have decades of professional pricing infrastructure. Polymarket is newer and trades thinner on sports, which leads to wider spreads and more noise in the prices.
But the gaps between them — those are trading opportunities. If Pinnacle is better calibrated and Polymarket disagrees, the Polymarket price is potentially mispriced. That's exactly what the dashboard and terminal monitor from our other tutorials are designed to catch.
## Extending the Analysis
- More data: Expand beyond 7 days. Run the script weekly and aggregate results into a growing dataset. With 500+ events, the calibration curve becomes much smoother.
- More bookmakers: Compare Kalshi, DraftKings, and Bet365 alongside Polymarket and Pinnacle. The historical-odds endpoint supports 3 bookmakers per call.
- Sport-specific analysis: Polymarket might be better calibrated on NBA (simpler binary outcome) vs soccer (three-way market). Test each sport separately.
- Time decay: Compare closing odds vs odds 24 hours before the match. Do prediction markets improve faster or slower than sportsbooks as kickoff approaches?
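For the first extension, here is a minimal sketch of accumulating weekly runs into one growing dataset. The filename and the `collected_at` column are assumptions; adapt them to your setup.

```python
import os
import pandas as pd

def append_run(df, path="calibration_history.csv"):
    """Append one run's data points to a growing CSV dataset."""
    out = df.copy()
    out["collected_at"] = pd.Timestamp.now(tz="UTC").isoformat()
    header = not os.path.exists(path)  # write the header only on the first run
    out.to_csv(path, mode="a", header=header, index=False)
    return pd.read_csv(path)  # the full accumulated dataset

# Run weekly (e.g. from cron); after a few months the dataset passes the
# 500+ events where the calibration curve stabilizes.
```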
## The Complete Backtest Script
Here's everything combined into one runnable script:
```python
import requests, numpy as np, pandas as pd
import plotly.graph_objects as go
from datetime import datetime, timedelta

API_KEY = "YOUR_API_KEY"
BASE = "https://api.oddspapi.io/v4"

def fetch_completed(sport_id=10, days=7):
    end = datetime.utcnow()
    start = end - timedelta(days=days)
    params = {"apiKey": API_KEY, "sportId": sport_id,
              "from": start.strftime("%Y-%m-%dT00:00:00Z"),
              "to": end.strftime("%Y-%m-%dT23:59:59Z"), "limit": 300}
    r = requests.get(f"{BASE}/fixtures", params=params, timeout=15)
    r.raise_for_status()
    return [f for f in r.json() if f.get("statusId") == 2]

def get_hist(fid, books=["polymarket", "pinnacle"]):
    r = requests.get(f"{BASE}/historical-odds",
                     params={"apiKey": API_KEY, "fixtureId": fid,
                             "bookmakers": ",".join(books)}, timeout=15)
    r.raise_for_status()
    return r.json()

def get_score(fid):
    r = requests.get(f"{BASE}/scores",
                     params={"apiKey": API_KEY, "fixtureId": fid}, timeout=15)
    r.raise_for_status()
    p = r.json().get("scores", {}).get("periods", {})
    ft = p.get("fulltime", p.get("result", {}))
    h, a = ft.get("participant1Score"), ft.get("participant2Score")
    if h is None or a is None:
        return None
    return "101" if h > a else ("102" if h == a else "103")

def closing_price(hist, slug, oid):
    try:
        snaps = hist["bookmakerOdds"][slug]["markets"]["101"]["outcomes"][oid]["players"]["0"]
        return snaps[0]["price"] if snaps else None
    except (KeyError, IndexError):
        return None

# Build dataset
fixtures = fetch_completed(days=7)
print(f"Found {len(fixtures)} completed fixtures")

records = []
for fx in fixtures:
    try:
        hist = get_hist(fx["fixtureId"])
    except Exception:
        continue
    if "polymarket" not in hist.get("bookmakerOdds", {}):
        continue
    outcome = get_score(fx["fixtureId"])
    if not outcome:
        continue
    for oid, lbl in [("101", "Home"), ("102", "Draw"), ("103", "Away")]:
        pp = closing_price(hist, "polymarket", oid)
        pn = closing_price(hist, "pinnacle", oid)
        if pp and pn and pp > 0 and pn > 0:
            records.append({"poly": 1 / pp, "pin": 1 / pn,
                            "hit": 1 if oid == outcome else 0})

df = pd.DataFrame(records)
print(f"Collected {len(df)} data points")
if len(df) == 0:
    raise SystemExit("No data — try increasing days or using a different sport")

# Brier scores
poly_brier = np.mean((df["poly"] - df["hit"]) ** 2)
pin_brier = np.mean((df["pin"] - df["hit"]) ** 2)
print("\nBrier Scores:")
print(f"  Polymarket: {poly_brier:.4f}")
print(f"  Pinnacle:   {pin_brier:.4f}")
winner = "Polymarket" if poly_brier < pin_brier else "Pinnacle"
print(f"  Winner: {winner}")

# Calibration curve
def cal_curve(probs, hits, bins=10):
    edges = np.linspace(0, 1, bins + 1)
    cx, cy = [], []
    for i in range(bins):
        m = (probs >= edges[i]) & (probs < edges[i + 1])
        if m.sum() >= 3:
            cx.append(float(np.mean(probs[m])))
            cy.append(float(np.mean(hits[m])))
    return cx, cy

px, py = cal_curve(df["poly"].values, df["hit"].values)
nx, ny = cal_curve(df["pin"].values, df["hit"].values)

fig = go.Figure()
fig.add_trace(go.Scatter(x=[0, 1], y=[0, 1], mode="lines",
                         line=dict(dash="dash", color="gray"), name="Perfect"))
fig.add_trace(go.Scatter(x=px, y=py, mode="lines+markers",
                         name=f"Polymarket ({poly_brier:.4f})",
                         line=dict(color="#6366f1", width=3), marker=dict(size=10)))
fig.add_trace(go.Scatter(x=nx, y=ny, mode="lines+markers",
                         name=f"Pinnacle ({pin_brier:.4f})",
                         line=dict(color="#f59e0b", width=3), marker=dict(size=10)))
fig.update_layout(title="Calibration: Polymarket vs Pinnacle",
                  xaxis_title="Predicted Probability",
                  yaxis_title="Actual Frequency",
                  template="plotly_dark", height=500,
                  xaxis=dict(range=[0, 1]), yaxis=dict(range=[0, 1]))
fig.write_html("calibration.html")
print("\nCalibration chart saved to calibration.html")
```
## Why Free Historical Data Changes Everything
This entire analysis runs on OddsPapi's free tier. No credit card, no enterprise contract, no blockchain indexing. Just an API key and a Python script.
Most competitors charge $79+/month for historical data. OddsPapi gives it away because historical data drives adoption — developers who backtest models end up building live trading systems that need real-time feeds.
The free tier covers 350+ bookmakers including prediction market exchanges (Polymarket, Kalshi, ProphetX) and sharp books (Pinnacle, Singbet, SBOBet). That's the same data hedge funds pay five figures for, accessible to anyone with a Python script.
## Run This Analysis Yourself
You just learned how to quantify the question "are prediction markets more accurate than sportsbooks?" with real data. The answer varies by sport, by time period, and by market type — which is exactly why you need to run the analysis yourself, not trust someone else's claims.
Get your free API key and run the backtest. If you want to monitor live divergences instead of backtesting historical ones, check out our prediction market dashboard or CLI terminal monitor.
For more on how Polymarket and Kalshi data works within OddsPapi, see our Polymarket & Kalshi API guide.