Pandas + Odds API: Live Bookmaker Odds to DataFrame in 10 Lines

Q: Why use pandas instead of just dicts and loops for odds data?

Performance and readability. A 1M-row DataFrame answers 'best price per outcome across every fixture today' in 50ms via groupby. The equivalent nested-loop version is 30 lines, takes 2+ seconds, and breaks when a new market type appears.

Q: Can I store the OddsPapi DataFrame to disk between runs?

Yes — Parquet is the right format. df.to_parquet('odds.parquet') is faster than CSV and preserves dtypes. For long-running collection, append to a partitioned dataset: df.to_parquet(f'odds/{fixture_id}/{ts}.parquet').

How To Guides May 20, 2026

The OddsPapi /v4/odds response is four levels deep: bookmaker → market → outcome → players[0]. That nesting is fine for a JSON viewer, useless for analysis. This post turns it into a flat pandas DataFrame in 10 lines, then walks through the four recipes that make the DataFrame worth building: best price per outcome, vig per book, price spread per market, and a live arb scanner.

Every number in this post was pulled live from the OddsPapi API on Manchester United vs Liverpool — 122 bookmakers, 18,426 active price rows from a single fixture. If you want the data to follow along, grab a free API key and any soccer fixture ID; the structure is identical across sports.

Why You Want a DataFrame, Not a Dict

Most odds-API tutorials hand you a Python dict and stop there. That’s fine if you only ever check one bookmaker for one market. The moment you want to compare prices, calculate fair odds, find arbs, or backtest, you’ll write 30 lines of nested loops to flatten it — every single time.

The whole point of pandas is that once you have a DataFrame, every analytical question becomes a one-line groupby. So the trick is getting from the raw JSON payload to a clean, long-format DataFrame as fast as possible, then never touching the dict again.

Workflow	Without pandas	With pandas
Best price per outcome	Loop over books, dict comparisons, manual max	`df.groupby("outcome_id")["price"].max()`
Vig per bookmaker	Re-shape per book, sum inverse manually	`df.groupby("book")["price"].agg(vig)`
Price spread per market	Min/max passes per market	`df.groupby("market_id")["price"].agg(["min","max"])`
Find arbs across 100+ books	Triple nested loops	`df.groupby("market_id").apply(scan)`

The 10-Line Snippet

Here is the entire JSON-to-DataFrame conversion. Pull odds, comprehend, done.

import requests, pandas as pd

API_KEY = "YOUR_API_KEY"
odds = requests.get("https://api.oddspapi.io/v4/odds", params={
    "apiKey": API_KEY,
    "fixtureId": "id1000001761301215",  # Man Utd vs Liverpool
}).json()

df = pd.DataFrame([
    {"book": book, "market_id": int(mid), "outcome_id": int(oid), "price": p["price"]}
    for book, b in odds["bookmakerOdds"].items()
    for mid, m in b["markets"].items()
    for oid, o in m["outcomes"].items()
    for p in [o["players"].get("0")] if p and p.get("active") and p["price"] > 0
])

Output:

>>> df.shape
(18426, 4)
>>> df.book.nunique()
119
>>> df.market_id.nunique()
342
>>> df.head()
     book  market_id  outcome_id  price
0     3et      10224       10225   4.65
1     3et      10224       10224   1.15
2     3et      10240       10240   1.19
3     3et      10240       10241   4.15
4     3et      10216       10216   1.86

That’s it — one fixture, 18,426 rows of clean tidy data, every bookmaker’s price for every market and outcome on a single row. From here, every analytical question is a one-liner.

Three things the snippet handles defensively

Player prop markets are skipped. Standard match markets (1X2, totals, BTTS, handicaps) ship a single price under players["0"]. Player props use the player’s ID as the key (players["1886726"] etc.). Filtering on players.get("0") keeps the DataFrame clean for game-level analysis. Add a second pass if you want props.
active=True is required. Inactive outcomes (suspended, settled, withdrawn) ship with stale or zero prices. Always filter.
price > 0 is required even when active=True. A small number of books (Bet365 is the usual offender) ship active=True with price=0 for markets they don’t price yet. Without this filter, a single zero will tank every min() and idxmin() in your downstream analysis.

Add Human-Readable Names (Optional but Worth It)

The DataFrame above is enough for math, but market and outcome IDs aren’t memorable. The /v4/markets catalog gives you the lookup tables you need:

cat = requests.get("https://api.oddspapi.io/v4/markets",
    params={"apiKey": API_KEY, "sportId": 10}).json()

mkt_name = {m["marketId"]: m["marketName"] for m in cat}
out_name = {(m["marketId"], o["outcomeId"]): o["outcomeName"]
            for m in cat for o in m.get("outcomes", [])}

df["market"] = df.market_id.map(mkt_name)
df["outcome"] = df.apply(lambda r: out_name.get((r.market_id, r.outcome_id)), axis=1)

Soccer’s catalog has 32,000+ markets across all handicap and total variants — way too many to hardcode. Build the lookup from the API and your code stays correct as new markets get added.

Recipe 1: Best Price Per Outcome

The most common question for any sharp bettor: which book has the best price for each outcome on this market? One line:

m101 = df[df.market_id == 101]  # 1X2 / Full Time Result
best = m101.loc[m101.groupby("outcome_id")["price"].idxmax()]
print(best[["outcome", "book", "price"]])

Output (Man Utd vs Liverpool, live data captured for this post):

outcome        book  price
      1  betfair-ex    2.50
      X      kalshi    4.00
      2 hardrockbet    3.00

Three different venues, three best prices: a UK exchange has the best Man Utd line, a US prediction market is paying the highest draw odds, and a US sportsbook tops the Liverpool side. If you placed your full stake at a single book, you’d be giving away vig on two of three legs every single time. This is the foundation of line shopping — and pandas turns it into a one-liner.

Recipe 2: Vig Per Bookmaker

Vig (the bookmaker’s built-in margin) tells you how aggressive a book is pricing. Lower vig = sharper book = better long-term value. The math: sum the inverse decimal odds across a market’s outcomes; the excess over 1.0 is the vig.

def vig(prices):
    return ((1 / prices).sum() - 1) * 100 if len(prices) == 3 else None

vig_per_book = (
    m101.groupby("book")["price"]
        .agg(vig)
        .dropna()
        .sort_values()
)
print(vig_per_book.head(10))
print("...")
print(vig_per_book.tail(5))

Live output (Man Utd vs Liverpool 1X2, sorted by vig):

book
kalshi           0.00      <- prediction market, near-zero by design
betfair-ex       0.46      <- exchange, 1-2% commission instead of vig
1xbet            1.54
polymarket       2.00
admiralbet.rs    2.02
sportybet        2.14
pinnacle         2.70      <- the sharp benchmark
unibet.nl        2.35
...
lottoland        9.02
sesamesport.bg   9.26
virginbet       10.89
bwin.fr         14.38
winamax.fr      19.14      <- 19% margin on a Premier League marquee

The spread is staggering: Pinnacle prices the same market at 2.7% margin, Winamax France at 19%. If you only ever check one book, this number is the silent tax you’re paying. The whole reason OddsPapi exists is so you don’t have to.

Note that exchanges and prediction markets show near-zero vig — they monetize through commission on settled bets instead. Filter them out (vig_per_book[vig_per_book > 0.5]) for an apples-to-apples sportsbook comparison.

Recipe 3: Price Spread Per Market

“Where are the books most disagreed?” One groupby:

spread = (
    df.groupby("market_id")["price"]
      .agg(["min", "max", "count"])
)
spread["spread_pct"] = (spread["max"] - spread["min"]) / spread["min"] * 100
spread["market"] = spread.index.map(mkt_name)
print(spread[spread["count"] >= 30].sort_values("spread_pct", ascending=False).head(10))

The count >= 30 filter strips out illiquid markets where 2 books quote wildly different prices because nobody’s pricing them seriously. What you’re left with is a ranked list of markets where prices genuinely diverge across well-quoted books — the hunting ground for value bets.

For Man Utd vs Liverpool, the highest-spread markets are correct-score variants (one book at 6.0, another at 1,250.0 for the same scoreline) — expected, because correct-score is structurally hard to model. The interesting ones are 1X2, Asian Handicaps, and totals where the spread should be tight but isn’t.

Recipe 4: Live Arb Scanner

An arbitrage opportunity exists when the inverse sum of best prices across all outcomes is less than 1.0. With pandas, you can scan every 3-way market on a fixture in a single pass:

three_way_markets = df.groupby("market_id").outcome_id.nunique()
three_way_markets = three_way_markets[three_way_markets == 3].index

arbs = []
for mid in three_way_markets:
    sub = df[df.market_id == mid]
    best_per_outcome = sub.groupby("outcome_id")["price"].max()
    if len(best_per_outcome) == 3:
        invsum = (1 / best_per_outcome).sum()
        if invsum < 1:
            arbs.append({
                "market_id": mid,
                "market": mkt_name.get(mid),
                "invsum": invsum,
                "margin_pct": (1 - invsum) * 100,
            })

print(pd.DataFrame(arbs).sort_values("margin_pct", ascending=False))

Live output:

   market_id                       market    invsum  margin_pct
       10140       European Handicap (-1)    0.5955       40.45    <- DATA ERROR, ignore
      102050      To Win From Behind FT     0.8918       10.82    <- stale, illiquid
       10211           Second Half Result    0.9579        4.21    <- plausible, verify
         101         Full Time Result 1X2    0.9833        1.67    <- real, tight margin

Why your arb scanner needs sanity checks

The headline result (40% margin on European Handicap -1) is almost certainly a stale price or a data error from a low-quality book — not a real arb. The biggest “arbs” are usually the ones that aren’t real. Before you ever stake on a scanner output, layer in:

Margin ceiling. Real soft/sharp arbs sit at 0.5–3%. Anything above 5% on a top-tier market is a stale-price flag, not a green light. Cap your scanner at margin_pct < 5.
Book whitelist. Restrict best-price selection to books you actually have funded accounts at and that pay out reliably. The 40% “arb” disappears the moment you exclude one bookmaker. Pre-filter the DataFrame: df = df[df.book.isin(["pinnacle", "bet365", "draftkings", ...])].
Liquidity floor. Require N books quoted on each outcome before trusting it as the “true” best. sub.groupby("outcome_id").size().min() >= 5.
Re-poll before stake. Lines move. Even a 5-second-old arb may have closed. The OddsPapi response includes a changedAt timestamp on every outcome — use it.

For a complete arb implementation with these guardrails, see our arbitrage betting bot tutorial.

From One Fixture to a Live Slate

Everything above used a single fixture. To run the same recipes across an entire matchday, fetch fixtures, then map the DataFrame builder over each one:

from datetime import datetime, timedelta, timezone
import time

now = datetime.now(timezone.utc)
end = now + timedelta(days=1)

fixtures = requests.get("https://api.oddspapi.io/v4/fixtures", params={
    "apiKey": API_KEY,
    "sportId": 10,
    "from": now.strftime("%Y-%m-%dT%H:%M:%S"),
    "to": end.strftime("%Y-%m-%dT%H:%M:%S"),
}).json()

frames = []
for f in fixtures:
    if not f.get("hasOdds"):
        continue
    o = requests.get("https://api.oddspapi.io/v4/odds", params={
        "apiKey": API_KEY, "fixtureId": f["fixtureId"]
    }).json()
    rows = [
        {
            "fixture_id": f["fixtureId"],
            "match": f"{f['participant1Name']} vs {f['participant2Name']}",
            "book": book, "market_id": int(mid),
            "outcome_id": int(oid), "price": p["price"],
        }
        for book, b in o.get("bookmakerOdds", {}).items()
        for mid, m in b["markets"].items()
        for oid, oc in m["outcomes"].items()
        for p in [oc["players"].get("0")]
        if p and p.get("active") and p["price"] > 0
    ]
    frames.append(pd.DataFrame(rows))
    time.sleep(0.2)  # respect rate limit

slate = pd.concat(frames, ignore_index=True)
print(f"Slate: {len(slate):,} rows across {slate.fixture_id.nunique()} fixtures")

The free tier has a ~0.88s cooldown between calls to the same endpoint, so the time.sleep(0.2) keeps you safely inside it. A 50-fixture Premier League / La Liga / Serie A slate produces around 800,000–1,000,000 rows — comfortably in pandas’s wheelhouse on a laptop.

Why OddsPapi Pairs Well With Pandas

Capability	Why it matters for pandas workflows
350+ bookmakers in one response	You don’t need to merge feeds. One `/v4/odds` call gives you every book in a single nested dict, ready to flatten.
Free historical data	Same DataFrame shape, but with a price-history list per outcome instead of a single value. Backtest models without paying for snapshots.
Stable ID-based schema	Market and outcome IDs don’t move. You can persist DataFrames across days/weeks without breaking joins.
Native exchange & prediction-market data	Polymarket, Kalshi, Betfair Exchange, Matchbook all flatten into the same DataFrame. Exchange-specific data (lay prices) lives under `exchangeMeta`.

Common Pitfalls

1. Don’t iterate `iterrows()` over a 100K-row DataFrame

It’s 200x slower than vectorised pandas operations. If you’re tempted to for row in df.iterrows() to compute something, stop and reach for groupby, apply, or merge first.

2. Don’t trust the headline arb

Covered above — biggest “arb” is almost always a data quality issue. Filter aggressively.

3. Don’t forget timezones on historical data

The historical endpoint ships ISO timestamps in UTC ("2026-04-05T09:38:17.611441+00:00"). Parse with pd.to_datetime(df["createdAt"], utc=True) and you avoid the daylight-savings traps that bite people running Python scripts on local time.

4. Player props use a different DataFrame

The 10-line snippet above filters to players["0"]. Player prop markets have one row per player (key is the player ID). For props, build a parallel DataFrame:

props = pd.DataFrame([
    {"book": book, "market_id": int(mid), "outcome_id": int(oid),
     "player_id": pid, "player_name": p.get("playerName"), "price": p["price"]}
    for book, b in odds["bookmakerOdds"].items()
    for mid, m in b["markets"].items()
    for oid, o in m["outcomes"].items()
    for pid, p in o["players"].items()
    if pid != "0" and p.get("active") and p["price"] > 0
])

For a full prop walkthrough, see our player props API tutorial.

5. Historical odds need a list-aware flattener

The /v4/historical-odds endpoint ships players["0"] as a list of price snapshots, not a single dict. Adjust the comprehension:

hist = requests.get("https://api.oddspapi.io/v4/historical-odds", params={
    "apiKey": API_KEY,
    "fixtureId": "id1000001761301215",
    "bookmakers": "pinnacle,bet365,singbet",  # max 3 per call
}).json()

hist_df = pd.DataFrame([
    {"book": book, "market_id": int(mid), "outcome_id": int(oid),
     "price": snap["price"], "ts": snap["createdAt"]}
    for book, b in hist["bookmakers"].items()
    for mid, m in b["markets"].items()
    for oid, o in m["outcomes"].items()
    for snap in o["players"]["0"]
    if snap.get("active") and snap["price"] > 0
])
hist_df["ts"] = pd.to_datetime(hist_df["ts"], utc=True)

For a full historical workflow including CSV export, see Historical Odds CSV Export.

What to Build With This DataFrame

Value betting scanner — Compare each book’s price against a fair-odds benchmark (Pinnacle no-vig works well) and flag positive-EV opportunities.
Consensus odds calculator — Average the inverse prices across a quality-filtered set of books to get a “true” probability per outcome.
Kelly criterion stake sizing — Combined with fair-odds, compute optimal stakes on every +EV bet your scanner finds.
Live odds dashboard — Same DataFrame, visualised in Streamlit + Plotly with auto-refresh.
Steam move detector — Use the changedAt timestamps + price history to detect when sharp money moves the line.

FAQ

Why use pandas instead of just dicts and loops?

Performance and readability. A 1M-row DataFrame answers “best price per outcome across every fixture today” in 50ms via groupby. The equivalent nested-loop version is 30 lines, takes 2+ seconds, and breaks the moment a new market type appears.

How big can the DataFrame get before pandas struggles?

A full Saturday Premier League / La Liga / Serie A slate produces around 1M rows — comfortable on any laptop. Past 10M rows, switch to Polars (drop-in API, 5–10x faster) or chunk by sport/league and persist to Parquet.

Why is `players["0"]` the convention?

Standard match markets (1X2, BTTS, totals, handicaps) involve “the team” rather than a specific player, so OddsPapi files them under the synthetic "0" key. Player prop markets use the actual player’s ID. Filtering on "0" cleanly separates game-level markets from props in your DataFrame.

Why does `active=True` sometimes ship with `price=0`?

It’s an upstream feed quirk on a small number of bookmakers (Bet365 is the most frequent). The book’s UI hasn’t priced the market yet but their API endpoint returns the slot anyway. Always filter price > 0 in addition to active.

Can I store the DataFrame to disk between runs?

Yes — Parquet is the right format. df.to_parquet("odds.parquet") is faster than CSV and preserves dtypes. For long-running collection (steam detection, historical capture), append to a partitioned dataset: df.to_parquet(f"odds/{fixture_id}/{ts}.parquet").

What about Polars instead of pandas?

Polars uses the same flatten-then-analyse pattern. The 10-line snippet works as-is — change pd.DataFrame(...) to pl.DataFrame(...) and your groupby/agg calls are nearly identical. For datasets above 5M rows, Polars is a clear win. Below that, pandas is fine and has the larger ecosystem.

Get Your Free API Key

Stop scraping. The full pandas workflow above runs on the free tier — 350+ bookmakers, every sport, free historical data included. Grab your API key and pull your first DataFrame in under five minutes.