Pandas + Odds API: Live Bookmaker Odds to DataFrame in 10 Lines
The OddsPapi /v4/odds response is four levels deep: bookmaker → market → outcome → players[0]. That nesting is fine for a JSON viewer, useless for analysis. This post turns it into a flat pandas DataFrame in 10 lines, then walks through the four recipes that make the DataFrame worth building: best price per outcome, vig per book, price spread per market, and a live arb scanner.
Every number in this post was pulled live from the OddsPapi API on Manchester United vs Liverpool — 122 bookmakers, 18,426 active price rows from a single fixture. If you want the data to follow along, grab a free API key and any soccer fixture ID; the structure is identical across sports.
Why You Want a DataFrame, Not a Dict
Most odds-API tutorials hand you a Python dict and stop there. That’s fine if you only ever check one bookmaker for one market. The moment you want to compare prices, calculate fair odds, find arbs, or backtest, you’ll write 30 lines of nested loops to flatten it — every single time.
The whole point of pandas is that once you have a DataFrame, every analytical question becomes a one-line groupby. So the trick is getting from the raw JSON payload to a clean, long-format DataFrame as fast as possible, then never touching the dict again.
| Workflow | Without pandas | With pandas |
|---|---|---|
| Best price per outcome | Loop over books, dict comparisons, manual max | df.groupby("outcome_id")["price"].max() |
| Vig per bookmaker | Re-shape per book, sum inverse manually | df.groupby("book")["price"].agg(vig) |
| Price spread per market | Min/max passes per market | df.groupby("market_id")["price"].agg(["min","max"]) |
| Find arbs across 100+ books | Triple nested loops | df.groupby("market_id").apply(scan) |
The 10-Line Snippet
Here is the entire JSON-to-DataFrame conversion. Pull odds, comprehend, done.
import requests, pandas as pd
API_KEY = "YOUR_API_KEY"
odds = requests.get("https://api.oddspapi.io/v4/odds", params={
"apiKey": API_KEY,
"fixtureId": "id1000001761301215", # Man Utd vs Liverpool
}).json()
df = pd.DataFrame([
{"book": book, "market_id": int(mid), "outcome_id": int(oid), "price": p["price"]}
for book, b in odds["bookmakerOdds"].items()
for mid, m in b["markets"].items()
for oid, o in m["outcomes"].items()
for p in [o["players"].get("0")] if p and p.get("active") and p["price"] > 0
])
Output:
>>> df.shape
(18426, 4)
>>> df.book.nunique()
119
>>> df.market_id.nunique()
342
>>> df.head()
book market_id outcome_id price
0 3et 10224 10225 4.65
1 3et 10224 10224 1.15
2 3et 10240 10240 1.19
3 3et 10240 10241 4.15
4 3et 10216 10216 1.86
That’s it — one fixture, 18,426 rows of clean tidy data, every bookmaker’s price for every market and outcome on a single row. From here, every analytical question is a one-liner.
Three things the snippet handles defensively
- Player prop markets are skipped. Standard match markets (1X2, totals, BTTS, handicaps) ship a single price under
players["0"]. Player props use the player’s ID as the key (players["1886726"]etc.). Filtering onplayers.get("0")keeps the DataFrame clean for game-level analysis. Add a second pass if you want props. active=Trueis required. Inactive outcomes (suspended, settled, withdrawn) ship with stale or zero prices. Always filter.price > 0is required even whenactive=True. A small number of books (Bet365 is the usual offender) shipactive=Truewithprice=0for markets they don’t price yet. Without this filter, a single zero will tank everymin()andidxmin()in your downstream analysis.
Add Human-Readable Names (Optional but Worth It)
The DataFrame above is enough for math, but market and outcome IDs aren’t memorable. The /v4/markets catalog gives you the lookup tables you need:
cat = requests.get("https://api.oddspapi.io/v4/markets",
params={"apiKey": API_KEY, "sportId": 10}).json()
mkt_name = {m["marketId"]: m["marketName"] for m in cat}
out_name = {(m["marketId"], o["outcomeId"]): o["outcomeName"]
for m in cat for o in m.get("outcomes", [])}
df["market"] = df.market_id.map(mkt_name)
df["outcome"] = df.apply(lambda r: out_name.get((r.market_id, r.outcome_id)), axis=1)
Soccer’s catalog has 32,000+ markets across all handicap and total variants — way too many to hardcode. Build the lookup from the API and your code stays correct as new markets get added.
Recipe 1: Best Price Per Outcome
The most common question for any sharp bettor: which book has the best price for each outcome on this market? One line:
m101 = df[df.market_id == 101] # 1X2 / Full Time Result
best = m101.loc[m101.groupby("outcome_id")["price"].idxmax()]
print(best[["outcome", "book", "price"]])
Output (Man Utd vs Liverpool, live data captured for this post):
outcome book price
1 betfair-ex 2.50
X kalshi 4.00
2 hardrockbet 3.00
Three different venues, three best prices: a UK exchange has the best Man Utd line, a US prediction market is paying the highest draw odds, and a US sportsbook tops the Liverpool side. If you placed your full stake at a single book, you’d be giving away vig on two of three legs every single time. This is the foundation of line shopping — and pandas turns it into a one-liner.
Recipe 2: Vig Per Bookmaker
Vig (the bookmaker’s built-in margin) tells you how aggressive a book is pricing. Lower vig = sharper book = better long-term value. The math: sum the inverse decimal odds across a market’s outcomes; the excess over 1.0 is the vig.
def vig(prices):
return ((1 / prices).sum() - 1) * 100 if len(prices) == 3 else None
vig_per_book = (
m101.groupby("book")["price"]
.agg(vig)
.dropna()
.sort_values()
)
print(vig_per_book.head(10))
print("...")
print(vig_per_book.tail(5))
Live output (Man Utd vs Liverpool 1X2, sorted by vig):
book
kalshi 0.00 <- prediction market, near-zero by design
betfair-ex 0.46 <- exchange, 1-2% commission instead of vig
1xbet 1.54
polymarket 2.00
admiralbet.rs 2.02
sportybet 2.14
pinnacle 2.70 <- the sharp benchmark
unibet.nl 2.35
...
lottoland 9.02
sesamesport.bg 9.26
virginbet 10.89
bwin.fr 14.38
winamax.fr 19.14 <- 19% margin on a Premier League marquee
The spread is staggering: Pinnacle prices the same market at 2.7% margin, Winamax France at 19%. If you only ever check one book, this number is the silent tax you’re paying. The whole reason OddsPapi exists is so you don’t have to.
Note that exchanges and prediction markets show near-zero vig — they monetize through commission on settled bets instead. Filter them out (vig_per_book[vig_per_book > 0.5]) for an apples-to-apples sportsbook comparison.
Recipe 3: Price Spread Per Market
“Where are the books most disagreed?” One groupby:
spread = (
df.groupby("market_id")["price"]
.agg(["min", "max", "count"])
)
spread["spread_pct"] = (spread["max"] - spread["min"]) / spread["min"] * 100
spread["market"] = spread.index.map(mkt_name)
print(spread[spread["count"] >= 30].sort_values("spread_pct", ascending=False).head(10))
The count >= 30 filter strips out illiquid markets where 2 books quote wildly different prices because nobody’s pricing them seriously. What you’re left with is a ranked list of markets where prices genuinely diverge across well-quoted books — the hunting ground for value bets.
For Man Utd vs Liverpool, the highest-spread markets are correct-score variants (one book at 6.0, another at 1,250.0 for the same scoreline) — expected, because correct-score is structurally hard to model. The interesting ones are 1X2, Asian Handicaps, and totals where the spread should be tight but isn’t.
Recipe 4: Live Arb Scanner
An arbitrage opportunity exists when the inverse sum of best prices across all outcomes is less than 1.0. With pandas, you can scan every 3-way market on a fixture in a single pass:
three_way_markets = df.groupby("market_id").outcome_id.nunique()
three_way_markets = three_way_markets[three_way_markets == 3].index
arbs = []
for mid in three_way_markets:
sub = df[df.market_id == mid]
best_per_outcome = sub.groupby("outcome_id")["price"].max()
if len(best_per_outcome) == 3:
invsum = (1 / best_per_outcome).sum()
if invsum < 1:
arbs.append({
"market_id": mid,
"market": mkt_name.get(mid),
"invsum": invsum,
"margin_pct": (1 - invsum) * 100,
})
print(pd.DataFrame(arbs).sort_values("margin_pct", ascending=False))
Live output:
market_id market invsum margin_pct
10140 European Handicap (-1) 0.5955 40.45 <- DATA ERROR, ignore
102050 To Win From Behind FT 0.8918 10.82 <- stale, illiquid
10211 Second Half Result 0.9579 4.21 <- plausible, verify
101 Full Time Result 1X2 0.9833 1.67 <- real, tight margin
Why your arb scanner needs sanity checks
The headline result (40% margin on European Handicap -1) is almost certainly a stale price or a data error from a low-quality book — not a real arb. The biggest “arbs” are usually the ones that aren’t real. Before you ever stake on a scanner output, layer in:
- Margin ceiling. Real soft/sharp arbs sit at 0.5–3%. Anything above 5% on a top-tier market is a stale-price flag, not a green light. Cap your scanner at
margin_pct < 5. - Book whitelist. Restrict best-price selection to books you actually have funded accounts at and that pay out reliably. The 40% “arb” disappears the moment you exclude one bookmaker. Pre-filter the DataFrame:
df = df[df.book.isin(["pinnacle", "bet365", "draftkings", ...])]. - Liquidity floor. Require N books quoted on each outcome before trusting it as the “true” best.
sub.groupby("outcome_id").size().min() >= 5. - Re-poll before stake. Lines move. Even a 5-second-old arb may have closed. The OddsPapi response includes a
changedAttimestamp on every outcome — use it.
For a complete arb implementation with these guardrails, see our arbitrage betting bot tutorial.
From One Fixture to a Live Slate
Everything above used a single fixture. To run the same recipes across an entire matchday, fetch fixtures, then map the DataFrame builder over each one:
from datetime import datetime, timedelta, timezone
import time
now = datetime.now(timezone.utc)
end = now + timedelta(days=1)
fixtures = requests.get("https://api.oddspapi.io/v4/fixtures", params={
"apiKey": API_KEY,
"sportId": 10,
"from": now.strftime("%Y-%m-%dT%H:%M:%S"),
"to": end.strftime("%Y-%m-%dT%H:%M:%S"),
}).json()
frames = []
for f in fixtures:
if not f.get("hasOdds"):
continue
o = requests.get("https://api.oddspapi.io/v4/odds", params={
"apiKey": API_KEY, "fixtureId": f["fixtureId"]
}).json()
rows = [
{
"fixture_id": f["fixtureId"],
"match": f"{f['participant1Name']} vs {f['participant2Name']}",
"book": book, "market_id": int(mid),
"outcome_id": int(oid), "price": p["price"],
}
for book, b in o.get("bookmakerOdds", {}).items()
for mid, m in b["markets"].items()
for oid, oc in m["outcomes"].items()
for p in [oc["players"].get("0")]
if p and p.get("active") and p["price"] > 0
]
frames.append(pd.DataFrame(rows))
time.sleep(0.2) # respect rate limit
slate = pd.concat(frames, ignore_index=True)
print(f"Slate: {len(slate):,} rows across {slate.fixture_id.nunique()} fixtures")
The free tier has a ~0.88s cooldown between calls to the same endpoint, so the time.sleep(0.2) keeps you safely inside it. A 50-fixture Premier League / La Liga / Serie A slate produces around 800,000–1,000,000 rows — comfortably in pandas’s wheelhouse on a laptop.
Why OddsPapi Pairs Well With Pandas
| Capability | Why it matters for pandas workflows |
|---|---|
| 350+ bookmakers in one response | You don’t need to merge feeds. One /v4/odds call gives you every book in a single nested dict, ready to flatten. |
| Free historical data | Same DataFrame shape, but with a price-history list per outcome instead of a single value. Backtest models without paying for snapshots. |
| Stable ID-based schema | Market and outcome IDs don’t move. You can persist DataFrames across days/weeks without breaking joins. |
| Native exchange & prediction-market data | Polymarket, Kalshi, Betfair Exchange, Matchbook all flatten into the same DataFrame. Exchange-specific data (lay prices) lives under exchangeMeta. |
Common Pitfalls
1. Don’t iterate iterrows() over a 100K-row DataFrame
It’s 200x slower than vectorised pandas operations. If you’re tempted to for row in df.iterrows() to compute something, stop and reach for groupby, apply, or merge first.
2. Don’t trust the headline arb
Covered above — biggest “arb” is almost always a data quality issue. Filter aggressively.
3. Don’t forget timezones on historical data
The historical endpoint ships ISO timestamps in UTC ("2026-04-05T09:38:17.611441+00:00"). Parse with pd.to_datetime(df["createdAt"], utc=True) and you avoid the daylight-savings traps that bite people running Python scripts on local time.
4. Player props use a different DataFrame
The 10-line snippet above filters to players["0"]. Player prop markets have one row per player (key is the player ID). For props, build a parallel DataFrame:
props = pd.DataFrame([
{"book": book, "market_id": int(mid), "outcome_id": int(oid),
"player_id": pid, "player_name": p.get("playerName"), "price": p["price"]}
for book, b in odds["bookmakerOdds"].items()
for mid, m in b["markets"].items()
for oid, o in m["outcomes"].items()
for pid, p in o["players"].items()
if pid != "0" and p.get("active") and p["price"] > 0
])
For a full prop walkthrough, see our player props API tutorial.
5. Historical odds need a list-aware flattener
The /v4/historical-odds endpoint ships players["0"] as a list of price snapshots, not a single dict. Adjust the comprehension:
hist = requests.get("https://api.oddspapi.io/v4/historical-odds", params={
"apiKey": API_KEY,
"fixtureId": "id1000001761301215",
"bookmakers": "pinnacle,bet365,singbet", # max 3 per call
}).json()
hist_df = pd.DataFrame([
{"book": book, "market_id": int(mid), "outcome_id": int(oid),
"price": snap["price"], "ts": snap["createdAt"]}
for book, b in hist["bookmakers"].items()
for mid, m in b["markets"].items()
for oid, o in m["outcomes"].items()
for snap in o["players"]["0"]
if snap.get("active") and snap["price"] > 0
])
hist_df["ts"] = pd.to_datetime(hist_df["ts"], utc=True)
For a full historical workflow including CSV export, see Historical Odds CSV Export.
What to Build With This DataFrame
- Value betting scanner — Compare each book’s price against a fair-odds benchmark (Pinnacle no-vig works well) and flag positive-EV opportunities.
- Consensus odds calculator — Average the inverse prices across a quality-filtered set of books to get a “true” probability per outcome.
- Kelly criterion stake sizing — Combined with fair-odds, compute optimal stakes on every +EV bet your scanner finds.
- Live odds dashboard — Same DataFrame, visualised in Streamlit + Plotly with auto-refresh.
- Steam move detector — Use the
changedAttimestamps + price history to detect when sharp money moves the line.
FAQ
Why use pandas instead of just dicts and loops?
Performance and readability. A 1M-row DataFrame answers “best price per outcome across every fixture today” in 50ms via groupby. The equivalent nested-loop version is 30 lines, takes 2+ seconds, and breaks the moment a new market type appears.
How big can the DataFrame get before pandas struggles?
A full Saturday Premier League / La Liga / Serie A slate produces around 1M rows — comfortable on any laptop. Past 10M rows, switch to Polars (drop-in API, 5–10x faster) or chunk by sport/league and persist to Parquet.
Why is players["0"] the convention?
Standard match markets (1X2, BTTS, totals, handicaps) involve “the team” rather than a specific player, so OddsPapi files them under the synthetic "0" key. Player prop markets use the actual player’s ID. Filtering on "0" cleanly separates game-level markets from props in your DataFrame.
Why does active=True sometimes ship with price=0?
It’s an upstream feed quirk on a small number of bookmakers (Bet365 is the most frequent). The book’s UI hasn’t priced the market yet but their API endpoint returns the slot anyway. Always filter price > 0 in addition to active.
Can I store the DataFrame to disk between runs?
Yes — Parquet is the right format. df.to_parquet("odds.parquet") is faster than CSV and preserves dtypes. For long-running collection (steam detection, historical capture), append to a partitioned dataset: df.to_parquet(f"odds/{fixture_id}/{ts}.parquet").
What about Polars instead of pandas?
Polars uses the same flatten-then-analyse pattern. The 10-line snippet works as-is — change pd.DataFrame(...) to pl.DataFrame(...) and your groupby/agg calls are nearly identical. For datasets above 5M rows, Polars is a clear win. Below that, pandas is fine and has the larger ecosystem.
Get Your Free API Key
Stop scraping. The full pandas workflow above runs on the free tier — 350+ bookmakers, every sport, free historical data included. Grab your API key and pull your first DataFrame in under five minutes.