
Matching games across bookmakers in Python

How to reliably tell when two bookmakers are pricing the same game, including doubleheaders and books that list different start times.

By Joe, Founder of RapidOddsAPI · Guide · Updated June 2026

Every cross book tool (arbitrage, middles, positive EV) first has to decide which entries from different bookmakers are the same game. Team names alone are not enough, because the same two teams can play twice in a day, and books often list slightly different start times for one game. The reliable approach is to match on team names plus a time window, comparing start times as hours since the epoch so that timezones and midnight cannot distort the gap between two listings. This guide builds that matching in Python.

In the other guides we matched games by team name, which is fine for a single round of fixtures. This is the robust version, for when that is not enough. Two problems break naive matching, and a third trap catches people who try to fix them the obvious way.

Why team names are not enough

Doubleheaders. In baseball especially, the same two teams can play twice on the same day. Match on team names alone and you merge two different games into one, then compare prices that belong to different fixtures.
Different listed times. Books rarely agree on the start time to the minute. One has a game at 17:05, another at 17:11. Match on team names plus exact time and you split one game into two, then never see that the books disagree on a price.

So you need something in between: same teams, and start times close enough to be the same game, but far enough apart to tell two games of a doubleheader apart.
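To make the two failure modes concrete, here is a minimal sketch. The book names and the tuple layout are made up for illustration and are not the API's response shape.

```python
# Hypothetical per-book listings: (book, home, away, listed start time).
entries = [
    ("book_a", "Yankees", "Red Sox", "2026-07-04T17:05:00Z"),  # game 1
    ("book_b", "Yankees", "Red Sox", "2026-07-04T17:11:00Z"),  # game 1, listed later
    ("book_a", "Yankees", "Red Sox", "2026-07-04T21:35:00Z"),  # game 2
]

# Team names only: the doubleheader collapses into a single group.
by_names = {(home, away) for _, home, away, _ in entries}

# Team names plus exact time: game 1 splits into two groups.
by_names_and_time = {(home, away, start) for _, home, away, start in entries}

print(len(by_names))           # 1 group, but there are 2 real games
print(len(by_names_and_time))  # 3 groups, but there are 2 real games
```

Neither grouping recovers the two real games, which is why the key needs a tolerance on time rather than ignoring it or demanding an exact match.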

The trap: bucketing by time of day

The tempting fix is to round each game into a slot, say morning, afternoon, evening, or a fixed hourly bucket, and match within the slot. It seems to handle doubleheaders, but it quietly fails at the edges. A game at 16:58 and the same game listed at 17:02 fall into different hourly buckets and get split. The two games of a doubleheader at 17:05 and 21:35 can both land in the same evening slot and get merged. Fixed slots draw hard lines in the wrong places.
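The hourly bucket failure is easy to reproduce. This is a small sketch of the approach to avoid, with a hypothetical helper name (hourly_bucket) that is not part of the final solution.

```python
import datetime

def hourly_bucket(commence_time):
    # Truncate an ISO start time to the top of its hour.
    dt = datetime.datetime.fromisoformat(commence_time.replace("Z", "+00:00"))
    return dt.replace(minute=0, second=0, microsecond=0)

# The same game, listed four minutes apart by two books.
bucket_a = hourly_bucket("2026-07-04T16:58:00Z")
bucket_b = hourly_bucket("2026-07-04T17:02:00Z")

print(bucket_a == bucket_b)  # False: one game, two buckets
```

Wherever the bucket boundary sits, some pair of listings for one game can straddle it.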

The safe approach does not slot anything. It measures the actual distance between two start times and asks a single question: are these close enough to be the same game? To make that comparison clean, convert each start time to one number.

Start times as hours since the epoch

Turn each ISO start time into hours since the epoch, a single number counting from a fixed point in time. Once both times are plain numbers on the same scale, the gap between two games is just a subtraction, with no timezone or calendar maths to get wrong.

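import datetime

def hours_since_epoch(commence_time):
    """Convert an ISO 8601 start time to hours since the Unix epoch."""
    # Drop the "Z" (UTC) suffix; fromisoformat rejects it before Python 3.11.
    t = commence_time.replace("Z", "")
    dt = datetime.datetime.fromisoformat(t)
    # No offset in the string: treat the time as UTC.
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=datetime.timezone.utc)
    return dt.timestamp() / 3600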

The key point: this is safe across timezones and across midnight. Two books in different regions, or a game that starts late at night, all reduce to the same number line. A four minute disagreement is about 0.067 hours apart however you slice it, while the two games of a doubleheader are hours apart, so the two cases stay clearly separated. Bucketing by time of day cannot promise that; the epoch number can.
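A quick check of both claims, as a sketch. The function is repeated here only so the snippet runs on its own, and the timestamps are made-up examples.

```python
import datetime

def hours_since_epoch(commence_time):
    # Copy of the guide's conversion, repeated so this snippet is self-contained.
    t = commence_time.replace("Z", "")
    dt = datetime.datetime.fromisoformat(t)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=datetime.timezone.utc)
    return dt.timestamp() / 3600

# One instant written two ways: in UTC, and with a UTC-4 offset on the previous calendar day.
utc_listing = hours_since_epoch("2026-07-05T00:05:00Z")
offset_listing = hours_since_epoch("2026-07-04T20:05:00-04:00")
print(utc_listing == offset_listing)  # True

# A four minute disagreement that straddles midnight.
before_midnight = hours_since_epoch("2026-07-04T23:58:00Z")
after_midnight = hours_since_epoch("2026-07-05T00:02:00Z")
gap = after_midnight - before_midnight
print(round(gap, 3))  # 0.067
```

The calendar date changes in both cases, yet the numeric gap is exactly what the clock difference says it should be.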

Matching with a time window

Now keep a small registry: for each pair of teams, the start times you have already seen. For a new entry, find the closest start time already registered for those teams. If it is within a threshold, it is the same game. If not, it is a separate game, so register it as a new instance.

THRESHOLD_HOURS = 4.0  # within this, same game; beyond it, a different game

registry = {}  # "home_away" -> {instance_id: hours_since_epoch}


def game_key(home, away, commence_time):
    base = f"{home}_{away}"
    hours = hours_since_epoch(commence_time)

    if base not in registry:
        registry[base] = {1: hours}
        return f"{base}_1"

    # Closest start time we have already seen for these teams.
    closest_id, closest_diff = None, float("inf")
    for instance_id, registered_hours in registry[base].items():
        diff = abs(hours - registered_hours)
        if diff < closest_diff:
            closest_id, closest_diff = instance_id, diff

    if closest_diff <= THRESHOLD_HOURS:
        return f"{base}_{closest_id}"   # same game, different listed time

    # Too far apart, so this is a separate game (for example a doubleheader).
    new_id = len(registry[base]) + 1
    registry[base][new_id] = hours
    return f"{base}_{new_id}"

The threshold is the one knob to tune. A few hours works well: comfortably wider than any disagreement between books on the same game, comfortably narrower than the gap between two games of a doubleheader. Tune it per sport if you need to.
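If you do tune per sport, one possible shape is a lookup with a default. This is only a sketch: the sport keys, the numbers, and the helper names (threshold_for, same_game) are placeholders for illustration, not recommended values or part of the API.

```python
DEFAULT_THRESHOLD_HOURS = 4.0  # the value used in this guide

# Hypothetical per-sport overrides; keys and numbers are placeholders to tune.
THRESHOLD_BY_SPORT = {
    "baseball": 3.0,  # doubleheaders happen, so keep the window tighter
    "soccer": 12.0,   # teams do not play twice in a day, so be generous
}

def threshold_for(sport):
    # Fall back to the default when a sport has no override.
    return THRESHOLD_BY_SPORT.get(sport, DEFAULT_THRESHOLD_HOURS)

def same_game(hours_a, hours_b, sport):
    # True when two start times (hours since the epoch) fall within the sport's window.
    return abs(hours_a - hours_b) <= threshold_for(sport)
```

To use it, game_key would need to accept the sport and compare against threshold_for(sport) instead of the single THRESHOLD_HOURS constant.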

Seeing it in action

Feed it the same fixture from two books a few minutes apart, then the second game of a doubleheader hours later, and you can see it do the right thing both times.

game_key("Yankees", "Red Sox", "2026-07-04T17:05:00Z")  # -> Yankees_Red Sox_1
game_key("Yankees", "Red Sox", "2026-07-04T17:11:00Z")  # -> Yankees_Red Sox_1
game_key("Yankees", "Red Sox", "2026-07-04T21:35:00Z")  # -> Yankees_Red Sox_2
game_key("Yankees", "Red Sox", "2026-07-04T21:38:00Z")  # -> Yankees_Red Sox_2

The first two are six minutes apart, so they share key Yankees_Red Sox_1, one game seen at two books. The next two are the second game of the doubleheader, hours later, so they get their own key Yankees_Red Sox_2, kept cleanly apart from the first.

Using it to group the feed

Drop game_key in wherever the other guides used a plain (home, away) tuple. Each per book entry gets a stable key, and entries that share a key are the same game.

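# resp is the API response fetched as in the other guides.
games = {}
for entry in resp.json()["games"]:
    g = entry["game"]
    # One stable key per real game, even when books list different start times.
    key = game_key(g["home_team"], g["away_team"], g["commence_time"])
    if key not in games:
        games[key] = {"game": g, "bookmakers": []}
    games[key]["bookmakers"].extend(entry["bookmakers"])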

That is the same grouping every cross book tool starts with, just keyed on the matched game instead of raw team names. From here the arbitrage, middles, EV, and odds screen logic all work as written.

Notes

Names are already standardised. RapidOddsAPI normalises team names across books, so you do not also have to reconcile spellings. Matching is just teams plus time.
Tune the threshold per sport. Sports with tight turnarounds may want a smaller window, while sports that never play twice in a day can use a generous one.
Persist the registry if you poll. If you scan repeatedly, keep the registry between runs so a game keeps the same key each time you see it.
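For persisting the registry, one option is a JSON file. This is a minimal sketch: the file name and the helper names (save_registry, load_registry) are assumptions. The only subtlety is that JSON turns the integer instance ids into strings, so they are converted back on load.

```python
import json
import os

REGISTRY_PATH = "registry.json"  # hypothetical location

def save_registry(registry, path=REGISTRY_PATH):
    # Write the registry to disk; integer instance ids become JSON strings.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(registry, f)

def load_registry(path=REGISTRY_PATH):
    # Start with an empty registry on the first run.
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)
    # JSON object keys are always strings, so restore the integer instance ids.
    return {
        base: {int(instance_id): hours for instance_id, hours in instances.items()}
        for base, instances in raw.items()
    }
```

Load once at startup, assign the result to registry, and save after each scan. A long-running poller would also want to drop games that have finished, which this sketch leaves out.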

Next steps

With reliable matching in place, plug it into any of the cross book tools: the arbitrage scanner, the middles scanner, the positive EV scanner, or the odds comparison screen. For the full picture, see what you can build with an odds API, or read the API documentation.
