What is options flow historical data?

Options flow historical data is a timestamped record of past unusual options activity — including the ticker, contract type, strike, expiry, premium, Vol/OI ratio, aggressor side, and execution type — for sessions that have already closed. Unlike a live feed, historical data lets you replay past sessions, build backtests, evaluate signal quality, and compare patterns across different market regimes.

How far back does options flow historical data go?

It depends entirely on the provider. Some flow tools only store 30–90 days. More comprehensive databases go back 2–5 years. Full OPRA tape archives (all options prints, not just unusual ones) go back much further but are extremely large and expensive. For backtesting unusual flow signals specifically, 12–24 months is typically enough to capture multiple volatility regimes and earnings cycles.

How do you backtest options flow signals?

A standard backtest: (1) filter historical prints by your scoring criteria (score tier, premium floor, sweep-only, etc.), (2) record the underlying spot price at signal time, (3) record the spot price at your forward horizon (1 day, 3 days, 5 days), (4) classify each outcome as a directional hit (underlying moved the implied direction) or miss, (5) aggregate hit-rate, average return, and drawdown by score tier, option type, and sector. The key discipline is prospective sampling — include all signals meeting criteria, not just the ones that look good in hindsight.

What directional hit-rate should I expect from unusual options flow?

Honest studies suggest that high-scoring unusual flow (EXTREME tier, Vol/OI above 10×, ask-side sweep, premium above $250K) reaches directional accuracy of roughly 55–65% at 3-day horizons in liquid names. But this varies significantly by sector, market regime, and DTE. Lower-tier signals tend toward 50–55%: above random, but not dramatically so. Any tool claiming 75%+ hit-rates without publishing its full methodology and sample should be viewed with skepticism.

What makes options flow historical data different from options price history?

Options price history (OHLC data for each contract) is widely available from providers like CBOE, Nasdaq, and aggregators. Options flow historical data is narrower: it captures unusual prints specifically — sweeps, large blocks, high Vol/OI trades — with the context of how they were executed (aggressor side, sweep vs block, single-leg vs multi-leg). This execution context is what makes flow data actionable for signal research and cannot be derived from price history alone.

Do options flow tools publish their historical hit-rate data?

Almost none of them do, and this is a significant accountability gap. Most tools show curated examples of past signals that preceded big moves, which is classic survivorship bias. A genuine public track record requires prospective sampling (record every qualifying signal, not just the winners), outcome measurement at a fixed horizon, and a minimum sample size before publishing (to prevent misleading results from tiny samples). RadarPulse's Smart-Money Scorecard is built on this principle: it tracks every EXTREME and ELEVATED signal with no cherry-picking.

Can I access options flow historical data via API?

Some providers offer historical flow data via REST API endpoints — typically a /flow/historical or /flow/heatmap endpoint that accepts a date range. The response usually includes a paginated list of qualifying prints with the same fields as the live feed. Access is typically gated at higher subscription tiers because historical data storage is expensive and the research value is higher than the live feed alone.

What Python libraries are useful for analyzing options flow historical data?

pandas for data manipulation and aggregation, numpy for vectorized calculations, matplotlib or plotly for visualization, scipy.stats for confidence intervals and significance testing, and seaborn for heatmaps (useful for Vol/OI × premium distributions). A typical backtest script: load historical flow CSV or API response into a DataFrame, apply filters, merge with underlying price data from yfinance or polygon, compute forward returns, and aggregate by score tier and sector.

Data & research · June 29, 2026

Options Flow Historical Data: How to Access, Backtest, and Evaluate Signal Accuracy

Historical options flow data is the foundation for any serious evaluation of whether unusual activity has real predictive value or is just noise that gets retroactively explained. Here is what historical flow data contains, how to access it, how to build a backtest, and the accountability gap that most flow tools quietly ignore.

What historical options flow data contains

A live options flow feed shows prints as they happen. Historical flow data is the same feed archived by session. Each record captures the state of the print at execution time, not an after-the-fact reconstruction.

A quality historical dataset includes, per print:

Field	Why it matters for research
Timestamp	Maps the print to a market regime, session, and underlying spot price
Ticker & underlying spot	Required to compute forward returns
Contract type (CALL/PUT)	Defines the implied directional bet
Strike & expiry (DTE)	Separates urgent short-dated bets from long-dated hedges
Volume & open interest	The Vol/OI ratio, the strongest single factor in unusualness scoring
Premium paid	Conviction proxy: size filters out noise
Aggressor side (bid/ask/mid)	Ask-side indicates urgency; bid-side is more ambiguous
Execution type (sweep/block)	Sweeps cross multiple venues instantly and are strongly directional
Unusualness score	The composite signal quality metric, needed to segment outcomes by tier
Sector/market cap	Allows sector-level aggregation in the backtest

What historical flow data does not contain is the underlying's forward price; that must be joined from a separate OHLCV source (Polygon, Finnhub, yfinance) based on the snapshot timestamp.

How far back providers store it

Depth of history varies widely and is rarely disclosed upfront:

30–90 days: most common in entry-tier plans. Enough for pattern lookups, not enough for multi-regime backtesting.
6–12 months: covers several earnings cycles, but is too short to include major volatility events (2020 COVID crash, 2022 rate hike cycle).
2–5 years: necessary for statistically meaningful sector-level analysis and regime comparison. Usually a paid premium tier.
Full OPRA tape (all options prints): available from CBOE and Nasdaq directly, extremely large, priced for institutions. Most retail flow tools pre-filter this tape for unusual prints only.

For evaluating whether high-scoring flow signals carry real directional information, 12 months is the practical minimum. You need enough samples per sector and score tier to generate confidence intervals below ±10 percentage points.
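To see what a ±10 percentage point target implies for sample size, here is a rough sketch using the normal approximation to a binomial proportion. The function names are illustrative, and the approximation itself is an assumption: it gets loose for small samples and for hit-rates near 0% or 100%.

```python
import math

Z_95 = 1.96  # two-sided 95% normal quantile


def ci_half_width(hit_rate: float, n: int, z: float = Z_95) -> float:
    """Approximate confidence-interval half-width for a hit-rate, in percentage points."""
    return z * math.sqrt(hit_rate * (1 - hit_rate) / n) * 100


def samples_for_half_width(hit_rate: float, target_pp: float, z: float = Z_95) -> int:
    """Smallest n whose approximate half-width is at or below target_pp."""
    return math.ceil(z**2 * hit_rate * (1 - hit_rate) / (target_pp / 100) ** 2)


for n in (30, 100, 400):
    print(n, round(ci_half_width(0.60, n), 1))
print(samples_for_half_width(0.60, 10))
```

Under these assumptions, a 60% hit-rate measured on 30 signals carries an interval of about ±17.5 points, and roughly 93 signals are needed to get inside ±10. A 30-sample floor is therefore a threshold for reporting a rate at all, not a guarantee of a tight estimate.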

How to backtest options flow signals

A rigorous options flow backtest follows five steps:

1. Define your signal universe. What qualifies as a signal? For example: score ≥ 70, premium ≥ $100,000, aggressor side = ASK, execution = SWEEP. This becomes your filter. Apply it identically to every session in your dataset: no look-ahead, no manual selection.
2. Record the snapshot. For each qualifying signal, record the underlying's closing price on the signal date (or the midpoint price closest to the signal timestamp for intraday precision). This is your entry reference price.
3. Measure forward returns. At your chosen horizon (1 trading day, 3 days, 5 days), record the underlying's closing price. The forward return = (exit price − entry price) / entry price × 100%.
4. Classify the outcome. A CALL signal is a directional hit if the forward return is positive; a PUT signal is a hit if the return is negative. This directional classification, not options P&L, is the cleanest signal quality metric (it removes IV, decay, and bid-ask noise from the equation).
5. Aggregate by tier, type, and sector. Calculate hit-rate (% correct directional calls), average forward return, and standard deviation by score tier (EXTREME / ELEVATED / NOTABLE), option type (CALL / PUT), and sector (Technology, Biotech, Energy, etc.). The tier × sector cross-tab is where the most actionable patterns appear.

Common backtesting mistakes

Survivorship sampling: only including signals that were followed by large moves. This is the most common form of options flow cherry-picking and produces meaninglessly inflated hit-rates.
Intraday timing games: using the low of day as the entry price for call signals and the high of day as entry for put signals. A fair backtest uses the same objective price (closing price or signal-time midpoint) for every signal.
Ignoring small samples: reporting a 100% hit-rate from 4 signals in biotech is worse than reporting a 60% hit-rate from 80 signals. Require a minimum sample (30+ per cell) before reporting a rate.
Mixing market regimes: a backtest that runs across 2020 (crash + recovery), 2021 (gamma squeeze), and 2022 (rate hikes) is averaging over very different environments. Segment by regime to find where signals are strongest.
Forgetting the denominator: reporting 10 winning examples without disclosing how many total signals were evaluated in the same period.
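Two of the mistakes above, mixing regimes and hiding the denominator, can be avoided mechanically. The sketch below tags each outcome with a volatility regime and reports the count next to every hit-rate. The VIX values are made-up placeholders, the `vix_close` column name is an assumption, and the 25 cut-off is only one possible regime definition.

```python
import pandas as pd

# Toy outcomes table; in practice this is the per-signal results frame from a backtest.
outcomes = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-03", "2022-01-04", "2022-03-07", "2022-03-08"]),
    "hit": [True, True, False, True],
})

# Hypothetical daily VIX closes keyed by date (placeholder values for illustration).
vix = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-03", "2022-01-04", "2022-03-07", "2022-03-08"]),
    "vix_close": [16.6, 16.9, 36.4, 35.1],
})

tagged = outcomes.merge(vix, on="date", how="left")
# Rows with a missing VIX value would fall into "calm" here; drop them first in real use.
tagged["regime"] = tagged["vix_close"].gt(25).map({True: "high_vol", False: "calm"})

# Always report the sample count alongside the rate.
by_regime = tagged.groupby("regime")["hit"].agg(count="size", hit_rate="mean")
print(by_regime)
```

With only two signals per regime the toy numbers mean nothing; the point is the shape of the output, where no rate appears without its count.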

What hit-rates to expect (honestly)

Honest published research on options flow signal accuracy is sparse, because most providers have no incentive to publish numbers that might disappoint users. The figures that do appear in academic literature and independent analysis suggest:

EXTREME tier (score 85+, Vol/OI 10×+, premium $250K+, ask-side sweep): 3-day directional hit-rates in liquid large-cap names of 58–65% across bull-market regimes. Weaker in high-volatility environments (VIX above 25).
ELEVATED tier (score 70–84): 3-day hit-rates of 53–59%: meaningful outperformance over random, but with considerably more variance than EXTREME.
NOTABLE tier (score 55–69): 3-day hit-rates of 50–54%. Individually noisy, but sector aggregates still show actionable patterns, especially in healthcare and energy where catalyst-driven flow is more concentrated.

A few important calibrations:

These are directional hit-rates on the underlying, not options P&L. A 60% directional hit-rate on the stock does not mean 60% of options positions profit. Theta decay, IV changes, and bid-ask costs often turn correctly-directed trades into losses.
Hit-rates degrade meaningfully when DTE is short and the signal is in biotech ahead of a binary catalyst (FDA decision, earnings); the signal may be about volatility magnitude, not direction.
Congress + flow confluence on the same ticker, where data is available, tends to show higher hit-rates than standalone flow; a cross-domain signal from two distinct data sources is harder to explain as coincidence.
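The gap between directional accuracy and options P&L in the first point can be illustrated with a textbook Black-Scholes calculation. This is a sketch under simplifying assumptions (European-style call, no dividends, zero interest rate), and every input below is an illustrative number, not market data.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(spot: float, strike: float, years: float, vol: float, rate: float = 0.0) -> float:
    """Black-Scholes price of a European call with no dividends."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol**2) * years) / (vol * math.sqrt(years))
    d2 = d1 - vol * math.sqrt(years)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)


# A 7-DTE out-of-the-money call bought at 60% implied volatility...
entry_price = bs_call(spot=100.0, strike=105.0, years=7 / 365, vol=0.60)
# ...and marked 3 days later: the stock is up 1%, but IV has fallen to 50%.
exit_price = bs_call(spot=101.0, strike=105.0, years=4 / 365, vol=0.50)

stock_return = (101.0 - 100.0) / 100.0 * 100
option_return = (exit_price - entry_price) / entry_price * 100
print(f"stock: {stock_return:+.1f}%  option: {option_return:+.1f}%")
```

In this scenario the signal counts as a directional hit, yet the call loses a large share of its value because time decay and the drop in implied volatility outweigh the 1% move.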

Python workflow for signal analysis

A minimal backtest on exported options flow data using pandas and yfinance:

import pandas as pd
import numpy as np
import yfinance as yf
from datetime import timedelta

# Load your historical flow export (CSV from RadarPulse or API response)
flow = pd.read_csv("flow_history.csv", parse_dates=["timestamp"])

# 1. Filter for your signal universe
signals = flow[
    (flow["score"] >= 70) &
    (flow["premium"] >= 100_000) &
    (flow["side"] == "ASK") &
    (flow["kind"] == "SWEEP")
].copy()

# 2. For each signal, fetch the underlying's close on the signal date and N trading days later
def forward_return(ticker, signal_date, days):
    start = signal_date
    end = signal_date + timedelta(days=days * 2 + 5)  # calendar buffer for weekends and holidays
    hist = yf.Ticker(ticker).history(start=start, end=end)
    # Need the entry row plus `days` later rows; otherwise skip the signal
    # rather than silently measuring a shorter horizon
    if len(hist) <= days:
        return None
    entry = hist["Close"].iloc[0]     # first close on or after the signal date
    exit_ = hist["Close"].iloc[days]  # close N trading days after entry
    return (exit_ - entry) / entry * 100

results = []
for _, row in signals.iterrows():
    ret_3d = forward_return(row["ticker"], row["timestamp"].date(), 3)
    if ret_3d is None:
        continue
    # Directional hit: CALL needs positive return, PUT needs negative
    is_hit = (row["type"] == "CALL" and ret_3d > 0) or \
             (row["type"] == "PUT" and ret_3d < 0)
    results.append({
        "ticker": row["ticker"],
        "type": row["type"],
        "score": row["score"],
        "sector": row.get("sector", "Other"),
        "ret_3d": ret_3d,
        "hit": is_hit,
        "flag": "EXTREME" if row["score"] >= 85 else "ELEVATED" if row["score"] >= 70 else "NOTABLE"
    })

df = pd.DataFrame(results)

# 3. Aggregate by flag and option type
summary = (
    df.groupby(["flag", "type"])
    .agg(
        count=("hit", "size"),
        hit_rate=("hit", "mean"),
        avg_ret=("ret_3d", "mean"),
        std_ret=("ret_3d", "std")
    )
    .round(3)
)
print(summary[summary["count"] >= 30])  # only report cells with enough data

Key notes: use yfinance for quick backtests but Polygon or CBOE data for production research (yfinance data is not adjusted for splits in all edge cases and lacks intraday resolution). The forward_return function uses closing prices; for intraday precision, join to the minute-level OHLCV nearest the signal timestamp.
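For the intraday join mentioned above, one possible approach is `pandas.merge_asof`, which matches each print to the most recent bar for the same ticker. The bar table, its `bar_time` and `close` columns, and the prices are placeholders; the sketch also assumes each bar is stamped with its closing time, so a backward match never uses a price from after the print.

```python
import pandas as pd

# Toy signals; column names mirror the flow export used above.
signals = pd.DataFrame({
    "ticker": ["NVDA", "NVDA"],
    "timestamp": pd.to_datetime(["2026-06-15 10:23:41", "2026-06-15 14:05:10"]),
})

# Hypothetical minute bars for the underlying (placeholder prices).
bars = pd.DataFrame({
    "ticker": ["NVDA"] * 3,
    "bar_time": pd.to_datetime(
        ["2026-06-15 10:23:00", "2026-06-15 10:24:00", "2026-06-15 14:05:00"]
    ),
    "close": [131.20, 131.35, 132.10],
})

# merge_asof requires both frames to be sorted by their time keys.
signals = signals.sort_values("timestamp")
bars = bars.sort_values("bar_time")

joined = pd.merge_asof(
    signals,
    bars,
    left_on="timestamp",
    right_on="bar_time",
    by="ticker",
    direction="backward",            # last bar at or before the print: no look-ahead
    tolerance=pd.Timedelta("5min"),  # leave the entry price as NaN if no recent bar exists
)
print(joined[["ticker", "timestamp", "bar_time", "close"]])
```

Using `direction="backward"` instead of `"nearest"` is a deliberate choice here: the nearest bar can sit after the signal, which would leak future information into the entry price.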

Sector-level heatmap

import seaborn as sns
import matplotlib.pyplot as plt

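pivot = df.pivot_table(
    values="hit", index="sector", columns="flag",
    # Mask thin cells with NaN (not None) so the pivot stays numeric for the heatmap;
    # note this 10-sample floor is looser than the 30 used for the summary table
    aggfunc=lambda x: x.mean() if len(x) >= 10 else float("nan")
)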
sns.heatmap(pivot, annot=True, fmt=".0%", cmap="RdYlGn",
            center=0.5, vmin=0.4, vmax=0.7)
plt.title("Directional hit-rate by sector × score tier (3-day)")
plt.tight_layout()
plt.show()

This heatmap often reveals that a few sector × tier combinations drive most of the edge. For example, EXTREME call flow in Technology and Healthcare outperforms EXTREME call flow in Consumer Staples, which is structurally less catalyst-driven.

Evaluating provider data quality

Not all historical flow datasets are equivalent. Evaluate a provider on these dimensions before building research on their data:

Question to ask	Why it matters
How far back does the data go?	Less than 12 months is insufficient for multi-regime analysis
Is it raw OPRA tape or pre-filtered?	Pre-filtered data may exclude prints your criteria would catch; raw tape includes noise you'd need to filter yourself
Are Vol/OI ratios computed correctly?	Some providers divide by total OI across all strikes; the correct denominator is the most recent published open interest for the same contract (same strike and expiry). The difference is significant for short-dated prints
Is aggressor side included?	Without bid/ask classification, you can't filter for urgency, the most important execution quality signal
Are multi-leg (spread) prints separated from single-leg?	Multi-leg prints may look like unusual directional flow but are often synthetic positions, hedges, or risk reversals with no strong directional bias
Does the provider publish their own outcome data?	A provider confident in their signal quality should track outcomes. If they don't, ask why.
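Some of these questions can be checked directly against a sample export before committing to a provider. The sketch below is a rough, non-exhaustive audit; `timestamp` and `side` follow the export columns used earlier, while `legs` is a hypothetical column name for single- versus multi-leg tagging.

```python
import pandas as pd


def audit_flow_export(flow: pd.DataFrame) -> dict:
    """Quick checks on history depth and field coverage of a flow export."""
    span_days = (flow["timestamp"].max() - flow["timestamp"].min()).days
    return {
        "span_days": span_days,
        "covers_12_months": span_days >= 365,
        "pct_missing_side": round(float(flow["side"].isna().mean()) * 100, 1),
        # "legs" is a hypothetical column name; adjust to the provider's schema
        "has_leg_tagging": "legs" in flow.columns,
    }


sample = pd.DataFrame({
    "timestamp": pd.to_datetime(["2025-06-02", "2025-12-15", "2026-06-26"]),
    "side": ["ASK", None, "BID"],
})
report = audit_flow_export(sample)
print(report)
```

Checks like Vol/OI correctness cannot be verified from the export alone; those still require asking the provider how the denominator is sourced.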

The accountability gap: why most tools hide track records

The options flow tool market has a systematic accountability problem. Because past prints are abundant, it's trivially easy to find examples that look prescient: EXTREME calls on a name three days before an earnings beat, large put sweeps ahead of a sector selloff. Social media amplifies these examples because they're compelling stories.

What you almost never see is the denominator: out of all the EXTREME calls in the same period, how many actually preceded upside? That number is available from the same data. It just doesn't get posted because it's usually closer to 55–60% than 90%, and "55% directional accuracy on high-scoring signals" is harder to tweet than a screenshot of a 100x put.

The correct standard is a prospective, systematic track record:

Every signal meeting the criteria is recorded at the time of the signal, not retroactively
Outcome is measured at a fixed, pre-declared horizon (1d, 3d, 5d)
Results are published once the sample is large enough to be meaningful (30+ outcomes)
The methodology (what counts as a signal, what counts as a hit) is published alongside the numbers

RadarPulse's Smart-Money Scorecard is built on this standard. Every EXTREME and ELEVATED print scored from a live session is logged with the underlying spot price, and the forward move is measured automatically as the session data accumulates. The track record builds prospectively, without cherry-picking, and the methodology is documented. The numbers that emerge are honest ones, useful for calibrating how much weight to put on any given signal, not a marketing claim about performance.
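As a generic illustration of the four requirements above (not RadarPulse's implementation, which is not shown in this article), a prospective log can be sketched as a record that is created at signal time, resolved exactly once, and only summarized after a minimum sample. All class and function names here are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class LoggedSignal:
    ticker: str
    option_type: str       # "CALL" or "PUT"
    spot_at_signal: float  # entry reference, captured when the signal fires
    logged_at: datetime
    horizon_days: int      # declared up front, never changed afterwards
    exit_price: Optional[float] = None

    def resolve(self, exit_price: float) -> None:
        """Record the outcome once; a second write is rejected."""
        if self.exit_price is not None:
            raise ValueError("outcome already recorded")
        self.exit_price = exit_price

    @property
    def hit(self) -> Optional[bool]:
        if self.exit_price is None:
            return None  # still open
        move = self.exit_price - self.spot_at_signal
        return move > 0 if self.option_type == "CALL" else move < 0


def published_hit_rate(log, min_sample: int = 30) -> Optional[float]:
    """Return the hit-rate over resolved signals, or None below the minimum sample."""
    resolved = [s.hit for s in log if s.hit is not None]
    if len(resolved) < min_sample:
        return None
    return sum(resolved) / len(resolved)


now = datetime(2026, 6, 15, tzinfo=timezone.utc)
log = [
    LoggedSignal("AAA", "CALL", 100.0, now, horizon_days=3),
    LoggedSignal("BBB", "PUT", 50.0, now, horizon_days=3),
]
log[0].resolve(103.0)
log[1].resolve(51.0)
print(published_hit_rate(log))                # None: sample too small to publish
print(published_hit_rate(log, min_sample=2))  # rate over the two resolved signals
```

The design point is that nothing in the record can be chosen after the outcome is known: the entry price, the horizon, and the hit definition are all fixed before `resolve` is called.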

Accessing historical data via API

Most quality options flow tools expose historical data through a dedicated endpoint alongside their live feed. The typical pattern:

# RadarPulse historical flow endpoint (Elite tier, staged for next release)
GET /api/v1/flow/historical?from=2026-06-01&to=2026-06-29&score_min=70&type=CALL&limit=500

Authorization: x-api-key YOUR_KEY_HERE

# Response: paginated list of prints from the specified date range
{
  "prints": [
    {
      "ticker": "NVDA",
      "type": "CALL",
      "strike": 135,
      "dte": 7,
      "premium": 2450000,
      "volOI": 12.4,
      "side": "ASK",
      "kind": "SWEEP",
      "score": 91,
      "flag": "EXTREME",
      "spot": 131.20,
      "timestamp": "2026-06-15T10:23:41Z",
      "sector": "Technology"
    }
    // ...
  ],
  "total": 842,
  "next_cursor": "eyJ0cyI6MTc1MDAwMDAwMH0="
}

Key parameters to look for in a historical endpoint:

Date range (from / to): ISO 8601 dates or Unix timestamps
Score filter (score_min): pre-filter server-side to reduce payload size
Pagination cursor: essential for large date ranges; avoid offset-based pagination (offset becomes slow on large tables)
Underlying spot price: must be included in the historical record, not looked up later, to ensure the entry reference is the price at signal time, not the price when you fetch

For building a research pipeline, fetch historical data in batches of 30-day windows, cache to local Parquet files, and join with yfinance / Polygon for forward prices. Avoid re-fetching the same date ranges repeatedly; most historical endpoints count against rate limits even for repeated identical queries.
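The batching pattern can be sketched without committing to any one provider's client. In the sketch below, `fetch_page` is a stand-in for whatever function performs the HTTP request; the `prints` and `next_cursor` response fields follow the example response above, while passing the cursor back as a request argument is an assumption about how the endpoint paginates.

```python
from datetime import date, timedelta
from typing import Callable, Iterator, Optional


def date_windows(start: date, end: date, days: int = 30) -> Iterator[tuple[date, date]]:
    """Yield inclusive (from, to) windows of at most `days` days covering start..end."""
    cur = start
    while cur <= end:
        window_end = min(cur + timedelta(days=days - 1), end)
        yield cur, window_end
        cur = window_end + timedelta(days=1)


def fetch_all(
    fetch_page: Callable[[date, date, Optional[str]], dict],
    start: date,
    end: date,
) -> list[dict]:
    """Walk every window and follow the cursor until the provider stops returning one."""
    prints: list[dict] = []
    for frm, to in date_windows(start, end):
        cursor: Optional[str] = None
        while True:
            page = fetch_page(frm, to, cursor)
            prints.extend(page["prints"])
            cursor = page.get("next_cursor")
            if not cursor:
                break
    return prints


windows = list(date_windows(date(2026, 1, 1), date(2026, 3, 1)))
print(windows)
```

Writing each window's result to its own local file before moving on (for example with `DataFrame.to_parquet`) gives a natural cache key, so a window that is already on disk never needs to be requested again.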

See the options flow API guide for authentication patterns, rate limit management, and WebSocket vs REST trade-offs in more detail.

What to do with the data once you have it

Beyond backtesting hit-rates, historical options flow data supports several other research workflows:

Pre-earnings pattern analysis

Filter historical prints to the 5 trading days before each company's earnings report and aggregate by score tier and direction. The question: do EXTREME call sweeps in the week before earnings show a higher directional hit-rate than the session average? If yes, earnings-window flow deserves higher conviction weight in your live workflow.
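A minimal sketch of the window filter, assuming a separate earnings calendar is available as a table of `ticker` and `earnings_date` (both names, and all rows below, are illustrative). `numpy.busday_count` counts weekdays only, so exchange holidays are ignored unless passed in through its `holidays` argument.

```python
import numpy as np
import pandas as pd

prints = pd.DataFrame({
    "ticker": ["AAA", "AAA", "BBB"],
    "date": pd.to_datetime(["2026-06-08", "2026-05-20", "2026-06-10"]),
})

# Hypothetical earnings calendar; one row per ticker per report date.
earnings = pd.DataFrame({
    "ticker": ["AAA", "BBB"],
    "earnings_date": pd.to_datetime(["2026-06-11", "2026-06-10"]),
})

merged = prints.merge(earnings, on="ticker", how="inner")

# Weekdays from the print date (inclusive) up to the earnings date (exclusive).
merged["bdays_to_earnings"] = np.busday_count(
    merged["date"].values.astype("datetime64[D]"),
    merged["earnings_date"].values.astype("datetime64[D]"),
)

# Keep prints 1 to 5 trading days ahead of the report; same-day prints are excluded.
pre_earnings = merged[merged["bdays_to_earnings"].between(1, 5)]
print(pre_earnings)
```

The resulting subset can be fed through the same hit-rate aggregation as the full sample, which keeps the comparison against the session average like-for-like.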

Congress × flow confluence scoring

Tag each historical print against the congressional disclosure data for the same ticker (available from RadarPulse's Congress tracker and the STOCK Act disclosure database). Compute hit-rates for prints where Congress was also active in the same name vs. prints without congressional overlap. This cross-domain validation is one of the most differentiated research questions available from public data.

Sector rotation timing

Aggregate historical EXTREME flow by sector per week. Build a time series of sector-level unusual activity premium. Identify weeks where a sector saw concentrated unusual flow, then measure the sector ETF's forward return at 5 and 10 trading days. This builds a signal for sector rotation timing, based not on price action (which is after-the-fact) but on real-money options positioning.
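The weekly sector series can be built with a grouped resample. This sketch assumes the export columns used earlier (`timestamp`, `sector`, `flag`, `premium`); the rows are toy data, and weeks are defined as ending on Friday, which is one reasonable convention among several.

```python
import pandas as pd

flow = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2026-06-01 10:00", "2026-06-03 11:00", "2026-06-09 15:30", "2026-06-10 09:45"]
    ),
    "sector": ["Energy", "Energy", "Energy", "Technology"],
    "flag": ["EXTREME", "EXTREME", "ELEVATED", "EXTREME"],
    "premium": [500_000, 750_000, 300_000, 2_450_000],
})

extreme = flow[flow["flag"] == "EXTREME"]

# Total EXTREME premium per sector per week (weeks ending Friday), one column per sector.
weekly = (
    extreme.groupby([pd.Grouper(key="timestamp", freq="W-FRI"), "sector"])["premium"]
    .sum()
    .unstack("sector", fill_value=0)
)
print(weekly)
```

From here, each week's row can be joined to the matching sector ETF's forward return at 5 and 10 trading days, using the same no-look-ahead join discipline as the signal-level backtest.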

Score calibration

Run the backtest across multiple score thresholds (65, 70, 75, 80, 85, 90) and plot hit-rate vs. threshold. The inflection point, where hit-rates start improving meaningfully, is the empirically supported threshold for your specific universe and time period. This calibration is more reliable than using a threshold that was set arbitrarily at product launch.
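The threshold sweep is a short loop over the per-signal results. The sketch below uses a toy results table with `score` and `hit` columns; the sample is far too small to be meaningful and only demonstrates the mechanics.

```python
import pandas as pd

# Toy per-signal results; in practice this comes from the backtest output.
results = pd.DataFrame({
    "score": [66, 72, 74, 81, 86, 88, 91, 93],
    "hit": [False, True, False, False, True, True, False, True],
})

thresholds = [65, 70, 75, 80, 85, 90]
rows = []
for t in thresholds:
    subset = results[results["score"] >= t]  # cumulative: every signal at or above t
    rows.append({
        "threshold": t,
        "count": len(subset),
        "hit_rate": subset["hit"].mean(),
    })

calibration = pd.DataFrame(rows).set_index("threshold")
print(calibration)
```

Keeping the `count` column next to `hit_rate` matters here: the highest thresholds usually have the fewest signals, so an apparent jump or drop at the top of the curve may just be sample noise. Plotting `calibration["hit_rate"]` against the threshold gives the curve described above.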

Frequently asked questions

Is historical options flow data the same as options chain history?

No. Options chain history (historical OHLCV per contract, open interest per strike per day) is widely available from CBOE, Nasdaq, and data vendors. Options flow historical data is a different cut of the market: it captures only the unusual prints (sweeps, large blocks, high Vol/OI trades), along with the execution context (aggressor side, sweep vs block) that options chain history doesn't include. You cannot derive flow data from options chain history because the chain history shows end-of-day snapshots, not intraday execution details.

Can I access CBOE options flow data directly?

CBOE distributes options market data through its DataShop product. The raw OPRA tape (all options prints) is available for institutional subscribers, typically via FTP or SFTP in large daily flat files. It includes every trade but without the scoring, filtering, or execution-side tagging that flow tools provide. Building a flow tool from raw OPRA tape requires significant engineering: parsing 1–5GB daily files, computing Vol/OI ratios at the time of each trade (not end-of-day), identifying sweeps across multiple exchanges, and computing unusualness scores.

How many samples do I need for a meaningful backtest?

Per cell (each tier × type × sector combination), 30 samples is the minimum before a hit-rate is worth reporting. For a standard backtest with EXTREME / ELEVATED tiers, CALL / PUT types, and 10 sectors, that is 2 × 2 × 10 = 40 cells, so the most granular cut needs roughly 1,200 samples even if signals were spread evenly across cells (600 if calls and puts are pooled). That volume is achievable with 6–12 months of data from a tool with a reasonable premium floor filter, though sparse sectors may take longer to fill.

What's the difference between backtesting and forward testing?

A backtest runs on historical data: it tells you how a strategy would have performed if you'd followed it in the past. Forward testing (also called paper trading or out-of-sample testing) applies the same strategy to live data and records the actual outcomes as they happen. Forward tests are more credible because they can't be unconsciously biased by the analyst seeing the outcomes before building the rules. RadarPulse's Scorecard is a forward test: signals are locked in at execution time and outcomes are measured prospectively.

Do options flow signals work differently in bear markets?

Put flow signals show stronger hit-rates in bear markets than call flow signals, for the intuitive reason that the underlying trend reinforces bearish directional bets. But aggregate put flow hit-rates in bear markets can be misleadingly high because any put signal benefits from the downtrend regardless of whether it was informational. The signal quality metric that survives regime changes better is not raw hit-rate but the excess hit-rate above the baseline for put flow in the prevailing regime: measuring the signal against a regime-appropriate null, not a fixed 50%.
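The regime-adjusted comparison can be sketched as a difference between two rates: the hit-rate of the signals, and the rate at which any entry in the same regime would have counted as a hit. The function name and the return lists are illustrative; in practice the baseline would be built from all forward returns of the same universe over the same period.

```python
def excess_hit_rate(signal_returns, baseline_returns, option_type="PUT"):
    """Return (signal rate, baseline rate, excess) for one regime and option type."""
    def is_hit(ret):
        return ret < 0 if option_type == "PUT" else ret > 0

    signal_rate = sum(is_hit(r) for r in signal_returns) / len(signal_returns)
    baseline_rate = sum(is_hit(r) for r in baseline_returns) / len(baseline_returns)
    return signal_rate, baseline_rate, signal_rate - baseline_rate


# Toy 3-day forward returns (%) in a falling market.
put_signal_returns = [-2.0, -1.0, 0.5, -0.3]        # returns following put signals
all_entry_returns = [-1.0, 1.0, -1.0, 1.0, -1.0]    # returns following every possible entry

signal_rate, baseline_rate, excess = excess_hit_rate(put_signal_returns, all_entry_returns)
print(f"signal {signal_rate:.0%}, baseline {baseline_rate:.0%}, excess {excess:+.0%}")
```

A put signal set with a 75% raw hit-rate looks far less impressive once the baseline for the regime is already 60%; the excess figure is the part that can be attributed to the signal.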

RadarPulse Scorecard: the only transparent, prospective track record for unusual options flow. Every EXTREME and ELEVATED signal scored from a live session is recorded and measured forward. No cherry-picking, no retroactive selection.

RadarPulse is currently in its pre-launch phase. Historical data, API access, and the live Scorecard are building with every session.
