Does unusual options flow actually work?

Every flow tool faces the same question: does it actually predict anything, or is it just noise with good branding? The RadarPulse Smart-Money Scorecard answers it with data. Every signal we score at NOTABLE or above (NOTABLE, ELEVATED or EXTREME) is automatically logged, and the underlying stock's forward price move is measured at 1, 3 and 5 days. No cherry-picking. No survivorship bias. A real, public, disclaimed track record.

The Scorecard is live in the app: see signal counts, the methodology and results as they accrue in real time. It joins the live flow view once you sign in.

Open RadarPulse free →

Why we built this: the credibility problem

Unusual options flow tools are loud about their signals and quiet about their accuracy. Most market the wins and forget the misses. When a tool claims "smart money is positioning in XYZ", the natural follow-up is: how often does that call actually pan out? Almost no competitor has a public answer. That silence is itself information.

RadarPulse's position is simple: if our 0–100 scoring model is genuinely surfacing informed or positioned flow, the data should show it. If it's noise, we want to know that too, so we can fix it. The Scorecard is the mechanism that forces honesty. It's the antidote to the "trust us, we're right" style of marketing that defines most flow tools.

How the Scorecard works

Every time a real, live options print is scored at NOTABLE or above (score ≥ 55), RadarPulse automatically records three things:

  1. The signal itself: ticker, type (CALL or PUT), score, score flag (EXTREME / ELEVATED / NOTABLE), sector, premium, strike and DTE.
  2. The underlying stock's price at the moment the signal was scored.
  3. A timestamp used to look up the price 1, 3 and 5 calendar days later.

When each horizon arrives, the engine reads the underlying's new price and records whether the stock moved in the signal's implied direction (up for bullish call flow, down for bearish put flow) by any amount. That binary yes/no is the directional hit. Results are then aggregated by flag, type and sector to show where the scoring model has signal, where it's flat, and where it misses.
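The logging and hit logic described above can be sketched in a few lines of Python. This is an illustration only, not RadarPulse's actual implementation: the Signal class, its field names and the helper functions are hypothetical, and treating an unchanged price as a miss is an assumption the text does not confirm.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta

HORIZON_DAYS = (1, 3, 5)  # calendar-day horizons named in the text


@dataclass(frozen=True)
class Signal:  # hypothetical record; field names are illustrative
    ticker: str
    option_type: str    # "CALL" or "PUT"
    score: int          # 0-100
    flag: str           # "EXTREME", "ELEVATED" or "NOTABLE"
    sector: str
    entry_price: float  # underlying price when the signal was scored
    scored_at: datetime


def horizon_times(signal):
    """Timestamps at which the forward price should be looked up."""
    return {d: signal.scored_at + timedelta(days=d) for d in HORIZON_DAYS}


def directional_hit(signal, later_price):
    """True if the underlying moved in the implied direction by any amount.

    Assumption: an unchanged price counts as a miss.
    """
    if signal.option_type == "CALL":
        return later_price > signal.entry_price
    return later_price < signal.entry_price


def hit_rate_by(outcomes, key):
    """Aggregate (signal, hit) pairs into a hit rate per group."""
    tally = defaultdict(lambda: [0, 0])  # group -> [hits, total]
    for signal, hit in outcomes:
        group = key(signal)
        tally[group][0] += int(hit)
        tally[group][1] += 1
    return {g: hits / total for g, (hits, total) in tally.items()}


# Made-up example data
call = Signal("XYZ", "CALL", 88, "EXTREME", "Technology", 100.0,
              datetime(2024, 1, 2, 15, 30))
put = Signal("ABC", "PUT", 72, "ELEVATED", "Energy", 50.0,
             datetime(2024, 1, 2, 15, 45))
outcomes = [(call, directional_hit(call, 101.2)),  # stock rose: hit
            (put, directional_hit(put, 50.4))]     # stock rose: miss
rates_by_flag = hit_rate_by(outcomes, key=lambda s: s.flag)
```

Passing a different key function (for example, one returning the sector or the option type) gives the other aggregations the text mentions.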

What the hit-rate measures (and what it doesn't)

The hit-rate is a directional underlying-move proxy. It answers: "Did the stock move the way the signal implied?" It does not measure options P&L, which depends on factors the flow signal alone can't predict: implied volatility at entry and exit, time decay (theta), bid-ask spread, and how the position was structured. A directional hit on the stock doesn't automatically produce a winning options trade, and the Scorecard is upfront about this.

We chose this metric because it's the most honest proxy available from public data: the underlying price is objective, observable and can't be gamed. Options P&L would require tracking specific contracts across their full lives, which introduces slippage and liquidity assumptions that would make the numbers look either better or worse depending on the choices made. The underlying move is a clean, verifiable baseline.

When results publish

Results appear once at least 30 outcomes have been measured. This minimum-sample gate exists to prevent one good week from looking like a permanent edge. Until then, the Scorecard shows a live tally of how many signals have been logged and how many have reached their measurement horizons: transparency about the build, not false precision.
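A minimal sketch of that publication gate, assuming outcomes arrive as a simple list of hit/miss booleans. The function name and the returned dictionary shape are hypothetical; only the threshold of 30 comes from the text.

```python
MIN_OUTCOMES = 30  # publication threshold stated in the text


def scorecard_view(signals_logged, measured_hits):
    """Return a live tally until enough outcomes exist, then a hit rate.

    signals_logged: count of signals recorded so far
    measured_hits:  list of booleans, one per outcome whose horizon resolved
    """
    measured = len(measured_hits)
    view = {"logged": signals_logged, "measured": measured}
    if measured < MIN_OUTCOMES:
        view["status"] = "accruing"   # show counts only, no hit rate yet
    else:
        view["status"] = "published"
        view["hit_rate"] = sum(measured_hits) / measured
    return view


early = scorecard_view(40, [True] * 10 + [False] * 5)
later = scorecard_view(90, [True] * 18 + [False] * 12)
```

Withholding the hit rate entirely below the threshold, rather than showing it with a warning, is the stricter of the two possible designs and matches the "live tally" behaviour described above.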

Once results publish, they continue to update daily as new horizons resolve. The track record gets stronger, more granular and more meaningful every session the product runs. That compounding is why this is infrastructure, not a feature: the longer RadarPulse runs, the harder this moat becomes to copy. A competitor who wants to show the same thing has to start tracking today and wait months.

What keeps the Scorecard honest

How RadarPulse scores flow in the first place

The Scorecard is only as interesting as the scoring model it tests. RadarPulse's 0–100 unusualness score is built from four components:

Prints scoring 85+ receive the EXTREME flag; 70–84 receive ELEVATED; 55–69 receive NOTABLE. The Scorecard will eventually show which tier historically has the strongest forward directional signal, and whether that ordering holds up or inverts.
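The tier boundaries above map directly onto a small lookup. The function name is hypothetical; the thresholds are the ones stated in the text.

```python
def score_flag(score):
    """Map a 0-100 score to its flag, or None below the NOTABLE threshold."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 85:
        return "EXTREME"
    if score >= 70:
        return "ELEVATED"
    if score >= 55:
        return "NOTABLE"
    return None  # below 55: not flagged, so not logged by the Scorecard


flags = [score_flag(s) for s in (92, 85, 84, 70, 69, 55, 54)]
```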

The cross-domain angle: flow + Congress

One of the more compelling research questions the Scorecard will eventually answer: does flow aligned with congressional stock trades in the same ticker show a higher hit-rate than raw flow alone? When the options tape lights up in a stock that Congress is also buying or selling, that confluence of signals from two distinct information channels is harder to dismiss as noise. The Scorecard's outcome data, combined with the cross-domain tagging already built into RadarPulse, is the mechanism for answering this question with real numbers rather than anecdotes.

How this fits the bigger picture

The Scorecard is part of RadarPulse's broader commitment to making flow not just observable but actionable and proven. The full platform combines:

Each layer reinforces the others. The Scorecard is the proof that the flow scoring model has signal. The AI context explains why. The cross-domain confluence raises the confidence bar. Paper trading lets you test whether your read translates into a simulated P&L before you commit real capital. Together they form a loop from signal to research to evidence to practice.

Score tier performance expectations: EXTREME vs. ELEVATED vs. NOTABLE

The three score flags are not arbitrary marketing labels; they represent meaningfully different levels of multi-factor alignment. Understanding what each tier implies about expected signal quality is important context for reading the Scorecard results as they accumulate.

EXTREME (score 85–100) represents the convergence of multiple high-signal factors simultaneously: a Vol/OI ratio typically 10× or higher, large absolute premium ($250,000+), an aggressive sweep execution (not a negotiated block), and a directional aggressor (buying on the ask). An EXTREME print is the options tape equivalent of a flashing warning light, something with unusual urgency and size. These are the prints the Scorecard prioritizes because they represent the clearest expression of institutional intent available from public data. False positives still exist (algorithmic hedges, structured products, and index rebalancing can all produce individually high scores), but the convergence requirement meaningfully reduces them.

ELEVATED (score 70–84) represents strong but not maximal conviction: perhaps a very high Vol/OI ratio without the size, or large premium without the sweep execution. These are meaningful signals worth tracking, but they are expected to show more variance in directional accuracy, because a single elevated factor without the corroborating ones is more ambiguous.

NOTABLE (score 55–69) is the broadest tier, capturing activity that is above-average but not clearly institutional-conviction-grade. These prints are useful for building context (a stock accumulating NOTABLE signals across multiple sessions is behaving differently than a quiet name) but carry more noise individually. The Scorecard's sector and type aggregations will show whether NOTABLE signals carry useful predictive content as a group even when individual prints are ambiguous.

One of the most interesting questions the Scorecard will answer is whether the tier ordering holds up empirically: does EXTREME outperform ELEVATED which outperforms NOTABLE, or do the relationships invert or flatten in certain sectors? The null hypothesis, that the scoring model captures no real signal, predicts all three tiers should show ~50% directional hit rates. A confirmed ordering with EXTREME at the top would be strong evidence that the scoring model is doing something useful.
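One way to test a tier against that null hypothesis is an exact one-sided binomial tail: the probability of seeing at least the observed number of hits if every signal were a coin flip. The sketch below uses only the standard library. The function name and the sample counts are made up for illustration, and a real analysis would also have to account for market drift (calls hit more often in a rising market) and for correlation between signals.

```python
from math import comb


def p_value_vs_coin_flip(hits, n):
    """P(X >= hits) for X ~ Binomial(n, 0.5): exact one-sided tail."""
    if not 0 <= hits <= n:
        raise ValueError("hits must be between 0 and n")
    return sum(comb(n, k) for k in range(hits, n + 1)) / 2 ** n


# Hypothetical tallies: 24 of 30 hits versus 16 of 30 hits
strong = p_value_vs_coin_flip(24, 30)  # very unlikely under a coin flip
weak = p_value_vs_coin_flip(16, 30)    # entirely consistent with a coin flip
```

A small tail probability for EXTREME alongside a large one for NOTABLE would be the kind of pattern that supports the tier ordering; similar tails across all three tiers would not.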

Sector-level signal patterns: where flow tends to be strongest

Unusual options flow is not uniform across sectors. The structural reasons for this are worth understanding, both because they affect how to interpret the Scorecard's sector breakdowns and because they help calibrate expectations for individual prints.

Technology and semiconductors generate the largest absolute flow volumes because the underlying names are large, liquid, and heavily traded by institutions. A large-cap tech stock may see hundreds of millions of dollars in daily options premium across all strikes. This means the Vol/OI bar is higher: a ratio of 5:1 on NVDA means something different than 5:1 on a mid-cap industrial, because NVDA always has elevated baseline activity. Paradoxically, EXTREME prints on mega-cap tech often reflect sharper conviction than the score alone suggests, because the noise floor is already high.

Biotech and pharma produce some of the most information-dense flow signals. Binary catalyst events (FDA decisions, Phase 3 readouts, advisory committee votes) create windows where informed positioning is both high-stakes and time-sensitive. EXTREME call or put sweeps in the week before a binary catalyst are among the most watched signals in the options market. The Scorecard's sector breakdown will reveal whether biotech/pharma EXTREME signals show higher or lower hit rates than the broad average, a finding that would have direct implications for how to weight these prints.

Energy flow is often driven by geopolitical and macro factors rather than company-specific catalysts, making directional signals more dependent on getting macro calls right. Options flow in energy names that aligns with congressional activity (particularly members of energy-focused committees) has historically been among the more compelling cross-domain confluence signals.

Financials are rate-sensitive, making options flow in banks and insurance companies a proxy for institutional rate expectations. Large LEAPS call accumulation in financial names during rate-hiking cycles may reflect both directional equity bets and positive rho carry; the Scorecard's DTE breakdown will help separate these effects.

Consumer discretionary and industrials tend to show more variable signal quality: the sectors are broad, catalysts are diverse, and institutionally informed flow is harder to distinguish from hedging and sector rotation. These sectors may show the clearest split between EXTREME (more reliably catalyzed) and NOTABLE (more noise) in the Scorecard's tier analysis.

The Vol/OI ratio: the primary signal driver in depth

Volume divided by open interest (Vol/OI) carries 40% of the RadarPulse conviction score, more than any other single factor. Understanding why this metric earns that weight explains much of what the Scorecard is testing.

Open interest represents the total number of options contracts currently outstanding on a specific strike and expiry: the existing "book" of positions. Volume is the number of contracts traded on a specific day. When today's volume is large relative to existing open interest, it means the market is doing something new and unusual at that strike, not simply continuing an existing position.

A Vol/OI ratio of 1:1 means the day's trading is equal to all existing open contracts, which is unusual but possible for a widely traded strike. A ratio of 5:1 means five times the existing open interest traded in a single day, strongly suggestive of fresh positioning or position closure at scale. A ratio of 20:1 or higher on a relatively quiet strike often signals that someone opened a large new position that dwarfs the existing book; this is the pattern most associated with pre-event positioning.
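The ratio itself is simple arithmetic. The sketch below computes it and attaches the reading given above for the 1:1, 5:1 and 20:1 levels. The function names, the handling of zero open interest and the "routine" label for ratios below 1:1 are assumptions for illustration, not RadarPulse's scoring logic.

```python
def vol_oi_ratio(volume, open_interest):
    """Day's contract volume divided by existing open interest."""
    if volume < 0 or open_interest < 0:
        raise ValueError("volume and open interest must be non-negative")
    if open_interest == 0:
        # Assumption: any volume on an empty strike is treated as unbounded.
        return float("inf") if volume > 0 else 0.0
    return volume / open_interest


def describe_ratio(ratio):
    """Rough reading of a ratio, using the levels discussed in the text."""
    if ratio >= 20:
        return "new position dwarfs the existing book"
    if ratio >= 5:
        return "fresh positioning or closure at scale"
    if ratio >= 1:
        return "unusual but possible on a widely traded strike"
    return "routine"  # assumed label; the text does not name this band


# Made-up prints: (volume, open interest)
readings = [describe_ratio(vol_oi_ratio(v, oi))
            for v, oi in [(400, 1_000), (5_000, 1_000), (12_000, 500)]]
```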

The key insight: high Vol/OI is harder to explain away as routine hedging than high absolute volume. A fund managing a $500M equity portfolio might routinely roll large options positions; this produces high volume but typically not high Vol/OI, because the strike already has significant open interest from prior rolls. By contrast, a fresh directional bet on a strike with thin open interest produces a striking Vol/OI ratio even if the absolute dollar size is modest.

The Scorecard's outcome data will eventually reveal how the Vol/OI weight relates to predictive accuracy: do prints with extremely high Vol/OI ratios (20:1+) show higher hit rates than prints with moderate ratios (5:1) at the same overall score level? This factor-level decomposition is the kind of analysis that competitor tools, which don't track outcomes at all, are structurally incapable of running.

Why public track records matter: the accountability gap in flow tools

Options flow tools have a marketing problem that most don't acknowledge: it's easy to cherry-pick signal examples from the past and claim they predicted subsequent moves. The internet is full of screenshots of flow prints taken days or weeks before a big move, posted with caption variations of "smart money knew." What these posts never include is the dozens of comparable prints that preceded no move at all, or the prints that preceded moves in the opposite direction.

This is survivorship bias in its clearest form. If you score thousands of unusual options prints per month and only post the ones that aged well, your content looks prescient even if the underlying signals are random. The viewer has no way to evaluate the base rate: what fraction of "EXTREME" signals preceded meaningful moves in the implied direction? Without that denominator, the numerator (the spectacular wins) is meaningless as evidence.

No major competitor in the flow intelligence space publishes a prospective, systematic track record. Some publish occasional "win" examples. Some show current flow with no outcome data at all. The absence of a public track record is not an accident; it protects tools from accountability while allowing them to selectively market successes. It also leaves users with no empirical basis for calibrating how much weight to put on any given signal.

RadarPulse's Scorecard inverts this dynamic deliberately. By committing to prospective, automatic, no-cherry-picking outcome tracking from day one, it creates a verifiable baseline that grows more credible with every session. A track record built over 12 months of live trading, with thousands of measured outcomes and a locked methodology, is a fundamentally different evidence standard than curated examples. It also creates a genuine moat: a competitor cannot retroactively claim the same track record. They would have to start tracking today and wait.

Using the Scorecard in your trading workflow

Once the Scorecard has accumulated sufficient outcomes to publish results, the data becomes a calibration tool: a way to apply historical hit-rates to current signals rather than trading every EXTREME print with the same confidence level.

A few practical applications as results mature:

Tier weighting. If the Scorecard shows that EXTREME signals in technology stocks have a 3-day directional hit rate of 65% while NOTABLE signals in the same sector hit 51%, you have a principled basis for sizing positions differently by tier: not because you know anything more about the individual print, but because the aggregate evidence from similar prints favors the higher tier.

Sector filtering. If biotech EXTREME call sweeps ahead of known catalyst windows show a materially higher hit rate than average, and you see an EXTREME biotech call sweep with an FDA decision upcoming, the Scorecard data gives you a concrete reference point. Conversely, if a sector shows near-random hit rates across all tiers, that's a calibration signal too: this type of flow in this sector may not carry real directional information.

Cross-domain confirmation. The Scorecard will eventually answer whether flow signals that coincide with congressional activity on the same ticker show higher hit rates than standalone flow. If yes, seeing both signals active on the same name is quantitatively more significant than either alone, not just intuitively, but evidentially.

Paper trading integration. The paper trading wallet built into RadarPulse is the practice layer for applying Scorecard findings. When you see an EXTREME signal in a sector the Scorecard shows has historically strong hit rates, simulate the trade in the paper wallet. Track whether your read (entry timing, strike selection, expiry) translates into a simulated profit. The paper wallet's performance log is your personal calibration layer on top of the aggregate Scorecard data.

The loop: Scorecard shows aggregate signal quality → you use it to prioritize which prints to act on → paper wallet shows your personal hit rate on those prints → over time you learn both whether the signal type works and whether your interpretation of it works.

What the Scorecard cannot tell you: honest limitations

Transparency about what the Scorecard measures is as important as the data itself. A few limitations are worth being explicit about.

Directional hit-rate is not options P&L. The Scorecard measures whether the underlying stock moved in the signal's implied direction within 1, 3 or 5 days. A call signal that "hits" by this measure (the stock moved up) does not mean a call buyer profited. Options P&L depends on the specific contract purchased (strike, expiry), the implied volatility at entry and exit, theta decay, bid-ask spread, and timing of exit. A stock that rises 1% over 3 days counts as a directional hit either way, but it is likely a gain for a 30-day ATM call buyer who paid low IV and potentially a loss for a 5-day OTM call buyer who paid elevated IV into an earnings event. The Scorecard measures the signal, not any specific way of trading on it.
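The gap between a directional hit and a profitable option can be illustrated with a textbook Black-Scholes call price (European exercise, zero interest rate, no dividends). Every number below is a made-up assumption chosen to mirror the two buyers described above: the spot prices, strikes, IV levels and the post-earnings IV drop are illustrative, and bid-ask spread is ignored.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call(spot, strike, days, iv, rate=0.0):
    """Black-Scholes price of a European call with `days` to expiry."""
    t = days / 365.0
    d1 = (log(spot / strike) + (rate + 0.5 * iv * iv) * t) / (iv * sqrt(t))
    d2 = d1 - iv * sqrt(t)
    return spot * norm_cdf(d1) - strike * exp(-rate * t) * norm_cdf(d2)


# Both buyers see the same move: stock goes from 100 to 101 over 3 days.

# Buyer A: 30-day at-the-money call, IV steady at 20%.
pnl_atm = bs_call(101, 100, 27, 0.20) - bs_call(100, 100, 30, 0.20)

# Buyer B: 5-day call struck 5% out of the money, bought at 80% IV
# before earnings; IV falls to 40% afterwards (assumed).
pnl_otm = bs_call(101, 105, 2, 0.40) - bs_call(100, 105, 5, 0.80)

print(f"ATM 30-day call P&L per share: {pnl_atm:+.2f}")
print(f"OTM 5-day call P&L per share:  {pnl_otm:+.2f}")
```

Under these assumptions the first position gains and the second loses, even though the Scorecard would record one and the same directional hit for both.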

The 1/3/5-day window may not match the holding period of the flow's originator. An institutional buyer of 12-month LEAPS calls may intend to hold the position through multiple earnings cycles. A 5-day directional check on that position is not the right measurement window for their thesis, even if it's a useful short-term proxy. The Scorecard uses 5 days as the outer horizon for practical reasons (price data is reliable, sample accumulates fast), but acknowledges this may understate the predictive value of longer-dated signals.

Aggregate hit-rates don't tell you which individual print to act on. Even if the Scorecard shows that EXTREME technology calls historically hit the 3-day direction at 65%, that does not mean any specific EXTREME technology call print is a 65% probability win. The aggregate reflects the distribution of past prints across many names, market conditions, and entry timings. Individual prints carry additional idiosyncratic risk that the aggregate cannot capture. The Scorecard is a calibration tool, not a guarantee.

Market regime matters. Hit-rates measured during a trending bull market may not hold in a sideways or bear market. The Scorecard will eventually show results across multiple regimes, which will either confirm that the signal is regime-robust or reveal that it works better in specific conditions. Until sufficient regime diversity is captured, extrapolating strongly from early results is premature.

Frequently asked questions

Does the Scorecard track every single options print RadarPulse scores?
No. Only prints from verified live-data sessions are tracked, never simulator or sample-session prints. Within live sessions, the highest-scored print per ticker per session is recorded rather than every print, to prevent one busy session on a noisy name from dominating the aggregate. This means the Scorecard is measuring the strongest available signal per ticker per day, not every signal across the full tape.
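As a rough sketch of that selection rule, assuming each print is a dictionary with hypothetical keys "ticker", "session", "score" and "live" (the real data model is not documented here):

```python
def strongest_per_ticker(prints):
    """Keep only the highest-scored live print per (ticker, session).

    Simulator or sample-session prints (live == False) are dropped.
    On a tied score the earlier print is kept (an assumption).
    """
    best = {}
    for p in prints:
        if not p["live"]:
            continue
        key = (p["ticker"], p["session"])
        if key not in best or p["score"] > best[key]["score"]:
            best[key] = p
    return list(best.values())


# Made-up tape for one session
tape = [
    {"ticker": "XYZ", "session": "2024-01-02", "score": 71, "live": True},
    {"ticker": "XYZ", "session": "2024-01-02", "score": 90, "live": True},
    {"ticker": "XYZ", "session": "2024-01-02", "score": 95, "live": False},
    {"ticker": "ABC", "session": "2024-01-02", "score": 58, "live": True},
]
tracked = strongest_per_ticker(tape)
```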

Why 30 outcomes before results publish?
Thirty is a rough threshold for basic statistical interpretability, enough outcomes to calculate a confidence interval that isn't absurdly wide. A track record based on 5 outcomes could be 80% hits purely by chance; 30 makes a random result much less plausible. As the sample grows into the hundreds and thousands, the confidence intervals narrow and the results become progressively more meaningful.
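To see how the interval narrows with sample size, here is a sketch using the Wilson score interval at roughly 95% confidence. The choice of the Wilson method is ours for illustration; the text does not say which interval the Scorecard uses, and the hit counts are hypothetical.

```python
from math import sqrt


def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 ~ 95%)."""
    if n <= 0 or not 0 <= hits <= n:
        raise ValueError("need n > 0 and 0 <= hits <= n")
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# The same 80% observed hit rate at three sample sizes
low5, high5 = wilson_interval(4, 5)
low30, high30 = wilson_interval(24, 30)
low300, high300 = wilson_interval(240, 300)
```

With 4 hits out of 5 the interval still includes 50%, so it cannot rule out a coin flip; at 24 of 30 the lower bound clears 50%, and at 240 of 300 the interval is much tighter again.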

How do you prevent the scoring model from being retroactively tuned to look good?
The scoring model weights are locked before the Scorecard begins tracking. Changing the model after seeing outcome data would be a form of overfitting, optimizing to past results in a way that would inflate apparent accuracy without improving forward predictions. The methodology page documents the weights and any changes are disclosed, so there's a public record of what was tuned when.

Will the Scorecard eventually track options P&L instead of just underlying direction?
Tracking true options P&L requires assumptions about which specific contract was purchased, at which IV, at which bid-ask midpoint, and how it was exited; each assumption adds uncertainty. The underlying directional move is the cleanest verifiable baseline available. If demand for P&L tracking is high enough and a fair methodology can be defined, it's a future possibility, but any such metric would require very careful disclaiming to avoid misleading interpretation.

Can I see the individual signals that are being tracked?
Yes. In the app, the Scorecard view shows the live feed of tracked signals with their recorded details: ticker, score, type, sector, premium, and the timestamp used for forward price measurement. The outcome column fills in as each horizon resolves. The full signal history is visible to signed-in users, so you can examine individual cases, not just aggregate statistics.

The compound data advantage: why the track record strengthens over time

The Scorecard is not just a feature; it's a data asset that grows more valuable with every session that passes. This compounding dynamic is worth understanding because it explains why starting the track record as early as possible matters strategically.

In the first month, with perhaps 200–400 measured outcomes, the Scorecard can show aggregate hit rates, but sector and tier breakdowns may not have statistical significance. In three months, with 1,000+ outcomes, sector-level patterns become visible. In six months, cross-domain confluence effects (flow + Congress) have enough cases to show whether the combined signal is materially stronger. In twelve months, with multiple market regimes captured (high-volatility, low-volatility, trending, choppy), the track record becomes a genuine research asset that no new entrant can replicate without waiting just as long.

This is the deepest moat in the RadarPulse product strategy. The live flow feed, the scoring algorithm, the congressional tracker, the AI research layer: all of these could theoretically be replicated by a well-funded competitor in months. The outcome-tracking database cannot. It requires the actual passage of time and real market sessions. Every day the Scorecard runs, the data advantage grows, and the credibility gap between RadarPulse and tools that don't track outcomes widens.

For users, this also means that the value of the Scorecard improves the longer you use RadarPulse. An early subscriber watches a track record build from day one and has historical hit-rate context for every future signal they see. A user who joins a year later will find a more mature, granular, credible dataset. Either way, the Scorecard makes the flow signals more useful because it answers the core question that raw flow data alone never could: not just "is something unusual happening?" but "how often does this type of unusual thing actually mean something in the market?"

Reading Scorecard data across different market regimes

Signal quality in options flow is not constant; it fluctuates with market regime. Understanding how to contextualise Scorecard hit rates relative to the prevailing environment makes you a sharper consumer of the data.

High-volatility regimes (VIX above 25): Directional accuracy often drops because price moves are larger, faster, and more prone to sudden reversal. An EXTREME call signal of the kind that works directionally in a low-vol environment may see the stock move the right way and then reverse hard within the DTE window. Hit rates in high-vol periods tend to be noisier and less reliable as standalone inputs. When scanning the Scorecard in these environments, focus on shorter-DTE signals (0–7 days), where the directional window is small enough that noise cannot accumulate as much across the measurement period.

Low-volatility, trending regimes (VIX below 15, SPX in a defined uptrend): EXTREME call sweeps and large-premium blocks tend to show the strongest hit rates historically. The market's directional bias reinforces smart-money directional bets. This is where you expect to see the Scorecard's aggregate call-bias numbers look most impressive, and where it's most important to remember that correlation with the trend is partly explaining those numbers.

Choppy, sideways regimes: Confluence signals, where flow, congressional activity, and score all align on the same ticker and direction, tend to have the best signal-to-noise ratio. Pure flow signals in choppy markets are inherently harder to act on with confidence. The Scorecard reflects this: you'll see wider variance in hit rates and smaller differentiation between EXTREME and ELEVATED tiers. Using the Sector and Tier filters together helps isolate the sub-segments with the most consistent pattern even when the broad market is directionless.

Earnings season: Signals with short DTE (0–3 days) and high premium in the week before a company's earnings report are a distinct and often misread category. These are often informed by volatility expectations, not directional views; the options buyer may be right about a big move but wrong about direction. The Scorecard tags pre-earnings signals separately so you can study this sub-population independently. Historically, pre-earnings sweep accuracy on direction is noisier than inter-earnings signals, which is why the Scorecard's data segmentation here is valuable: it prevents the noisier pre-earnings category from polluting the signal statistics for the rest of the dataset.

The practical takeaway: use the Scorecard as a relative tool, not an absolute one. When aggregate hit rates look strong, that is useful context, but ask what the market regime was during that period. When they look weaker, ask whether a regime shift explains it before drawing conclusions. The Scorecard's time-series view exists precisely to let you make these comparisons across rolling windows of market history. The goal is always to separate genuine signal from lucky correlation, and a public, regime-tagged track record is the most honest tool for doing that rigorously.
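The VIX thresholds given above for the high- and low-volatility regimes can be turned into a simple tag for splitting hit rates by environment. This is a hypothetical sketch: the function names, the boolean uptrend input and the catch-all "other" bucket (which lumps choppy and in-between conditions together) are assumptions, not a description of how RadarPulse tags regimes.

```python
from collections import defaultdict


def regime_tag(vix, spx_uptrend):
    """Label a session using the VIX levels quoted in the text."""
    if vix > 25:
        return "high_vol"
    if vix < 15 and spx_uptrend:
        return "low_vol_trending"
    return "other"  # assumed catch-all for everything in between


def hit_rate_by_regime(outcomes):
    """outcomes: iterable of (vix, spx_uptrend, hit) tuples."""
    tally = defaultdict(lambda: [0, 0])  # regime -> [hits, total]
    for vix, spx_uptrend, hit in outcomes:
        regime = regime_tag(vix, spx_uptrend)
        tally[regime][0] += int(hit)
        tally[regime][1] += 1
    return {r: hits / total for r, (hits, total) in tally.items()}


# Made-up outcomes
sample = [(30, False, True), (28, False, False),
          (12, True, True), (18, False, False)]
regime_rates = hit_rate_by_regime(sample)
```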

Disclaimer

The Smart-Money Scorecard measures the underlying stock's forward price movement as a proxy for directional signal accuracy. It does not measure options P&L. Hit-rates are historical patterns only; they do not predict future results. Nothing on RadarPulse is financial advice. Options trading involves substantial risk of loss. The Scorecard is for research and educational purposes only.

See the Scorecard live in the app

The track record accrues automatically with every scored session. Sign in free to watch it build, and to view the flow signals behind the numbers.