Statistical rigor built in

How We Test Strategy Credibility

Most backtesting tools let you overfit without knowing it. Our 10-layer validation pipeline applies institutional-grade statistical rigor to every optimization.

1. 3-Way Data Split
2. Walk-Forward Validation
3. Deflated Sharpe Ratio
4. Monte Carlo Significance
5. Realistic Commission Models
6. Parameter Robustness
7. Market Regime Detection
8. Regime-Conditioned Strategy Selection
9. Credibility-First Champion Selection
10. Sign-Correct Short Accounting
1

3-Way Data Split

The foundation of honest backtesting is data isolation. We split every ticker's historical data into three non-overlapping segments:

[Figure: Historical Price Data, split left to right into Train (50%, optimizer explores here), Validate (25%, best candidate selected), and Holdback (25%, never seen; true out-of-sample).]
  • Train (50%): the optimizer explores parameter combinations here.
  • Validate (25%): used to select the best candidate from the training results.
  • Holdback (25%, 1-year minimum): data the strategy NEVER sees during optimization. Holdback returns are the closest proxy for live performance.
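
As a rough illustration of the split, the sketch below cuts a chronologically sorted date list into three contiguous index ranges. The function name, the 365-calendar-day check, and the choice to raise an error on short histories are assumptions for illustration, not the production logic.

```python
from datetime import date, timedelta

def three_way_split(dates, train_frac=0.50, validate_frac=0.25, min_holdback_days=365):
    """Split sorted dates into train / validate / holdback index ranges.

    The ranges are contiguous and non-overlapping, oldest data first.
    """
    n = len(dates)
    train_end = int(n * train_frac)
    validate_end = train_end + int(n * validate_frac)
    holdback = dates[validate_end:]
    # Assumption: a holdback spanning less than the minimum is rejected outright.
    if not holdback or (holdback[-1] - holdback[0]).days < min_holdback_days:
        raise ValueError("holdback shorter than the required minimum")
    return range(0, train_end), range(train_end, validate_end), range(validate_end, n)

# Usage with 2,000 synthetic calendar days.
dates = [date(2015, 1, 1) + timedelta(days=i) for i in range(2000)]
train, validate, holdback = three_way_split(dates)
```

Splitting by position rather than by random sampling keeps the holdback strictly later in time than anything the optimizer touched.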

Why this matters: Without a holdback period, every backtest is in-sample. You are evaluating performance on the same data used to choose the strategy. This is the most common source of backtest overfitting, and most retail tools do not guard against it.

2

Walk-Forward Validation

Within the training period, we run a 5-window rolling train/test protocol. Each window trains on a portion of the data and tests on the immediately following out-of-sample segment.

  • Strategies that fail walk-forward are eliminated automatically before reaching the validation phase.
  • This tests whether the strategy adapts to shifting market regimes — trending, mean-reverting, and volatile environments.
  • Only strategies that perform consistently across all five windows advance.
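
A minimal sketch of a five-window rolling protocol follows. The 2:1 train-to-test length ratio, the function names, and the "every window must score above zero" pass rule are assumptions; the source only states that there are five rolling windows and that all must hold up.

```python
def walk_forward_windows(n_bars, n_windows=5, train_to_test=2):
    """Return (train_range, test_range) pairs; each test range directly follows its train range."""
    step = n_bars // (n_windows + train_to_test)
    if step == 0:
        raise ValueError("not enough bars for the requested windows")
    train_len = train_to_test * step
    windows = []
    for i in range(n_windows):
        start = i * step  # each window slides forward by one test length
        train = range(start, start + train_len)
        test = range(start + train_len, start + train_len + step)
        windows.append((train, test))
    return windows

def passes_walk_forward(test_scores, min_score=0.0):
    """Assumed pass rule: every out-of-sample window must beat min_score."""
    return all(score > min_score for score in test_scores)

windows = walk_forward_windows(700)
```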

Why this matters: A strategy that works brilliantly in one time window but fails in others is likely overfit to a specific market regime. Walk-forward validation catches this before you risk capital.

Pardo, R. (2008). "The Evaluation and Optimization of Trading Strategies." Wiley.

3

Deflated Sharpe Ratio

When an optimizer tests hundreds or thousands of parameter combinations, the best result will look impressive by pure chance. The Deflated Sharpe Ratio (DSR) corrects for this multiple-testing bias.

  • Uses the number of independent trials to adjust the significance threshold for the Sharpe Ratio.
  • Accounts for non-normality (skewness, kurtosis) of return distributions.
  • A Sharpe of 2.0 from 1,000 trials is statistically very different from a Sharpe of 2.0 from a single trial. DSR quantifies this difference.
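
The sketch below follows the closed-form expressions in Bailey & López de Prado (2014): an expected maximum Sharpe under the null, then a probabilistic Sharpe test against that benchmark. It assumes a per-period (not annualized) Sharpe, raw kurtosis (3 for a normal distribution), and that the variance of Sharpe estimates across trials is supplied by the caller. Function and argument names are illustrative.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329

def expected_max_sharpe(n_trials, sharpe_variance):
    """Expected best Sharpe among n_trials skill-less trials."""
    if n_trials < 2:
        return 0.0  # a single trial has no selection bias to correct
    z = NormalDist()
    a = z.inv_cdf(1.0 - 1.0 / n_trials)
    b = z.inv_cdf(1.0 - 1.0 / (n_trials * math.e))
    return math.sqrt(sharpe_variance) * ((1.0 - EULER_GAMMA) * a + EULER_GAMMA * b)

def deflated_sharpe_ratio(sharpe, n_obs, n_trials, sharpe_variance, skew=0.0, kurtosis=3.0):
    """Probability that the true Sharpe exceeds the expected maximum under the null."""
    sr0 = expected_max_sharpe(n_trials, sharpe_variance)
    # Non-normality adjustment: skewness and kurtosis widen or narrow the estimate.
    denom = math.sqrt(1.0 - skew * sharpe + (kurtosis - 1.0) / 4.0 * sharpe ** 2)
    return NormalDist().cdf((sharpe - sr0) * math.sqrt(n_obs - 1) / denom)

# Same daily Sharpe (about 2.0 annualized), one trial versus 1,000 trials.
single = deflated_sharpe_ratio(0.126, n_obs=1250, n_trials=1, sharpe_variance=0.0025)
many = deflated_sharpe_ratio(0.126, n_obs=1250, n_trials=1000, sharpe_variance=0.0025)
```

With these illustrative inputs the single-trial score is close to 1, while the 1,000-trial score falls below 0.5, because the benchmark Sharpe rises with the trial count.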

Why this matters: Without this correction, optimizers will always find something that looks good — it is a mathematical certainty when the trial count is large enough. DSR tells you whether your best result is genuine or a statistical artifact.

Bailey, D.H. & López de Prado, M. (2014). "The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting and Non-Normality." Journal of Portfolio Management, 40(5), 94–107.

4

Monte Carlo Significance

We use a permutation test to determine whether the strategy's returns are statistically distinguishable from random chance.

  • Shuffles daily portfolio returns using block bootstrap to destroy the timing signal while preserving serial correlation structure.
  • Runs 1,000+ permutations per test to build a null distribution.
  • The strategy must achieve a p-value below 0.05 — meaning there is less than a 5% probability that random reordering would produce equal or better returns.
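
The source does not spell out exactly what is reshuffled against what, so the sketch below is one possible reading rather than the production test. It assumes the strategy is represented as a per-bar position series, keeps those positions fixed, and permutes contiguous blocks of the asset's returns. That keeps short-range serial correlation inside each block while breaking the alignment between signal and return. All names are hypothetical.

```python
import random

def block_permutation_pvalue(positions, asset_returns, block_size=5,
                             n_permutations=1000, seed=0):
    """One-sided p-value: share of block-permuted histories doing at least as well."""
    if len(positions) != len(asset_returns):
        raise ValueError("positions and asset_returns must have equal length")
    rng = random.Random(seed)
    observed = sum(p * r for p, r in zip(positions, asset_returns))
    # Contiguous blocks preserve within-block ordering of returns.
    blocks = [asset_returns[i:i + block_size]
              for i in range(0, len(asset_returns), block_size)]
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(blocks)
        shuffled = [r for block in blocks for r in block]
        null_total = sum(p * r for p, r in zip(positions, shuffled))
        if null_total >= observed:
            at_least_as_good += 1
    # Add-one correction so the p-value is never exactly zero.
    return (at_least_as_good + 1) / (n_permutations + 1)
```

A strategy whose positions carry no timing information, for example one that is always flat, scores a p-value of 1.0 under this construction.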

Why this matters: Even after DSR correction, a strategy could have a decent Sharpe purely from favorable return clustering. The permutation test asks a direct question: does the specific ordering of trades matter, or would any random ordering do just as well?

White, H. (2000). "A Reality Check for Data Snooping." Econometrica, 68(5), 1097–1126.

5

Realistic Commission Models

A strategy that looks profitable with zero transaction costs often is not. We apply broker-specific commission schedules to every backtest.

  • 12 broker presets: IBKR Pro Tiered, IBKR Pro Fixed, Robinhood, Schwab, Fidelity, E*TRADE, TD Ameritrade, Webull, Firstrade, TradeStation, Tradier, and Alpaca.
  • Per-share rates with minimum and maximum per-order caps.
  • SEC Section 31 fees and FINRA TAF regulatory fees applied to every sell order.
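
To make the fee structure concrete, here is a small sketch of a per-order cost function. Every rate in it is a placeholder chosen for easy arithmetic, not a real broker or regulatory schedule, and the class and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CommissionSchedule:
    per_share: float       # commission per share
    min_per_order: float   # commission floor per order
    max_per_order: float   # commission cap per order
    sec_fee_rate: float    # Section 31 fee per dollar of sale proceeds
    taf_per_share: float   # FINRA TAF per share sold
    taf_max: float         # FINRA TAF cap per order

def order_cost(schedule, shares, price, is_sell):
    """Total cost of one order: capped commission plus sell-side regulatory fees."""
    commission = shares * schedule.per_share
    commission = min(max(commission, schedule.min_per_order), schedule.max_per_order)
    regulatory = 0.0
    if is_sell:  # regulatory fees apply to sell orders only
        regulatory += shares * price * schedule.sec_fee_rate
        regulatory += min(shares * schedule.taf_per_share, schedule.taf_max)
    return commission + regulatory

# Placeholder numbers for illustration only.
demo = CommissionSchedule(0.005, 1.00, 50.00, 0.00002, 0.0002, 10.00)
round_trip = order_cost(demo, 1000, 20.0, False) + order_cost(demo, 1000, 20.0, True)
```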

Why this matters: Frequent-trading strategies are particularly sensitive to commissions. A strategy with 200 round-trips per year can lose 2–5% of its returns to transaction costs alone. If your backtest ignores this, its equity curve is fiction.

6

Parameter Robustness

After optimization selects the best parameters, we perturb each one to see if performance degrades gracefully or collapses.

  • Each optimized parameter is varied ±20% from its selected value.
  • If performance collapses outside a narrow range, the strategy is flagged as curve-fitted.
  • Robust strategies exhibit a performance plateau — they work across a range of nearby parameter values, not just one magic setting.
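
The perturbation check can be sketched as below. It varies one parameter at a time across ±20% and flags any parameter whose worst nearby score falls too far below the optimum. The 30% drop threshold, the grid of nine points, and the function name are assumptions for illustration.

```python
def robustness_report(score_fn, best_params, pct=0.20, steps=4, max_drop=0.30):
    """Return {param: worst_score} for parameters that look curve-fitted."""
    base = score_fn(best_params)
    if base <= 0:
        raise ValueError("baseline score must be positive to measure a relative drop")
    fragile = {}
    for name, value in best_params.items():
        worst = base
        for k in range(-steps, steps + 1):
            trial = dict(best_params)
            # Perturbed values are floats; integer parameters would need rounding.
            trial[name] = value * (1.0 + pct * k / steps)
            worst = min(worst, score_fn(trial))
        if (base - worst) / base > max_drop:
            fragile[name] = worst
    return fragile

# A broad plateau versus a single "magic" setting.
plateau = lambda p: 1.0 - 0.001 * (p["lookback"] - 20) ** 2
spike = lambda p: 1.0 if abs(p["lookback"] - 20) < 0.5 else 0.1
```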

Why this matters: A strategy that only works at exactly RSI(14) with a 2.1-standard-deviation Bollinger Band is almost certainly overfit. Real market edges are broad enough to survive minor parameter variation.

7

Market Regime Detection

Every signal and strategy operates within a market regime — bull, bear, sideways, or crisis. alphactor.ai classifies the current regime using a three-layer ensemble grounded in two of the most cited papers in financial econometrics.

  • Layer 1: Hamilton (1989) Markov-Switching Regression fits a 2-regime model on log returns with switching variance. The Hamilton smoother produces proper Bayesian posterior state probabilities — not ad-hoc threshold rules.
  • Layer 2: Kritzman et al. (2012) volatility overlay detects crisis conditions by ranking current realized volatility against its full historical distribution. Bear markets with volatility above the 80th percentile are escalated to Bear/Crisis.
  • Layer 3: SMA-50 trend confirmation prevents false bull signals during bear-market bounces and vice versa, providing the Sideways classification when the MS model is uncertain.
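
How the three layers might combine into a single label is sketched below. It takes the Markov-switching posterior as an input rather than fitting the model, and the 0.7 confidence threshold, the function names, and the exact tie-breaking order are assumptions. Only the 80th-percentile crisis escalation comes from the description above.

```python
def volatility_percentile(current_vol, vol_history):
    """Rank current realized volatility against its full history (0 to 100)."""
    at_or_below = sum(1 for v in vol_history if v <= current_vol)
    return 100.0 * at_or_below / len(vol_history)

def classify_regime(p_bear, vol_pct, price, sma50, confident=0.7, crisis_pct=80.0):
    """Combine the three layers into Bull, Bear, Bear/Crisis, or Sideways.

    p_bear is the posterior probability of the bear state from the 2-regime model.
    """
    trend_up = price > sma50
    if p_bear >= confident and not trend_up:
        # Layer 2: high volatility escalates a confirmed bear to crisis.
        return "Bear/Crisis" if vol_pct > crisis_pct else "Bear"
    if (1.0 - p_bear) >= confident and trend_up:
        return "Bull"
    # Layer 3: model uncertain, or trend disagrees with the model.
    return "Sideways"

label = classify_regime(p_bear=0.9, vol_pct=95.0, price=90.0, sma50=100.0)
```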

Why this matters: Strategies that backtest well in one regime often fail catastrophically in another. Regime-aware analysis helps you understand whether a signal is robust across environments or merely overfit to the current market phase.

Hamilton, J.D. (1989). Econometrica, 57(2). · Kritzman, M., Page, S., & Turkington, D. (2012). Financial Analysts Journal, 68(3). · Nystrup, P., Lindström, E., Pinson, P., & Madsen, H. (2024). "Learning Hidden Markov Models for Regression with Unaligned Timestamps." arXiv:2402.05272.

8

Regime-Conditioned Strategy Selection

Knowing the current market regime is only half the problem. The other half is selecting which strategy to deploy in each regime. alphactor.ai uses two regime-conditioned selection methods inspired by mixture-of-experts architectures in machine learning.

  • Hard Switch — a single champion strategy is selected per regime via argmax over regime-conditional expected scores. Hysteresis and a switching penalty prevent whipsawing between strategies during regime transitions.
  • Soft Mixture-of-Experts (MoE) — instead of picking one winner, a softmax gate blends the top-K strategies weighted by their regime-conditional scores. A hierarchical family-level gate groups strategies by signal type, preventing overconcentration in one signal family.
  • Both methods use an online regime posterior (the Hamilton filter from Layer 7) so that strategy weights at each bar depend only on data available at that point — no lookahead bias.
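
The two selection methods can be sketched in a few lines. The example assumes per-regime scores are already available for each strategy; the strategy names, the penalty value, and the temperature parameter are hypothetical, and the family-level gate is omitted for brevity.

```python
import math

def expected_scores(regime_posterior, scores_by_regime):
    """Weight each strategy's per-regime score by the current regime posterior."""
    return {name: sum(regime_posterior[r] * s[r] for r in regime_posterior)
            for name, s in scores_by_regime.items()}

def hard_switch(current, scores, switch_penalty=0.1):
    """Argmax with hysteresis: only switch if the challenger wins by more than the penalty."""
    best = max(scores, key=scores.get)
    if current is None or current not in scores:
        return best
    return best if scores[best] - scores[current] > switch_penalty else current

def soft_moe_weights(scores, top_k=3, temperature=1.0):
    """Softmax gate over the top-K strategies; weights sum to 1."""
    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    peak = top[0][1]  # subtract the max for numerical stability
    exps = {name: math.exp((s - peak) / temperature) for name, s in top}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}

posterior = {"bull": 0.8, "bear": 0.2}
table = {"trend": {"bull": 1.0, "bear": -0.5},
         "meanrev": {"bull": 0.2, "bear": 0.6},
         "short": {"bull": -1.0, "bear": 1.0}}
scores = expected_scores(posterior, table)
```

Because the posterior passed in is the filtered one, the weights at each bar use only information available at that bar.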

Why this matters: A single strategy optimized across all market conditions is a compromise that underperforms in every regime. Regime-conditioned selection lets the system deploy the right tool for the current environment while controlling switching costs and turnover.

Meta-Learning Mixture of Experts for Regime-Aware Portfolio Construction. arXiv:2505.03659.

9

Credibility-First Champion Selection

Picking which strategy to surface as the champion is itself a statistical decision. We rank candidates with a hard credibility floor + cup-tier priority — so a backtest only earns the champion slot if it both beats the underlying stock on the held-out year AND survives our credibility tests.

  • BEATS cup tier (1y holdback) — strategies are graded 🏆 to 🏆🏆🏆🏆 based on whether they beat the underlying stock and SPY on the held-out 1-year window. cup_4 = beats stock and SPY both by ≥5pp; trails = doesn't beat the stock.
  • Hard credibility floor — a candidate must pass p-value ≤ 0.05 AND DSR ≥ 0.20 AND ≥ 30 trades to be eligible for cup-priority promotion. Below the floor we fall back to the credibility-haircut composite score, so a credible-but-trailing champion never gets displaced by a noisy cup_3 from too few trades.
  • Cup priority within the credible cohort — once the floor is met, cup_4 outranks cup_3 outranks cup_2, with composite score as the tiebreak. This stops the system from rewarding a marginal in-sample edge over a strategy that actually beat the stock on the held-out year.
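
One way to read the ranking rule is as a two-part sort key: cup tier counts only for candidates that clear the credibility floor, and the composite score breaks ties. The sketch below implements that reading. The `cup_1` label, the class and field names, and the treatment of below-floor candidates as tier 0 are assumptions.

```python
from dataclasses import dataclass

CUP_RANK = {"trails": 0, "cup_1": 1, "cup_2": 2, "cup_3": 3, "cup_4": 4}

@dataclass
class Candidate:
    name: str
    cup_tier: str
    p_value: float
    dsr: float
    n_trades: int
    composite: float  # credibility-haircut composite score

def is_credible(c):
    """Hard floor: p-value <= 0.05 AND DSR >= 0.20 AND at least 30 trades."""
    return c.p_value <= 0.05 and c.dsr >= 0.20 and c.n_trades >= 30

def pick_champion(candidates):
    """Cup tier is honored only above the floor; composite score breaks ties."""
    if not candidates:
        return None
    def key(c):
        tier = CUP_RANK[c.cup_tier] if is_credible(c) else 0
        return (tier, c.composite)
    return max(candidates, key=key)

a = Candidate("credible_trailing", "trails", 0.01, 0.50, 120, 0.60)
b = Candidate("noisy_cup3", "cup_3", 0.30, 0.05, 8, 0.35)
champion = pick_champion([a, b])  # the noisy cup_3 earns no tier credit
```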

Why this matters: A pure 'highest score wins' system rewards in-sample optimization tricks. By promoting champions on a held-out 1-year metric AND requiring statistical credibility, we surface strategies that are both real and useful — and we tell the user honestly when no candidate passes both gates.

10

Sign-Correct Short Accounting

When a strategy can go short, every percentage return, drawdown, and Sharpe must be computed with the correct sign — or the cup tier shows a strategy that looks profitable but isn't. Phase 1 ships an internal accounting engine that handles long, short, and mixed books with textbook-orthodox math.

  • Short proceeds credit cash at the open; margin reserved at Reg T 50% initial, 30% maintenance. Buying power equals equity minus margin used, never "available cash".
  • Daily borrow accrual: short_qty × close × annual_borrow_rate / 252, debited from cash each bar. Backtests assume 2%/yr default; hard-to-borrow names bump to 15%/yr.
  • Forced cover trigger: when equity drops below 30% of short market value, the largest-loser short closes at the next bar — same behavior your real broker would force on you.
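
The three rules above reduce to short formulas, sketched here with hypothetical function names. The worked numbers assume $10,000 of starting cash and a 100-share short opened at $50, and equity is taken as cash minus the market value of the short.

```python
TRADING_DAYS = 252

def daily_borrow_cost(short_qty, close, annual_borrow_rate=0.02):
    """Borrow fee debited from cash for one bar."""
    return short_qty * close * annual_borrow_rate / TRADING_DAYS

def buying_power(equity, margin_used):
    """Equity minus reserved margin, not raw available cash."""
    return equity - margin_used

def needs_forced_cover(equity, short_market_value, maintenance=0.30):
    """True when equity has fallen below the maintenance share of short market value."""
    return short_market_value > 0 and equity < maintenance * short_market_value

# Open the short: proceeds credit cash, 50% initial margin is reserved.
cash = 10_000.0 + 100 * 50.0            # 15,000 after short proceeds
equity = cash - 100 * 50.0              # 10,000
margin_used = 0.50 * 100 * 50.0         # 2,500
available = buying_power(equity, margin_used)

# If the stock rallies to $120, equity is 15,000 - 12,000 = 3,000.
margin_call = needs_forced_cover(cash - 100 * 120.0, 100 * 120.0)
```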

Why this matters: A long position can lose at most 100% of capital. A short position can lose multiples — and the math behind drawdown, win rate, and Sharpe is direction-dependent. Without sign-correct accounting, a backtest will silently overstate short profits by the borrow-cost-it-never-paid and understate drawdowns by the margin-call-it-never-felt. Cup tier shown to a user is then a half-truth at best.

See docs/short-selling-plan.md §7.2 and §10 for the accounting equations and the 20-test regression corpus that gates v2 from breaking long-only behavior.

Alpha Family Registry

52 academic alpha families

Each family is a published, peer-reviewed strategy from the academic finance canon, grounded in an explicit economic mechanism. All 52 flow through the same 10-layer credibility harness above; the picker ranks them lane-agnostically so the strongest evidence wins, regardless of which paper it came from.

Families per sleeve · 52 total

  • Trend: 10
  • Mean-Reversion: 4
  • Pairs: 2
  • Quality: 2
  • Event: 7
  • Macro: 5
  • Short-Flow: 3
  • Microstructure: 3
  • Accounting: 5
  • Risk-Premium: 4
  • Diffusion: 1
  • Sentiment: 1
  • Meta: 5

#1 multi_horizon_trend
Trend

Multi-window SMA-cross vote across {50/200, 20/60, 10/30}

Multi-window trend votes (canonical TA construction)

#2 cross_sectional_momentum
Trend

Top/bottom decile of 6-month return vs universe

Jegadeesh & Titman 1993, JF

#3 breakout_proximity
Trend

Close ≥ 95% of N-day high with volume confirm

Breakout / anchor effect (TA canon)

#4 vol_timed_ma
Trend

SMA crossover gated by realized-vol regime

Vol-timed trend (Moskowitz-Ooi-Pedersen 2012 derivative)

#5 pairs_relative_value
Pairs

Sector/peer z-score, ±2σ entry, revert-to-zero exit

Engle-Granger pairs (TA quant standard)

#6 range_regime_meanrev
Mean-Reversion

RSI + range position, gated on low ADX

Bollinger / RSI mean-rev (TA canon)

#7 breakout_volume
Trend

Donchian breakout + volume surge + ATR expansion

Donchian / Turtle Traders (Curtis Faith)

#8 flow_confirmed
Microstructure

FINRA short-volume + (planned) options OI/GEX confirmation

Hasbrouck 1995, JF (flow-info-content)

#9 event_aware
Event

Earnings calendar filter + post-event drift

Ball & Brown 1968, JAR

#10 regime_overlay
Macro

VIX × SPY 200d-MA regime gate (risk-on/off)

Daniel-Moskowitz 2016 (regime overlay)

#11 tsmom
Trend

Sign(trailing 12m return) × inverse-vol scaling

Moskowitz, Ooi & Pedersen 2012, JFE

#12 idiosyncratic_momentum
Trend

Trailing residual return after stripping FF5+MOM betas

Blitz, Hanauer & Vidojevic 2020, Int. Rev. Econ. Finance

#13 pead
Event

Post-Earnings Announcement Drift on surprise > +5%

Bernard & Thomas 1989, JAR

#14 iv_skew
Microstructure

Put-call IV skew + ATM IV term structure (skeleton — needs vendor)

Xing, Zhang & Zhao 2010, JFQA

#15 lottery_max
Mean-Reversion

Bottom-quartile MAX[5d return] outperforms top

Bali, Cakici & Whitelaw 2011, JFE

#16 bab
Quality

Betting-Against-Beta: long low-β levered, short high-β hedged

Frazzini & Pedersen 2014, JFE

#17 sector_momentum
Trend

Ticker vs sector ETF (XLF/XLK/...) outperformance

Moskowitz & Grinblatt 1999, JF

#18 short_term_reversal
Mean-Reversion

Sub-week mean reversion + reversal-up confirm

Jegadeesh 1990, JF

#19 vrp_vix_term
Macro

Variance Risk Premium + VIX/VIX3M term structure

Bollerslev, Tauchen & Zhou 2009, RFS

#20 pairs_cointegration
Pairs

Engle-Granger residual ±2σ, cointegration-tested

Gatev, Goetzmann & Rouwenhorst 2006, RFS

#21 qmj
Quality

Quality-Minus-Junk composite (profitability × growth × safety)

Asness, Frazzini & Pedersen 2019, RAS

#22 ofi_microstructure
Microstructure

L1 NBBO order-flow imbalance (skeleton — needs vendor)

Cont, Kukanov & Stoikov 2014, J. Financial Econometrics

#23 champion_overlay
Meta

Production champion + regime gate (VIX × SPY)

Asness-Frazzini-Israel-Moskowitz 2015 (AQR factor timing)

#24 short_interest_change
Short-Flow

FINRA daily short-volume z-score (continuation + squeeze-fade)

Boehmer, Huang & Jiang 2010, RFS

#25 borrow_rate_spike
Short-Flow

IBKR borrow-fee z-score, hard-to-borrow filter

Engelberg, Reed & Ringgenberg 2018, JF

#26 ftd_threshold_list
Short-Flow

SEC Reg SHO threshold inclusion streak (FTD persistence)

Boulton & Braga-Alves 2010 (Reg SHO compliance)

#27 gap_play
Mean-Reversion

Overnight gap continuation (gap-and-go) and fade reversal

Gap dynamics (microstructure canon)

#28 atr_breakout
Trend

Donchian channel break ± 1.5×ATR(20), Turtle 10-day exit

Faith 2003 (Turtle Traders)

#29 high_52w_momentum
Trend

Proximity to 52-week high + deep-drawdown bounce

George & Hwang 2004, JF; Geczy & Samonov 2015

#30 calendar_anomalies
Macro

Turn-of-month + pre-FOMC drift + Wed/Thu effect

Ariel 1987, JFE; Lucca & Moench 2015, JF

#31 insider_form4
Event

SEC Form 4 cluster-buy: ≥3 insiders OR CEO/CFO > $1M

Cohen, Malloy & Pomorski 2012, JF

#32 earnings_announcement_premium
Event

T-2 to T+1 window around scheduled earnings

Frazzini & Lamont 2007, JFE

#33 cash_operating_profitability
Accounting

Cash-OP / total_assets quarterly z-score + trend filter

Ball, Gerakos, Linnainmaa & Nikolaev 2016, JFE

#34 idio_vol_puzzle
Risk-Premium

FF3-residual IVOL z-score: short high-IVOL, long low

Ang, Hodrick, Xing & Zhang 2006, JF

#35 industry_lead_lag
Diffusion

Sector ETF vs SPY × ticker vs sector lead-lag

Menzly & Ozbas 2010, JF

#36 buyback_drift
Accounting

Net shares-outstanding contraction over 2 consecutive quarters

Peyer & Vermaelen 2009, RFS; Ikenberry et al 1995, JFE

#37 sloan_accruals
Accounting

Balance-sheet accruals z-score: long low-accrual, short high

Sloan 1996, AR

#38 novy_marx_gross_profitability
Accounting

Gross profit / assets z-score, trend-confirmed

Novy-Marx 2013, JFE

#39 rd_capitalized_value
Accounting

R&D intensity z-score × price below 200d MA

Peters & Taylor 2017, JFE; Eisfeldt-Kim-Papanikolaou 2020

#40 max_drawdown_premium
Risk-Premium

Trailing 252d MaxDD threshold + 60d MA recovery

Atilgan, Bali, Demirtas & Gunaydin 2020, JFE

#41 speculative_beta
Risk-Premium

Rolling-window β dispersion: short high-β + high dispersion

Hong & Sraer 2016, JF

#42 meta_equal_weight
Meta

5-signal 1/N consensus (≥ 3/5 or 4/5 agree)

DeMiguel, Garlappi & Uppal 2009, RFS

#43 meta_regime_router
Meta

VIX × SPY routes trend / revert / crisis-short sub-signal

Daniel & Moskowitz 2016, JFE (Momentum Crashes)

#44 macro_regime
Macro

FRED term-spread + credit-spread + Fed-funds cycle gate

Estrella-Mishkin 1998 (yield-curve recession signal)

#45 attention_spike
Sentiment

Google Trends SVI z-score: short spike, long quiet

Da, Engelberg & Gao 2011, JF

#46 lazy_prices
Event

10-K filing year-over-year cosine similarity ≥ 0.85

Cohen, Malloy & Nguyen 2020, JF

#47 m_and_a_arb
Event

SEC 8-K Item 1.01/2.01/8.01 post-event drift

Mitchell & Pulvino 2001, JF

#48 crowding_reversal
Meta

13F crowding-z reversal (top-decile fade, bottom-decile rebound)

Wardlaw 2020, JF

#49 realized_skew_xs
Risk-Premium

Own-history percentile of rolling realized skewness → long bottom, short top

Amaya, Christoffersen, Jacobs & Vasquez 2015, JFE

#50 mag7_factor_overlay
Meta

Mag-7 concentration regime: rotation_into / rotation_away / regime_gate variants

AQR Capital — A New Paradigm in Active Equity, Q1 2025

#51 pre_fomc_drift
Macro

Long 1-3 trading days before scheduled FOMC + post-fade combo (press-conference filter)

Lucca & Moench 2015, JF (NY Fed SR-512) — 2024 Appl. Econ. update

#52 index_rebalance_drift
Event

S&P 500 add-front-run, add-post-reversal, remove-bounce around effective date

Chen & Singal 2023, Financial Analysts Journal

Source: services/worker/alpha_experiments_runner.py · Updated weekly · See family-by-family signal logic

Ready to test your strategies?

Run your first optimization with full credibility testing. Free to start, no credit card required.

alphactor.ai provides AI-powered stock research tools for informational and educational purposes only. We are not a registered investment advisor. Nothing on this site constitutes financial, investment, or trading advice. Past performance does not guarantee future results.