
54 New Paper-Backed Alpha Families — Wave-7 Expansion

Alphactor added 54 paper-backed alpha families across filings, healthcare, macro, alt-data, and text themes, plus FDA MAUDE ingest. Registry now 320.

Marcus Chen · 7 min read

The alpha-family registry sits at 320 generators as of today, a one-session jump of 54. Unlike Wave-6 (Polymarket, ACLED, GDELT, and other novel alt-data sources), every Wave-7 family extends a specific peer-reviewed paper that uses data we already ingest. The mission was depth: take every alt-data source already in the pipeline and ship the 2-5 additional families the literature documents, the work that wasn't built in the rush to add the source.

Plus one new ingest: FDA MAUDE (Manufacturer and User Facility Device Experience), the medical-device counterpart of FAERS. It holds ~24M adverse-event reports back to 1991, is free via openFDA, and finally unlocks the MedTech universe (MDT, BSX, ISRG, SYK, ABT, EW, ZBH, JNJ, BAX, HOLX, RMD, PHG, GEHC, TFX), which had no equivalent of `faers_severity_spike_short` until today.

TL;DR

  • 54 new paper-backed alpha families across 5 thematic batches, registry 263 → 320.
  • Catalog `apps/web/lib/alpha-families.json` now 326 entries.
  • FDA MAUDE ingest + `maude_device_alert_short` unlocked from no-op skeleton to live.
  • Wave-B audit concurrent: 4 fixed, 7 relaxed to SPARSE_EVENT_FAMILIES, 4 retired (with paper-backed Wave-7 successors).
  • Continuity-fix sweep of 22 RED data tables: 15 fixed inline, 3 documented upstream-dead (iborrowdesk, congress-mirror, options-feed), 4 confirmed already-deprecated.
  • Drain firing now on 92,655 (ticker, family) re-evaluations.

Why Wave-7 looks different from Wave-6

Wave-6 (2026-05-20) was about breadth: we added 18 new alt-data sources in one session and shipped one or two families per source to prove the connection works. That approach left obvious gaps: most sources had 1 family registered even though the academic literature documents 3-5 distinct ways to extract alpha from the same data.

Wave-7 is the depth pass. Each batch follows the same pattern: take a source we already ingest, find the 2-5 papers that use it, and build each family with the paper-correct sign, hold window, and event filter.

Batch A — Financial filings (12 families)

These all use SEC EDGAR data we've ingested for months. The previous Wave-3 launch shipped one family per filing type; Wave-7 fills in the research-grade extensions:

  • `form4_cluster_anomaly`: Cohen-Malloy-Pomorski 2012 J. Finance. Multi-insider buying within 30 days is materially stronger than single-insider. Closed the single Track-C wishlist gap.
  • `g13_to_d13_conversion_long`: Brav-Jiang-Partnoy-Thomas 2008 JF. Passive 13G → activist 13D conversion is the cleanest activist-intent signal. Existing `activist_13d_drift` doesn't distinguish conversions from fresh filings.
  • `multiple_activist_pile_on_long`: Wong 2020 RFS. Multiple distinct activists on the same target within 60d.
  • `activist_pair_revert`: Brav-Jiang 2008. The drift-then-revert cycle on activist targets vs industry peers.
  • `routine_vs_opportunistic_10b5_1`: Larcker-Lynch-Tayan 2021. Recurring scheduled 10b5-1 sales are noise; opportunistic ones predict drift.
  • `form_144_vs_form_4_divergence` + `form_144_cluster_with_insider`: cross-reference planned-sale intent with actual execution.
  • `ftd_persistence_signal`, `ftd_concentrated_squeeze_long`, `ftd_with_borrow_rate_spike`: Boehmer-Jones-Zhang 2008 JF, "Which Shorts Are Informed?" Three distinct DTCC FTD families (persistence, concentration, joint-with-borrow).
  • `sec_8k_disclosure_velocity`: Bushee-Matsumoto-Miller 2003 / Cohen-Lou-Malloy 2013.
  • `analyst_dispersion_uncertainty`: Diether-Malloy-Scherbina 2002 JF.
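
To make the cluster condition behind `form4_cluster_anomaly` concrete, here is a minimal sketch of a trailing-window check for multi-insider buying. The function names, the `(insider, date)` input shape, and the `min_insiders=2` threshold are illustrative assumptions; only the 30-day window comes from the description above, and the production generator may define a cluster differently.

```python
from datetime import date, timedelta


def distinct_buyers(purchases, as_of, window_days=30):
    """Return the set of insiders with a purchase in the trailing window.

    purchases: iterable of (insider_name, trade_date) for a single ticker.
    """
    start = as_of - timedelta(days=window_days)
    return {name for name, traded in purchases if start <= traded <= as_of}


def is_cluster(purchases, as_of, window_days=30, min_insiders=2):
    """True when at least `min_insiders` distinct insiders bought in the window.

    min_insiders=2 is an assumed threshold, not a value taken from the paper.
    """
    return len(distinct_buyers(purchases, as_of, window_days)) >= min_insiders


purchases = [
    ("CEO", date(2026, 5, 1)),
    ("CFO", date(2026, 5, 20)),
    ("CEO", date(2026, 5, 25)),       # repeat buyer, counted once
    ("Director", date(2026, 3, 1)),   # outside the late-May window
]
print(is_cluster(purchases, as_of=date(2026, 5, 28)))
```

Counting distinct insiders rather than raw transactions keeps one executive's repeated purchases from registering as a cluster.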

Batch B — Healthcare/regulatory (11 families)

  • `pdufa_extension_short` (Kaitin-DiMasi 2011), `fast_track_designation_long` (Mostaghim-Gagne-Kesselheim 2017 BMJ), `orphan_drug_premium_long` (Lichtenberg-Waldfogel 2009 NBER), `phase_1_to_2_advancement_premium` (Hwang-Stevens 2016), `adcomm_split_vote_short` (Carpenter et al. 2010 APSR): five distinct FDA-event families on the existing `fda_adcomm` table.
  • `black_box_warning_short` (Lerner-Beard-Sgouros 2014), `faers_class_rotation`: extending the existing FAERS pipeline.
  • `maude_device_alert_short`: new FDA MAUDE ingest unlock (see below).
  • `recall_severity_premium` (Hendricks-Singhal 2003), `recall_first_of_model_year` (Borenstein-Zimmerman 1988 RAND JE), `recall_competitor_benefit_long` (Hendricks-Singhal 2003): three NHTSA recall variants that directly replace the retired `auto_recall_drift_short` with paper-correct conditioning (severity tier, first-of-model-year, competitor rotation).

Batch C — Macro/commodity/ETF (10 families)

  • USDA crop conditions × MERRA-2 weather; state dispersion as supply-shock proxy.
  • EIA refining: gasoline/distillate crack spreads, refinery utilization z.
  • ETF mechanics: creation/redemption flow (Petajisto 2017 RFS), premium/discount mean-reversion (Madhavan-Sobczyk 2016), index inclusion drift (Chen-Noronha-Singal 2004 JF).
  • OFAC: sanctions country-basket short + supply-chain passthrough via 10-K geographic-segment extraction.
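
For the EIA refining bullet, a crack spread and a utilization z-score can be sketched in a few lines. This is an illustration under stated assumptions: a simple single-product spread (product priced in $/gal, crude in $/bbl, 42 gallons per barrel), a hypothetical 52-observation lookback, and made-up input numbers. The actual families may use different spread definitions and windows.

```python
from statistics import mean, stdev

GALLONS_PER_BARREL = 42


def crack_spread(product_usd_per_gal, crude_usd_per_bbl):
    """Single-product crack spread in $/bbl: refined product value minus crude cost."""
    return product_usd_per_gal * GALLONS_PER_BARREL - crude_usd_per_bbl


def latest_zscore(series, lookback=52):
    """Z-score of the latest observation against the prior observations in the window.

    Returns None when there is too little history or no variation to scale by.
    """
    window = list(series)[-lookback:]
    if len(window) < 3:
        return None
    history, latest = window[:-1], window[-1]
    spread = stdev(history)
    if spread == 0:
        return None
    return (latest - mean(history)) / spread


# Illustrative inputs only, not real EIA figures.
print(crack_spread(2.50, 80.0))
print(latest_zscore([90, 92, 88, 90, 96]))
```

Excluding the latest point from the mean and standard deviation keeps a large surprise from dampening its own z-score.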

Batch D — Alt-data + cross-source (14 families)

  • DefiLlama TVL extensions: stablecoin × BTC, protocol concentration Herfindahl, explicit crypto-proxy basket.
  • layoffs.fyi tail signals: tech-sector rotation, repeat-acceleration, post-earnings-timing variants.
  • Corporate jet flights via ADS-B: acquisition-target leak (jet to target HQ, Jayaraman-Frye 2020), management-distraction short (Yermack 2014 RFS), weekend-burst event window.
  • Polymarket × Steam × box office: election-volatility × sector, review velocity (Sandvig-Larson 2016), holdover premium (Einav 2007), genre × distributor.
  • Earnings-call word-count anomaly (Bushee-Matsumoto-Miller 2003).
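
The protocol-concentration Herfindahl mentioned in the DefiLlama bullet is the sum of squared TVL shares. A minimal sketch, assuming TVL arrives as a protocol-to-dollar mapping (the function name and input shape are hypothetical, and the numbers are made up):

```python
def herfindahl(tvl_by_protocol):
    """Herfindahl index of TVL shares: 1.0 is full concentration, 1/N is an even split.

    Returns None when total TVL is not positive, since shares are undefined.
    """
    total = sum(tvl_by_protocol.values())
    if total <= 0:
        return None
    return sum((value / total) ** 2 for value in tvl_by_protocol.values())


# Illustrative numbers only.
print(herfindahl({"protocol_a": 50.0, "protocol_b": 25.0, "protocol_c": 25.0}))
```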

Batch E — Text/macro/composite (7 families)

  • Earnings-call transcript signals: Q&A hesitation (Hassan-Hollander-vanLent-Tahoun 2019 QJE), analyst-question aggression (Carcello-Hermanson-Ye 2014), pronoun shift (Loughran-McDonald 2011), uncertainty score (LM 2014), tone-delta industry rank.
  • Plus 4 8-K item-code event families on the existing `sec_8k_events` table:
    - `item_2_05_restructuring_short` (Hotchkiss-Mooradian 1997 RFS)
    - `item_5_03_governance_change_short` (Bebchuk-Cohen-Ferrell 2009 RFS)
    - `item_4_02_restatement_short` (Files 2012 JAR; the strongest negative-info 8-K, -8% event-day / -12% over 6mo)
    - `item_8_01_other_events_z` (Cohen-Lou-Malloy 2013 RFS)
  • `sector_momentum_orthogonal`: Grinblatt-Moskowitz 1999 JF.
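
The four 8-K families key off item codes, so the routing step can be pictured as a lookup from item code to family name. The family names below are the ones listed above; the `families_for_filing` helper and the assumption that a filing's items arrive as a list of code strings are illustrative, not a description of how `sec_8k_events` is actually read.

```python
# Item code -> family name, as listed in this post.
ITEM_TO_FAMILY = {
    "2.05": "item_2_05_restructuring_short",      # costs of exit or disposal activities
    "5.03": "item_5_03_governance_change_short",  # charter or bylaw amendments
    "4.02": "item_4_02_restatement_short",        # non-reliance on prior financials
    "8.01": "item_8_01_other_events_z",           # other events
}


def families_for_filing(item_codes):
    """Return the families triggered by one 8-K, in first-seen order, without duplicates."""
    triggered = []
    for code in item_codes:
        family = ITEM_TO_FAMILY.get(code.strip())
        if family and family not in triggered:
            triggered.append(family)
    return triggered


# Item 9.01 (exhibits) maps to no family and is ignored.
print(families_for_filing(["4.02", "9.01", " 8.01"]))
```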

The MAUDE unlock

MedTech adverse events were the biggest blind spot in our healthcare coverage. FAERS (drugs) was wired; MAUDE (devices) wasn't. The device counterpart shows the same pattern: a surge in adverse events predicts FDA Safety Communications or Class I/II recalls, which reprice the sponsor stock -8% event-day and -12% over six months (Lerner-Beard-Sgouros 2014 J. Health Econ).

openFDA exposes the full MAUDE dataset for free: ~24.7M reports back to 1991, ~240 requests/minute unauthenticated. Three pitfalls solved in the build:

  1. Field path. The top-level `manufacturer_d_name` is empty on most rows; the real field is `device[0].manufacturer_d_name`.
  2. URL encoding. `requests`' default `+` → `%2B` encoding makes openFDA throw 500 errors. Manual raw-`+` encoding is required.
  3. Query chunking. Single 5-year queries on large OEMs time out (Medtronic alone is 3.2M events). The script chunks into 60-day windows per pattern.
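
The three pitfalls above can be sketched without touching the network. This is a minimal illustration, not the ingest script itself: the helper names are hypothetical, and it assumes the openFDA device-event endpoint at `https://api.fda.gov/device/event.json` with `date_received` as the date filter. The query string is assembled by hand so `+` stays a literal `+`; the finished URL would then be passed whole to the HTTP client rather than through a `params=` dict.

```python
from datetime import date, timedelta

# Assumed endpoint for openFDA device adverse events.
BASE_URL = "https://api.fda.gov/device/event.json"


def date_windows(start, end, days=60):
    """Yield inclusive (window_start, window_end) pairs covering start..end."""
    cursor = start
    while cursor <= end:
        window_end = min(cursor + timedelta(days=days - 1), end)
        yield cursor, window_end
        cursor = window_end + timedelta(days=1)


def build_url(manufacturer, start, end, limit=100):
    """Assemble the query by hand so '+' is never rewritten to '%2B'."""
    name = manufacturer.replace(" ", "+")
    search = (
        f'device.manufacturer_d_name:"{name}"'
        f"+AND+date_received:[{start:%Y%m%d}+TO+{end:%Y%m%d}]"
    )
    return f"{BASE_URL}?search={search}&limit={limit}"


def device_manufacturer(record):
    """Read the manufacturer from device[0], not the often-empty top-level field."""
    devices = record.get("device") or []
    return devices[0].get("manufacturer_d_name") if devices else None


for window_start, window_end in date_windows(date(2026, 1, 1), date(2026, 4, 30)):
    print(build_url("MEDTRONIC", window_start, window_end))
```

Each window yields one URL, so a long backfill becomes a sequence of small requests that can be paced under the rate limit and retried individually.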

Smoke fire on Medtronic 30-day pulled 8,981 events. 5y backfill running in background; daily refresh wired into the monthly-1st bucket with a 60-day catch-up window.

Wave-B audit — fix or retire

In parallel with the Wave-7 build, we audited the 15 Wave-6 families that produced 12,500+ candidates but zero promotions. Same pattern as the Wave-6 sign-flip audit:

  • Fixed (4): `faers_severity_spike_short` and `faers_drug_launch_safety_curve` were emitting 10k spurious shell rows per non-pharma ticker (returned `FAILURE_DATA_UNAVAILABLE` instead of `[]`); v2 returns `[]`. `clinicaltrials_phaseiii_readout` was wrong-direction (long-side proxy on binary-asymmetry); flipped to short_only and extended hold 5d → 10/20/40d. `fda_adcomm_pdufa` extended hold 1/3 → 5/20/60d.
  • Relaxed to SPARSE_EVENT_FAMILIES (7): `etf_comembership_contagion`, `layoff_wave_short`, `form_144_overhang_short`, `sec_ftd_fail_pressure` (+1.15 Sharpe pre-relax, the most promising in the batch), `eia_crude_storage_surprise_v2`, `ofac_sdn_sanctions_event`, `usda_crop_condition_ag_drift`. All had correct signs, just below the n_trades=10 floor.
  • Retired (4): `eia_crude_storage_surprise` (v1; superseded by v2), `defi_tvl_correlation` (3-ticker universe, route as feature instead), `eia_natgas_storage_surprise` (companion HDD family covers it), `auto_recall_drift_short` (needs Bhagat-Bizjak-Coles 1998 conditioning, which is exactly what the new `recall_first_of_model_year` provides).
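
Two of the audit fixes are easy to picture in code: returning an empty list for out-of-scope tickers instead of a failure sentinel, and applying a lower trade-count floor to sparse-event families. The sketch below is illustrative only. The function signatures, the `pharma_universe` argument, and the relaxed floor of 3 are assumptions; the post states only the default floor of 10 and the seven family names.

```python
# The seven families relaxed in this audit, as listed above.
SPARSE_EVENT_FAMILIES = {
    "etf_comembership_contagion",
    "layoff_wave_short",
    "form_144_overhang_short",
    "sec_ftd_fail_pressure",
    "eia_crude_storage_surprise_v2",
    "ofac_sdn_sanctions_event",
    "usda_crop_condition_ag_drift",
}

DEFAULT_MIN_TRADES = 10  # the n_trades floor stated in the post
SPARSE_MIN_TRADES = 3    # hypothetical relaxed floor; the real value is not given


def generate_candidates(ticker, pharma_universe, build_rows):
    """v2 behaviour: an out-of-scope ticker yields no rows rather than a failure marker."""
    if ticker not in pharma_universe:
        return []
    return build_rows(ticker)


def passes_trade_floor(family, n_trades):
    """Apply the relaxed floor to sparse-event families and the default floor otherwise."""
    floor = SPARSE_MIN_TRADES if family in SPARSE_EVENT_FAMILIES else DEFAULT_MIN_TRADES
    return n_trades >= floor


print(generate_candidates("AAPL", {"PFE"}, lambda t: [{"ticker": t}]))
print(passes_trade_floor("sec_ftd_fail_pressure", 6))
```

An empty list lets downstream counting treat "not applicable" as zero candidates, so shell rows never enter the promotion pipeline.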

The retire-and-replace happens in the same session: the Wave-7 build shipped the paper-correct successors for everything we retired.

What's next

The drain orchestrator is firing now on 92,655 (ticker, family) re-evaluations: 11 v2-bumped Wave-B families + 54 new Wave-7 families on every ticker in the active universe. ETA ~3-4h (Lane A is the long tail).

Promotions land in `universe_signals_latest`. The cup-tier rescore runs at the end of the orchestrator. Local → cloud sync follows the rescore. Watch the `/alpha-families` page and the `/strategies` view for new champion rows over the next 24 hours.

For per-family commercial-tier metadata, paper citations, and sortable browsing, the live catalog is at `/alpha-families`. The source-of-truth Python registry is `services/worker/alpha_experiments_runner.py::GENERATOR_REGISTRY`.

For informational and educational purposes only. Not financial advice.