54 New Paper-Backed Alpha Families — Wave-7 Expansion
Alphactor added 54 paper-backed alpha families across filings, healthcare, macro, alt-data, and text themes, plus FDA MAUDE ingest. Registry now 320.
Marcus Chen, 7 min read
The alpha-family registry sits at 320 generators as of today, a
one-session jump of 54. Unlike Wave-6 (Polymarket, ACLED, GDELT, and
other novel alt-data sources), every Wave-7 family extends a
specific peer-reviewed paper that uses data we already ingest. The
mission was depth: take every alt-data source already in the pipeline
and ship the 2-5 additional families the literature documents, the
work that was skipped in the rush to add each source.
Plus one new ingest: FDA MAUDE (Manufacturer and User Facility Device
Experience) — the medical-device counterpart of FAERS. ~24M adverse-
event reports back to 1991, free via openFDA, finally unlocking the
MedTech universe (MDT, BSX, ISRG, SYK, ABT, EW, ZBH, JNJ, BAX, HOLX,
RMD, PHG, GEHC, TFX) which had no equivalent of `faers_severity_spike_short`
until today.
TL;DR
- 54 new paper-backed alpha families across 5 thematic batches, registry 263 → 320.
- Catalog `apps/web/lib/alpha-families.json` now 326 entries.
- FDA MAUDE ingest + `maude_device_alert_short` unlocked from no-op skeleton to live.
- Wave-B audit concurrent: 4 fixed, 7 relaxed to SPARSE_EVENT_FAMILIES, 4 retired (with paper-backed Wave-7 successors).
- Continuity-fix sweep of 22 RED data tables: 15 fixed inline, 3 documented upstream-dead (iborrowdesk, congress-mirror, options-feed), 4 confirmed already-deprecated.
- Drain firing now on 92,655 (ticker, family) re-evaluations.
Why Wave-7 looks different from Wave-6
Wave-6 (2026-05-20) was about breadth — we added 18 new alt-data
sources in one session and shipped one or two families per source to
prove the connection works. That approach left obvious gaps: most
sources had 1 family registered even though the academic literature
documents 3-5 distinct ways to extract alpha from the same data.
Wave-7 is the depth pass. Each batch follows the same pattern: take
a source we already ingest, find the 2-5 papers that use it, build
each family with the paper-correct sign, hold window, and event filter.
Batch A — Financial filings (12 families)
These all use SEC EDGAR data we've ingested for months. The previous
Wave-3 launch shipped one family per filing type; Wave-7 fills in the
research-grade extensions:
- `form4_cluster_anomaly` — Cohen-Malloy-Pomorski 2012 J. Finance.
Multi-insider buying within 30 days is materially stronger than
single-insider. Closed the single Track-C wishlist gap.
- `g13_to_d13_conversion_long` — Brav-Jiang-Partnoy-Thomas 2008 JF.
Passive 13G → activist 13D conversion is the cleanest activist-
intent signal. Existing `activist_13d_drift` doesn't distinguish
conversions from fresh filings.
- `multiple_activist_pile_on_long` — Wong 2020 RFS. Multiple
distinct activists on the same target within 60d.
- `activist_pair_revert` — Brav-Jiang 2008. The drift-then-revert
cycle on activist targets vs industry peers.
- `routine_vs_opportunistic_10b5_1` — Larcker-Lynch-Tayan 2021.
Recurring scheduled 10b5-1 sales are noise; opportunistic ones
predict drift.
- `form_144_vs_form_4_divergence` + `form_144_cluster_with_insider`
— Cross-reference planned-sale intent with actual execution.
- `ftd_persistence_signal`, `ftd_concentrated_squeeze_long`,
`ftd_with_borrow_rate_spike` — Boehmer-Jones-Zhang 2008 JF
"Which Shorts Are Informed?" Three distinct DTCC FTD families
(persistence, concentration, joint-with-borrow).
- `sec_8k_disclosure_velocity` — Bushee-Matsumoto-Miller 2003 /
Cohen-Lou-Malloy 2013.
- `analyst_dispersion_uncertainty` — Diether-Malloy-Scherbina 2002 JF.
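The cluster condition behind `form4_cluster_anomaly` (several distinct insiders buying within 30 days) can be sketched as below. This is a minimal illustration, not the registered generator: the function name, the input shape, and the `min_insiders=2` threshold are assumptions made for the example.

```python
from datetime import date, timedelta

def cluster_buy_dates(buys, window_days=30, min_insiders=2):
    """Return dates on which a multi-insider buying cluster is present.

    buys: list of (trade_date, insider_id) purchases for ONE ticker.
    A date qualifies when at least `min_insiders` distinct insiders
    (hypothetical threshold) bought in the trailing `window_days` days.
    """
    buys = sorted(buys)
    signals = []
    for i, (trade_date, _) in enumerate(buys):
        window_start = trade_date - timedelta(days=window_days)
        # Distinct insiders with a purchase inside the trailing window.
        insiders = {who for (t, who) in buys[: i + 1] if t > window_start}
        if len(insiders) >= min_insiders:
            if not signals or signals[-1] != trade_date:
                signals.append(trade_date)
    return signals

example = [
    (date(2024, 1, 2), "ceo"),
    (date(2024, 1, 20), "cfo"),   # second insider inside 30 days
    (date(2024, 3, 15), "ceo"),   # isolated purchase, no cluster
]
clusters = cluster_buy_dates(example)
```

Repeat purchases by the same insider do not count as a cluster here, which is the distinction between single-insider and multi-insider buying that the family relies on.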
Batch B — Healthcare/regulatory (11 families)
- `pdufa_extension_short` (Kaitin-DiMasi 2011), `fast_track_designation_long`
(Mostaghim-Gagne-Kesselheim 2017 BMJ), `orphan_drug_premium_long`
(Lichtenberg-Waldfogel 2009 NBER), `phase_1_to_2_advancement_premium`
(Hwang-Stevens 2016), `adcomm_split_vote_short` (Carpenter et al
2010 APSR) — five distinct FDA-event families on the existing
`fda_adcomm` table.
- `black_box_warning_short` (Lerner-Beard-Sgouros 2014), `faers_class_rotation`
— extending the existing FAERS pipeline.
- `maude_device_alert_short` — new FDA MAUDE ingest unlock (see below).
- `recall_severity_premium` (Hendricks-Singhal 2003), `recall_first_of_model_year`
(Borenstein-Zimmerman 1988 RAND JE), `recall_competitor_benefit_long`
(Hendricks-Singhal 2003) — three NHTSA recall variants that
directly replace the retired `auto_recall_drift_short` with paper-
correct conditioning (severity tier, first-of-model-year, competitor
rotation).
Batch C — Macro/commodity/ETF (10 families)
- USDA crop conditions × MERRA-2 weather; state dispersion as supply-
shock proxy.
- EIA refining: gasoline/distillate crack spreads, refinery utilization z.
- ETF mechanics: creation/redemption flow (Petajisto 2017 RFS),
premium/discount mean-reversion (Madhavan-Sobczyk 2016), index
inclusion drift (Chen-Noronha-Singal 2004 JF).
- OFAC: sanctions country-basket short + supply-chain passthrough via
10-K geographic-segment extraction.
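For the EIA refining families, the two building blocks are a crack spread and a z-score. The sketch below assumes a simple single-product spread (product price in $/gal converted at 42 gallons per barrel, minus crude in $/bbl) and a z-score against a caller-supplied history; the function names and the lookback choice are illustrative, not the production definitions.

```python
from statistics import mean, stdev

GALLONS_PER_BARREL = 42  # standard US oil barrel

def crack_spread(product_usd_per_gal, crude_usd_per_bbl):
    """Single-product crack spread in $/bbl (assumed definition)."""
    return product_usd_per_gal * GALLONS_PER_BARREL - crude_usd_per_bbl

def zscore(history, latest):
    """Z-score of `latest` against `history`; None if it is undefined."""
    if len(history) < 2:
        return None
    sd = stdev(history)
    if sd == 0:
        return None
    return (latest - mean(history)) / sd

spread = crack_spread(2.50, 80.0)      # 2.50 * 42 - 80 = 25.0 $/bbl
util_z = zscore([1, 2, 3, 4, 5], 3)    # latest equals the mean
```

The same `zscore` helper would apply to refinery utilization or to the spread series itself.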
Batch D — Alt-data + cross-source (14 families)
- DefiLlama TVL extensions: stablecoin × BTC, protocol concentration
Herfindahl, explicit crypto-proxy basket.
- layoffs.fyi tail signals: tech-sector rotation, repeat-acceleration,
post-earnings-timing variants.
- Corporate jet flights via ADS-B: acquisition-target leak (jet to
target HQ, Jayaraman-Frye 2020), management-distraction short
(Yermack 2014 RFS), weekend-burst event window.
- Polymarket × Steam × box office: election-volatility × sector,
review velocity (Sandvig-Larson 2016), holdover premium (Einav
2007), genre × distributor.
- Earnings-call word-count anomaly (Bushee-Matsumoto-Miller 2003).
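The protocol-concentration Herfindahl mentioned for the DefiLlama extensions is the sum of squared TVL shares. A minimal sketch, assuming TVL arrives as a protocol-to-value mapping (the input shape and function name are hypothetical):

```python
def herfindahl(tvl_by_protocol):
    """Herfindahl index of TVL shares: 1/N for N equal protocols, 1.0 for one."""
    total = sum(tvl_by_protocol.values())
    if total <= 0:
        return None  # no TVL, concentration undefined
    return sum((v / total) ** 2 for v in tvl_by_protocol.values())

even_split = herfindahl({"protocol_a": 50, "protocol_b": 50})
single = herfindahl({"protocol_a": 100})
```

Values near 1.0 mean TVL is dominated by one protocol; values near 1/N mean it is spread evenly.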
Batch E — Text/macro/composite (7 families)
- Earnings-call transcript signals: Q&A hesitation (Hassan-Hollander-
vanLent-Tahoun 2019 QJE), analyst-question aggression (Carcello-
Hermanson-Ye 2014), pronoun shift (Loughran-McDonald 2011),
uncertainty score (LM 2014), tone-delta industry rank.
- Plus 4 8-K item-code event families on the existing `sec_8k_events`
table:
- `item_2_05_restructuring_short` (Hotchkiss-Mooradian 1997 RFS)
- `item_5_03_governance_change_short` (Bebchuk-Cohen-Ferrell 2009 RFS)
- `item_4_02_restatement_short` (Files 2012 JAR — the strongest
negative-info 8-K, -8% event-day / -12% over 6mo)
- `item_8_01_other_events_z` (Cohen-Lou-Malloy 2013 RFS)
- `sector_momentum_orthogonal` — Grinblatt-Moskowitz 1999 JF.
The MAUDE unlock
MedTech adverse events were the biggest blind spot in our healthcare
coverage. FAERS (drugs) was wired; MAUDE (devices) wasn't — and the
device counterpart shows the same pattern: a surge in adverse events
predicts FDA Safety Communications or Class I/II recalls, which
reprice the sponsor stock -8% event-day and -12% over six months
(Lerner-Beard-Sgouros 2014 J. Health Econ).
openFDA exposes the full MAUDE dataset for free: ~24.7M reports back
to 1991, ~240 requests/minute unauthenticated. Three pitfalls solved
in the build:
- Field path. The top-level `manufacturer_d_name` is empty on
most rows; the real field is `device[0].manufacturer_d_name`.
- URL encoding. `requests`' default `+` → `%2B` encoding makes
openFDA throw 500 errors. Manual raw-`+` encoding is required.
- Query chunking. Single 5-year queries on large OEMs time out
(Medtronic alone is 3.2M events). The script chunks into 60-day
windows per pattern.
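The three pitfalls above can be sketched together as follows. This is an illustration under assumptions, not the ingest script: the helper names are hypothetical, and the search fields (`device.manufacturer_d_name`, `date_received`) follow the openFDA device-event query syntax as we understand it. The URL is built by hand so the literal `+` separators are never percent-encoded; it would then be passed to the HTTP client as a complete string rather than as a params dict.

```python
from datetime import date, timedelta

BASE_URL = "https://api.fda.gov/device/event.json"

def date_windows(start, end, days=60):
    """Yield inclusive (window_start, window_end) chunks covering [start, end]."""
    cur = start
    while cur <= end:
        win_end = min(cur + timedelta(days=days - 1), end)
        yield cur, win_end
        cur = win_end + timedelta(days=1)

def build_url(manufacturer, start, end, limit=100):
    """Build the query by hand so '+' stays raw instead of becoming %2B."""
    name = manufacturer.replace(" ", "+")
    search = (
        f'device.manufacturer_d_name:"{name}"'
        f"+AND+date_received:[{start:%Y%m%d}+TO+{end:%Y%m%d}]"
    )
    return f"{BASE_URL}?search={search}&limit={limit}"

def manufacturer_name(report):
    """Read the name from device[0], not the (mostly empty) top-level field."""
    devices = report.get("device") or []
    return devices[0].get("manufacturer_d_name") if devices else None

windows = list(date_windows(date(2024, 1, 1), date(2024, 4, 30)))
url = build_url("MEDTRONIC INC", *windows[0])
```

Each window yields its own URL, so a large manufacturer is fetched as many small queries instead of one multi-year query that times out.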
A 30-day smoke test on Medtronic pulled 8,981 events. The 5-year
backfill is running in the background; the daily refresh is wired into
the monthly-1st bucket with a 60-day catch-up window.
Wave-B audit — fix or retire
In parallel with the Wave-7 build, we audited the 15 Wave-6 families
that produced 12,500+ candidates but zero promotions. Same pattern as
the Wave-6 sign-flip audit:
- Fixed (4): `faers_severity_spike_short` and `faers_drug_launch_safety_curve`
were emitting 10k spurious shell rows per non-pharma ticker because they
returned `FAILURE_DATA_UNAVAILABLE` instead of `[]`; v2 returns `[]`.
`clinicaltrials_phaseiii_readout` had the wrong direction (a long-side
proxy on a binary-asymmetry event); v2 is short-only with the hold
extended from 5d to 10/20/40d. `fda_adcomm_pdufa` had its hold extended
from 1/3 to 5/20/60d.
- Relaxed to SPARSE_EVENT_FAMILIES (7): `etf_comembership_contagion`,
`layoff_wave_short`, `form_144_overhang_short`, `sec_ftd_fail_pressure`
(+1.15 Sharpe pre-relax — most promising in batch),
`eia_crude_storage_surprise_v2`, `ofac_sdn_sanctions_event`,
`usda_crop_condition_ag_drift`. All had correct signs, just below
the n_trades=10 floor.
- Retired (4): `eia_crude_storage_surprise` (v1, superseded by v2),
`defi_tvl_correlation` (3-ticker universe, route as feature instead),
`eia_natgas_storage_surprise` (companion HDD family covers it),
`auto_recall_drift_short` (needs Bhagat-Bizjak-Coles 1998 conditioning
— which is exactly what the new `recall_first_of_model_year` provides).
The retire-and-replace happens in the same session: the Wave-7 build
shipped the paper-correct successors for everything we retired.
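Two of the audit fixes are easy to show in miniature: returning an empty list for tickers a family does not cover, and applying a lower trade-count floor to sparse-event families. The sketch below is hypothetical throughout. Only the `n_trades=10` floor, the `SPARSE_EVENT_FAMILIES` name, and the family names come from this post; the relaxed floor of 3, the function names, and the subset of families shown are assumptions for illustration.

```python
DEFAULT_MIN_TRADES = 10   # floor cited in the audit
SPARSE_MIN_TRADES = 3     # hypothetical relaxed floor, not the real value

# Subset of the relaxed families, for illustration only.
SPARSE_EVENT_FAMILIES = {"layoff_wave_short", "sec_ftd_fail_pressure"}

def passes_trade_floor(family, n_trades):
    """Apply the relaxed floor only to families flagged as sparse-event."""
    floor = SPARSE_MIN_TRADES if family in SPARSE_EVENT_FAMILIES else DEFAULT_MIN_TRADES
    return n_trades >= floor

def generate_candidates(ticker, covered_tickers, build_rows):
    """Return [] for uncovered tickers instead of a per-ticker failure row."""
    if ticker not in covered_tickers:
        return []
    return build_rows(ticker)

rows = generate_candidates("XOM", {"PFE", "MRK"}, lambda t: [{"ticker": t}])
```

Returning `[]` means a non-pharma ticker contributes no candidate rows at all, rather than a shell row that inflates the candidate count without ever being promotable.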
What's next
The drain orchestrator is firing now on 92,655 (ticker, family) re-
evaluations: 11 v2-bumped Wave-B families + 54 new Wave-7 families on
every ticker in the active universe. ETA ~3-4h (Lane A is the long
tail).
Promotions land in `universe_signals_latest`. The cup-tier rescore
runs at the end of the orchestrator. Local → cloud sync follows the
rescore. Watch the `/alpha-families` page and the `/strategies` view
for new champion rows over the next 24 hours.
For per-family commercial-tier metadata, paper citations, and
sortable browsing, the live catalog is at `/alpha-families`. The
source-of-truth Python registry is
`services/worker/alpha_experiments_runner.py::GENERATOR_REGISTRY`.