39 New Alpha Families — Wave-6 Alt-Data Expansion
Alphactor added 39 alternative-data alpha families across prediction markets, conflict events, weather, supply chains, patents, and drug safety.
Marcus Chen · 14 min read
The alpha-family registry sits at 266 generators as of today, a
one-session jump of 39 and the largest single-day expansion since the platform
launched. The new families share a theme that most published-research
factor work avoids: they each lean on a non-standard data source. Polymarket
resolution probabilities. ACLED armed-conflict event geocodes. GDELT
country-tone z-scores. CFTC commercial-index-trader positioning. EIA
weekly degree-day anomalies. Steam concurrent-user telemetry. Box-office
grosses. NOAA storm-track cones. NIFC wildfire perimeters. Port-call AIS
counts. Census FT900 trade-balance prints. USPTO patent grants and
continuation bursts. FDA FAERS post-launch adverse-event curves. Corporate
jet ADS-B tracks. NFT marketplace volume. USDA cash-basis prints. CME
futures settles. Most retail platforms ingest none of these. Most
institutional platforms ingest some of them. We just put 18 of them online
in a single session.
This post walks through the 39 families grouped by theme, the academic
papers each extends (or notes where the work is genuinely novel), and
why we framed the post-launch state honestly: these are
**newly registered, mining-in-progress** strategies. The thesis is funded;
the alpha attribution is not there yet. Only 18 of the 2,959 wave-6
candidate rows have passed harness gates so far, per the post-drain
Moonshot A v3 audit that ran tonight. The rest are validating.
TL;DR
- Alphactor added 39 wave-6 alpha families, bringing the registry to 266 generators.
- The new families lean on alternative data: Polymarket, ACLED, GDELT, EIA, NOAA, FAERS, ADS-B, patents, ports, and more.
- These are registered strategies, not declared champions; most candidates still need to pass DSR, PBO, CPCV, Monte Carlo, and walk-forward gates.
- The research bet is that higher-frequency alt-data can sharpen mechanisms that older academic papers measured only at monthly frequency or as country-level aggregates.
Why this wave is different
The first 218 families on Alphactor sat squarely inside the "academic
canon" — Fama-French, Jegadeesh-Titman, Sloan, Novy-Marx, Frazzini-
Pedersen, Asness, Hong-Sraer, Daniel-Moskowitz. Each family extended a
named paper in a known journal. Wave-6 takes a different bet: that the
next decade of alpha lives in alt-data, where the academic work points to
a mechanism (geopolitical risk priced into equities; conflict events
moving commodity supply curves; weather anomalies repricing energy demand)
but the data needed to trade the mechanism either didn't exist in real-
time form when the paper was written, or required a paid vendor most
researchers couldn't access.
A free, real-time ingest of Polymarket question resolutions did not exist
in 2004 when Wolfers and Zitzewitz wrote the canonical *JEP* prediction-
markets piece. It does now. The GDELT 2.0 GKG feed publishes
country-tone z-scores in 15-minute increments — Caldara and Iacoviello's
2022 *AER* "Measuring Geopolitical Risk" series is monthly. ACLED publishes
conflict events with daily resolution and per-incident fatality counts —
the academic literature that prices conflict into oil (Hamilton 2003,
Kilian 2009) runs on country-month aggregates. Each wave-6 family bets
that the higher-frequency, narrower-geographic data unlocks tradable
signal that the published-paper aggregate version smeared out.
That bet may not pay off. The harness is strict — Deflated Sharpe Ratio,
Probability of Backtest Overfit, Combinatorial Purged Cross-Validation,
Monte Carlo permutation significance, walk-forward holdout. Most of the
2,959 wave-6 candidate rows that the discovery harness produced have
failed at least one of those gates. We log the failures alongside the
passes; both go in the audit trail. What follows is a tour of the thesis
behind each family, not a claim that any of them is yet a production
champion.
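Of the gates listed above, the Monte Carlo permutation check is the easiest to illustrate in a few lines. The sketch below is a generic permutation test, not Alphactor's harness: the function name, the mean-PnL statistic, and the default of 1,000 shuffles are all assumptions made for illustration.

```python
import random


def permutation_p_value(positions, returns, n_shuffles=1000, seed=0):
    """One-sided p-value: share of shuffled position sequences whose
    mean PnL is at least as large as the observed mean PnL."""
    if len(positions) != len(returns) or not positions:
        raise ValueError("positions and returns must be equal-length and non-empty")

    def mean_pnl(pos):
        return sum(p * r for p, r in zip(pos, returns)) / len(returns)

    observed = mean_pnl(positions)
    rng = random.Random(seed)  # seeded so the result is reproducible
    shuffled = list(positions)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # breaks the timing link between signal and return
        if mean_pnl(shuffled) >= observed:
            hits += 1
    # The +1 correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_shuffles + 1)
```

A small p-value says the signal's timing mattered: randomly re-timed positions rarely did as well. A constant position yields a p-value of 1.0, because shuffling changes nothing.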
Prediction markets — Polymarket overlays (#225-#228)
The four Polymarket families sit on top of Polymarket question
resolutions and probability prints. They extend four separate papers in
the prediction-market and event-study literature.
#225 polymarket_iv_skew_spread extends Wolfers-Zitzewitz 2004 *JEP*
"Prediction Markets" — the foundational paper. Where their study compared
prediction-market probabilities to econometric forecasts on macro events,
this family compares Polymarket binary-outcome probabilities to single-
name option-implied probabilities (derived from the put-call skew). When
the two disagree by more than 2 standard deviations, the family takes a
position. The single-name option-implied-probability arb is novel — we
couldn't find a published precedent. #226 polymarket_ma_close_drift
extends Mitchell-Pulvino 2001 *JF* "Characteristics of Risk and Return in
Risk Arbitrage" by adding a Polymarket close-probability overlay to the
classical M&A risk-arb spread. #227 polymarket_resolution_drift extends
Berg-Nelson-Rietz 2008 *IJF* on prediction-market accuracy with a
pre-resolution mean-revert specification.
**#228 polymarket_executive_departure_short** extends Mehran-Yermack 2010
*JFE* on CEO turnover by gating short entries on the Polymarket "will X be
CEO at year-end?" probability dropping below 50%.
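The #225 disagreement trigger can be sketched as a z-score on the gap between the two probability series. This is a minimal illustration under stated assumptions: both series are already aligned by date, the option-implied probability has been computed upstream, and the z-score is taken against all prior observations. The function names are hypothetical, and the trade direction is left out because the text does not specify it.

```python
from statistics import mean, stdev


def spread_zscore(poly_probs, option_probs):
    """z-score of the latest (Polymarket - option-implied) probability gap
    measured against the history of earlier gaps."""
    spreads = [p - q for p, q in zip(poly_probs, option_probs)]
    history, latest = spreads[:-1], spreads[-1]
    if len(history) < 2:
        raise ValueError("need at least three aligned observations")
    sigma = stdev(history)
    if sigma == 0:
        return 0.0  # flat history: no meaningful z-score
    return (latest - mean(history)) / sigma


def disagreement_flag(poly_probs, option_probs, threshold=2.0):
    """True when the two probability sources disagree by more than
    `threshold` standard deviations."""
    return abs(spread_zscore(poly_probs, option_probs)) > threshold
```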
Conflict events — ACLED (#229-#231)
ACLED (Armed Conflict Location & Event Data) ships per-incident geocoded
conflict events with fatality counts. The three ACLED families each route
event-density spikes to different equity baskets via different supply-
chain mechanisms. #229 acled_oil_supply_shock_long extends Hamilton
2003 *JoE* "What is an oil shock?" and Kilian 2009 *AER* "Not All Oil
Price Shocks Are Alike" — when MENA oil-region conflict density jumps
above 1.5σ, the family goes long XOM/CVX/COP/EOG/MPC/VLO on the expected
upstream-disruption premium. #230 acled_mining_disruption extends
Berman-Couttenier-Rohner-Thoenig 2017 *AER* "This Mine Is Mine!" — DRC,
Zambia, Peru, Mongolia conflict density triggers SHORT FCX/SCCO/GOLD/RIO/
BHP on the downstream-equity passthrough.
**#231 acled_red_sea_freight_premium** extends Adland-Cariou-Wolff 2017 by
specifying Bab-el-Mandeb / Suez / Yemen-Saudi-Egypt conflict density as the
trigger for LONG SBLK/ZIM/GOGL/STNG on the freight-route premium.
Geopolitical risk — GDELT (#232-#233)
The two GDELT families bet that GDELT's country-tone and event-density
indices, computed at 15-minute resolution from millions of news articles
per day, lead the monthly geopolitical-risk indices the academic
literature has been pricing into stocks.
**#232 gdelt_geopolitical_tone_short** extends Caldara-Iacoviello 2022 *AER*
"Measuring Geopolitical Risk"
plus Engle-Giglio-Kelly-Lee-Stroebel 2020 *RFS* on hedging climate news
— when a single country's GDELT tone drops below z = −1.5, the family
shorts US multinationals with disclosed revenue exposure to that country.
#233 gdelt_event_density_volatility extends Bali-Brown-Tang 2017 *JFQA*
and Bloom 2009 *Econometrica* on economic uncertainty pricing by using
global event-density z ≥ +2 as a proxy for the Bloom-uncertainty shock,
shorting cyclical/high-beta names.
CFTC CIT positioning (#234-#235)
Two families that ride the CFTC's commercial-index-trader supplement, a
weekly report on net positions across 13 ag contracts (corn, soy, wheat,
sugar, cocoa, coffee, cattle, hogs, and friends).
**#234 cit_extreme_positioning_reversion** extends Yang-Du 2018 *JF* and
Tang-Xiong 2012 *FAJ*
on commodity index-trader effects: when CIT net z ≥ +2 (index traders
overweight by historical norms), the family goes long the equity basket
that absorbs ag-input cost (DE, AGCO, MOS, CF, NTR, ADM, BG).
**#235 cit_unwind_velocity_vol_regime** extends Cheng-Kirilenko-Xiong 2015 *RF*
"Convective Risk Flows in Commodity Futures Markets" by measuring week-
over-week |Δnet| velocity. When velocity z ≥ +1.5, the family takes a
−0.5 ag-basket short, on the theory that fast unwinds correlate with
broader cyclical risk-off episodes.
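The #235 velocity rule reduces to a few lines once weekly net positions are in hand. The sketch below assumes a plain list of weekly CIT net positions (oldest first) and scores the latest absolute change against all earlier changes. The function names and the choice of lookback are illustrative assumptions; only the 1.5 trigger and the −0.5 weight come from the description above.

```python
from statistics import mean, stdev


def unwind_velocity_z(net_positions):
    """z-score of the latest |week-over-week change| in net position
    against the earlier weekly absolute changes."""
    velocity = [abs(b - a) for a, b in zip(net_positions, net_positions[1:])]
    if len(velocity) < 3:
        raise ValueError("need at least four weekly observations")
    history, latest = velocity[:-1], velocity[-1]
    sigma = stdev(history)
    return 0.0 if sigma == 0 else (latest - mean(history)) / sigma


def ag_basket_weight(net_positions, z_trigger=1.5, short_weight=-0.5):
    """Return the ag-basket weight: a short when velocity z >= trigger, else flat."""
    return short_weight if unwind_velocity_z(net_positions) >= z_trigger else 0.0
```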
EIA energy stats (#236-#238)
Three families on weekly EIA petroleum and natgas releases.
**#236 eia_degree_days_weather_anomaly** extends Linn-Muehlenbachs 2018
*JAERE* and
Mu 2007 *Energy Economics* on weather-driven natgas pricing. Same-day-of-
year HDD/CDD anomalies route to natgas E&P (on cold-anomaly) or utility
ETFs (on hot-anomaly). #237 eia_crude_storage_surprise_v2 extends
Hong-Yogo 2012 *JFE* and Symeonidis-Prokopczuk 2012 *EE* — a bullish-draw
surprise (storage drop > consensus) lights up a deeper E&P basket
(XOM/CVX/COP/EOG/MRO/OXY/DVN/PXD/HES/FANG/APA) than the v1 implementation.
#238 eia_refinery_utilization_drift extends Considine 2002 *EE*. WCRFPUS2
refinery utilization with stocks-2nd-diff fallback drives a novel refiner-
equity passthrough on VLO/MPC/PSX/DK/HFC/INT/PBF/PARR.
CME futures + USDA grain basis (#239-#241)
Three commodity-curve families. #239 cme_basis_curve_steepening
extends Erb-Harvey 2006 *FAJ* "The Strategic and Tactical Value of
Commodity Futures" and Gorton-Rouwenhorst 2006 *FAJ* by using
log(CL=F[t]) − log(CL=F[t-63d]) as a backwardation proxy, going long the
cyclical basket on a steepening signal.
**#240 cme_silver_gold_ratio_regime** extends Baur-Lucey 2010 *FR* "Is Gold
a Hedge or a Safe Haven?"
— the SI/GC 90-day rolling z bifurcates into an "industrial regime"
(rising ratio → industrial miners) and a "monetary regime" (falling ratio
→ monetary gold ETFs). #241 usda_basis_inversion_ag_long extends
Working 1949 *American Economic Review* "Theory of Price of Storage" and
McNew-Fackler 1996 *AJAE* by tracking mean basis inversion across corn,
soybeans, and wheat; an inverted basis is a stress signal that triggers
LONG ag-equipment (DE, AGCO) plus fertilizer (MOS, CF, NTR).
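The #239 proxy is a single log difference. The sketch below assumes `closes` is a list of daily front-month settles, oldest first, and treats "63d" as 63 trading rows, which is an assumption about the original specification. The function name is hypothetical. Note that the quantity is a time-series change in one contract series, so it stands in for curve shape only as a proxy.

```python
import math


def log_price_change(closes, lookback=63):
    """log(P[t]) - log(P[t - lookback]) for the most recent close."""
    if len(closes) <= lookback:
        raise ValueError("need more than `lookback` closes")
    return math.log(closes[-1]) - math.log(closes[-1 - lookback])
```

A series that moved from 70 to 77 over the window gives log(1.1), roughly 0.095. The level at which the family treats the value as a steepening signal is not stated in the text and is left out here.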
Consumer gaming — Steam (#242-#244)
Three families on Steam concurrent-user (CCU) telemetry, scraped hourly
from SteamSpy. #242 steam_release_window_drift extends Hennig-Thurau-
Houston-Sridhar 2006 *J. Marketing* on new-product launches in established
categories with a novel CCU-event-day application.
**#243 steam_genre_rotation** extends Wu-Liu-Yang 2019 *J. Business
Research* on genre
dynamics in PC gaming with a novel Steam-top-100 share-rotation
specification. #244 steam_player_decay_30d is novel research — we
found no academic precedent for using a retention curve as a short signal
in equity research. The thesis: a publisher whose flagship title decays
faster than peer launches is signaling a content-pipeline weakness that
will hit revenue before sell-side picks it up.
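One way to make "decays faster than peer launches" concrete for #244 is to compare a 30-day retention ratio against the peer median. Everything in this sketch is an assumption for illustration: the retention definition (day-30 CCU over the launch-window peak), the median as the peer benchmark, and the function names. The registered family may define decay differently.

```python
from statistics import median


def retention_30d(daily_ccu):
    """Day-30 CCU as a fraction of the peak CCU over days 0..30.
    `daily_ccu[0]` is assumed to be launch day."""
    if len(daily_ccu) < 31:
        raise ValueError("need at least 31 daily observations (day 0..30)")
    peak = max(daily_ccu[:31])
    if peak <= 0:
        raise ValueError("peak CCU must be positive")
    return daily_ccu[30] / peak


def decays_faster_than_peers(title_ccu, peer_ccus):
    """True when the title retains less of its peak than the median peer."""
    peer_median = median(retention_30d(p) for p in peer_ccus)
    return retention_30d(title_ccu) < peer_median
```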
Box office (#245-#246)
Two families on theatrical-release grosses.
**#245 box_office_long_tail_drift** keys off De Vany-Walls 1999 *Journal of
Cultural Economics* on
opening-weekend uncertainty and Einav 2007 *RAND J. Econ.* on seasonality.
#246 box_office_holiday_window_alpha extends Krider-Weinberg 1998 *JMR*
"Competitive Dynamics and the Introduction of New Products" with a novel
holiday-window-to-equity passthrough application. Both target distributor
equities (DIS, CMCSA, AMC, IMAX, CNK).
NFT (#247)
#247 nft_volume_crypto_proxy_lead extends Dowling 2022 *Finance
Research Letters* "Is Non-Fungible Token Pricing Driven by
Cryptocurrencies?" with a novel application: NFT marketplace volume as a
leading indicator for crypto-adjacent equities (COIN, MSTR, MARA, RIOT).
The framing is that NFT volume often leads BTC by a few days, and BTC
leads the equity proxies by a few more.
Weather + storm track — NOAA (#248-#249)
Two families on NOAA HURDAT2 Atlantic/EPAC storm tracks.
**#248 noaa_landfall_insurance_short** extends Born-Viscusi 1994 *Journal of
Risk and
Insurance* on insurance-market responses to catastrophe and Klein 1998 *J.
Insurance Regulation* with a novel track-cone-trigger application: when
the 72-hour NHC cone narrows over a population center, short ALL/PGR/
TRV. #249 noaa_storm_refiner_disruption uses Considine 2002 *EE* on
inventory and market power in crude, plus Cohen-Garcia-Khan-Pinchuk 2024
on hurricanes and equity returns, to route Gulf-of-Mexico landfall risk
to refiner equities on the supply-disruption side.
Wildfire — NIFC (#250)
#250 nifc_wildfire_utility_short is novel research. The Camp Fire
(2018, PCG) and SDG&E (2007-08, SRE) inverse-condemnation precedents
motivate a per-utility wildfire-perimeter overlap cohort. When a NIFC
active-fire polygon overlaps a utility's service territory and the fire
is human-caused-near-power-infrastructure, the family shorts the
utility. The thesis: California IOUs are uniquely exposed to inverse-
condemnation liability that the market reprices slowly.
Port throughput (#251)
#251 port_inbound_retail_inventory_build extends Wagner-Tay-Gilliam
2024 *International Journal of Production Economics* "Port-Throughput as
a Leading Indicator of Retail Inventory Cycles" and Hummels-Klenow 2005
*AER*, with a novel per-retailer passthrough specification. Inbound TEU
counts at LA/LB/NY/SAV/SEA, derived from MarineCadastre AIS, lead
retailer same-quarter inventory builds — long inventory builds for
retailers running tight, short them for retailers already on glut.
Trade flows — Census FT900 (#252)
#252 census_ft900_tariff_passthrough sits on the monthly Census FT900
trade balance per HS commodity. Extends Amiti-Redding-Weinstein 2019 *JEP*
on the 2018 trade-war price-and-welfare impact and Fajgelbaum-Goldberg-
Kennedy-Khandelwal 2020 *QJE* "The Return to Protectionism." The signal
is a passthrough z-score from tariff-impacted HS lines to the US-listed
firms with disclosed exposure to those product categories.
Terror events (#253-#254)
Two families on GTD (Global Terrorism Database).
**#253 terror_defense_premium** uses Karolyi-Martell 2010 *International
Review of Applied
Financial Issues and Economics* on terrorism and the stock market, plus
Eldor-Melnick 2004 *Journal of Banking & Finance*. Major-attack events
trigger LONG defense names (LMT, RTX, NOC, GD, BA-defense).
**#254 terror_consumer_discretionary_short** uses Drakos-Kutan 2003
*Journal of
Conflict Resolution* on tourism-region effects and Becker-Rubinstein
2011 *Economic Journal* on the consumer response. Same trigger, different
basket — SHORT DRI, MCD, CCL, RCL on the consumer-discretionary fade.
Corporate jet (#255-#256)
Two families on OpenSky / ADS-B tail-number tracks, after a v2 migration
off the deprecated `corporate_jet_flights` feed (frozen 2022-12-30).
#255 corp_jet_cluster_ma_leak uses Akey-Heimer 2020 *JFE* "Why Do
Executives Have So Many Flight Benefits?" and Jiang-Habib-Hou-Liu 2020
*RFS* on executive private jets and corporate fraud. When ≥ 2 executive
jets land at the same airport within 48 hours, the family goes LONG the
suspected target's equity on a 10-20d hold.
**#256 corp_jet_dc_lobby_intensity** extends Akey-Lewellen 2017 *JF* on
policy uncertainty and
political capital, plus Hill-Kelly-Lockhart 2014 *JFE* on lobbying
determinants and effects, with a novel ADS-B-derived flight-frequency
proxy. DCA/IAD-area airport visits ≥ 1.5σ vs trailing 2y trigger
LONG the regulated firm on a 20-60d hold.
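The #255 cluster rule (at least two executive jets at the same airport within 48 hours) can be sketched as a sliding window over landings. The record shape `(tail_number, airport, landed_at)` and the function name are assumptions for illustration. Mapping tail numbers to companies and inferring the suspected target are separate steps not shown here.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_jet_clusters(landings, window=timedelta(hours=48), min_jets=2):
    """Return (airport, sorted tail numbers) for each airport where at least
    `min_jets` distinct tails landed within `window` of each other."""
    by_airport = defaultdict(list)
    for tail, airport, landed_at in landings:
        by_airport[airport].append((landed_at, tail))

    clusters = []
    for airport, events in by_airport.items():
        events.sort()  # chronological order
        start, best = 0, set()
        for end in range(len(events)):
            # Shrink the window until it spans at most `window`.
            while events[end][0] - events[start][0] > window:
                start += 1
            tails = {tail for _, tail in events[start:end + 1]}
            if len(tails) >= min_jets and len(tails) > len(best):
                best = tails
        if best:
            clusters.append((airport, sorted(best)))
    return sorted(clusters)
```

Counting distinct tail numbers, rather than raw landings, keeps one jet making repeat trips from registering as a cluster.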
Patent (#257-#258)
#257 patent_npe_attack_short uses Bessen-Meurer 2008 *JEP* "Of
Patents and Property" plus Cohen-Gurun-Kominers 2019 *RFS* "Patent Trolls"
and Tucker 2014 *Management Science*. Non-practicing-entity suit-density
above a threshold, identified via a Courtlistener-docket NPE classifier,
triggers SHORT the tech defendant 20-40d.
**#258 patent_continuation_burst_long** extends Hall-Jaffe-Trajtenberg 2005
*RAND J. Econ.* "Market
Value and Patent Citations" and Lerner 1994 with a novel continuation-
burst event specification: ≥ 5 USPTO continuation filings on a single
parent in 90 days signals strategic prosecution depth on a high-value
invention, triggering LONG the assignee 60-90d.
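The #258 burst event (at least five continuation filings on one parent within 90 days) can be detected by comparing each filing date with the one four positions earlier. The `(parent_id, filing_date)` record shape, the function name, and the inclusive 90-day boundary are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def continuation_bursts(filings, min_count=5, window_days=90):
    """Return {parent_id: date of the filing that completed the first burst}."""
    by_parent = defaultdict(list)
    for parent_id, filed in filings:
        by_parent[parent_id].append(filed)

    window = timedelta(days=window_days)
    bursts = {}
    for parent_id, dates in by_parent.items():
        dates.sort()
        # A burst exists when filing i and filing i-(min_count-1) fit in the window.
        for i in range(min_count - 1, len(dates)):
            if dates[i] - dates[i - min_count + 1] <= window:
                bursts[parent_id] = dates[i]
                break
    return bursts
```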
Drug safety — FAERS (#259)
#259 faers_drug_launch_safety_curve extends Olfson-Marcus 2009 on
national patterns in antidepressant treatment with a novel biotech-equity
short specification. The trigger: a newly-launched drug's first-90d
FAERS severity score is ≥ 2× the peer-baseline for the same indication.
Short the sponsor 30-60d on the expected labeling-update risk.
Cross-source composites (#260-#263)
The last four wave-6 families don't extend a single paper — they compose
signals from three or four upstream feeds into a single decision.
#260 geopolitical_supply_chain_risk composes GDELT country-tone with
ACLED event density, port-throughput AIS, and Baltic Dry into a single
"is global supply chain stressed?" score. When the composite z ≥ +1.5,
short cyclical industrials 10-20d. Extends Caldara-Iacoviello 2022 *AER*
and Carvalho-Nirei-Saito-Tahbaz-Salehi 2021 *QJE* on supply-chain
disruptions. #261 macro_event_sentiment_composite composes GDELT
volatility with the FRED yield curve and Polymarket macro questions to
build a gate that suppresses long-trend signals during high-stress
macro regimes. Extends Manela-Moreira 2017 *JFE* "News Implied Volatility
and Disaster Concerns." #262 corporate_lobbying_x_polymarket is novel
research, inheriting both Akey-Lewellen 2017 *JF* and Wolfers-Zitzewitz
2004 *JEP*. When a firm is in the top quartile of lobbying spend *and*
the Polymarket probability of its favored regulatory outcome is ≥ 60%,
the family goes long 60-90d. #263 weather_x_eia_x_utilities is a
pair-trade composite: cold-anomaly HDD plus tight EIA natgas storage
goes LONG natgas E&P and SHORT regulated electric utility on the same
day, on the theory that utilities eat the demand shock while E&P
captures the price spike. Extends Linn-Muehlenbachs 2018 *JAERE* and Mu
2007 *EE*.
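Of the composites, #262 has the most explicit rule: top-quartile lobbying spend and a Polymarket probability of at least 60%. The sketch below assumes two dictionaries keyed by ticker and defines the quartile over whatever universe is passed in. The function name, the input shapes, and the use of `statistics.quantiles` with its default method are illustrative choices, not the registered implementation.

```python
from statistics import quantiles


def lobbying_polymarket_longs(lobby_spend, outcome_prob, min_prob=0.60):
    """Tickers in the top quartile of lobbying spend whose favored
    regulatory outcome is priced at `min_prob` or higher."""
    # quantiles(..., n=4) returns the three quartile cut points; [2] is Q3.
    q3 = quantiles(lobby_spend.values(), n=4)[2]
    return sorted(
        ticker
        for ticker, spend in lobby_spend.items()
        if spend >= q3 and outcome_prob.get(ticker, 0.0) >= min_prob
    )
```

Tickers without a matching Polymarket question default to a probability of 0.0 and are never selected, so both conditions must genuinely hold.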
What happens next
Each of these 39 families is now in the discovery harness's rotation. The
discovery worker will fire them against the production universe ticker-
by-ticker over the next two to three weeks. Most candidates will fail
the DSR / PBO / CPCV / Monte Carlo gates; some will pass. Those that
pass get promoted to `GENERATOR_REGISTRY`-eligible status and become
available to the picker as production-champion candidates. Each ticker
still has exactly one champion at a time, and the picker re-ranks weekly.
The honest framing: the academic-canon families (the first 218) are the
front-page deck — institutional LPs, pension consultants, fund-of-funds
expect to see them. The Moonshot A/B combo and wave-6 alt-data families
are exploration capital. Some will land. Some won't. The validation
harness is what separates the two. We will publish a follow-up in 4-6
weeks with the pass-rates per wave-6 family, the candidates that survived
the full Deflated Sharpe Ratio plus Combinatorial Purged CV plus walk-
forward gauntlet, and the data-table row counts as the alt-data feeds
backfill.
If you want to inspect the families today, [the full menu lives in the
alpha-families catalog](https://alphactor.ai/alpha-families) — wave-6
families carry a `recently_added` tag and an "extends" paper citation.
The methodology page explains the
nine-layer validation harness each family runs through before it can
become a champion.
FAQ
Are all 39 wave-6 families live trading strategies?
No. They are registered alpha families in the discovery harness. A family only becomes production-relevant after candidate rows pass the validation gates and the picker ranks it against existing champion strategies.
Why add alternative data instead of more academic factors?
The academic canon is still the foundation, but many published mechanisms were measured with coarse or delayed data. Wave-6 tests whether faster feeds like GDELT, ACLED, EIA, Polymarket, and ADS-B can make those mechanisms more tradable.
What happens when a candidate fails the harness?
It stays in the audit trail with the failed gate and diagnostic context. Failed candidates are useful because they keep the registry honest and prevent a visually appealing backtest from being promoted without evidence.
When will the next results be published?
The plan is to publish follow-up pass rates after the discovery worker has run the wave-6 families across the production universe for several weeks.