
39 New Alpha Families — Wave-6 Alt-Data Expansion

Alphactor added 39 alternative-data alpha families across prediction markets, conflict events, weather, supply chains, patents, and drug safety.

Marcus Chen · 14 min read

The alpha-family registry sits at 266 generators as of today — a one-

session jump of 39, the largest single-day expansion since the platform

launched. The new families share a theme that most published-research

factor work avoids: they each lean on a non-standard data source. Polymarket

resolution probabilities. ACLED armed-conflict event geocodes. GDELT

country-tone z-scores. CFTC commodity-index-trader positioning. EIA

weekly degree-day anomalies. Steam concurrent-user telemetry. Box-office

grosses. NOAA storm-track cones. NIFC wildfire perimeters. Port-call AIS

counts. Census FT900 trade-balance prints. USPTO patent grants and

continuation bursts. FDA FAERS post-launch adverse-event curves. Corporate

jet ADS-B tracks. NFT marketplace volume. USDA cash-basis prints. CME

futures settles. Most retail platforms ingest none of these. Most

institutional platforms ingest some of them. We just put 18 of them online

in a single session.

This post walks through the 39 families grouped by theme, the academic

papers each extends (or notes where the work is genuinely novel), and

why we framed the post-launch state honestly: these are

**newly registered, mining-in-progress** strategies. The thesis is funded;

the alpha attribution is not there yet. Only 18 of the 2,959 wave-6 candidate rows

have passed harness gates so far per the post-drain Moonshot A v3 audit

that ran tonight. The rest are validating.

TL;DR

  • Alphactor added 39 wave-6 alpha families, bringing the registry to 266 generators.
  • The new families lean on alternative data: Polymarket, ACLED, GDELT, EIA, NOAA, FAERS, ADS-B, patents, ports, and more.
  • These are registered strategies, not declared champions; most candidates still need to pass DSR, PBO, CPCV, Monte Carlo, and walk-forward gates.
  • The research bet is that higher-frequency alt-data can sharpen mechanisms that older academic papers measured only monthly or at country-level aggregates.

Why this wave is different

The first 218 families on Alphactor sat squarely inside the "academic

canon" — Fama-French, Jegadeesh-Titman, Sloan, Novy-Marx, Frazzini-

Pedersen, Asness, Hong-Sraer, Daniel-Moskowitz. Each family extended a

named paper in a known journal. Wave-6 takes a different bet: that the

next decade of alpha lives in alt-data, where the academic work points to

a mechanism (geopolitical risk priced into equities; conflict events

moving commodity supply curves; weather anomalies repricing energy demand)

but the data needed to trade the mechanism either didn't exist in real-

time form when the paper was written, or required a paid vendor most

researchers couldn't access.

A free, real-time ingest of Polymarket question resolutions did not exist

in 2004 when Wolfers and Zitzewitz wrote the canonical *JEP* prediction-

markets piece. It does now. The GDELT 2.0 GKG feed publishes

country-tone z-scores in 15-minute increments — Caldara and Iacoviello's

2022 *AER* "Measuring Geopolitical Risk" series is monthly. ACLED publishes

conflict events with daily resolution and per-incident fatality counts —

the academic literature that prices conflict into oil (Hamilton 2003,

Kilian 2009) runs on country-month aggregates. Each wave-6 family bets

that the higher-frequency, narrower-geographic data unlocks tradable

signal that the published-paper aggregate version smeared out.

That bet may not pay off. The harness is strict — Deflated Sharpe Ratio,

Probability of Backtest Overfit, Combinatorial Purged Cross-Validation,

Monte Carlo permutation significance, walk-forward holdout. Most of the

2,959 wave-6 candidate rows that the discovery harness produced have

failed at least one of those gates. We log the failures alongside the

passes; both go in the audit trail. What follows is a tour of the thesis

behind each family, not a claim that any of them is yet a production

champion.
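
Of the gates listed above, the Monte Carlo permutation check is the easiest to illustrate in a few lines. The sketch below is an assumption about the general shape of such a test, not the harness's actual implementation: it shuffles a position series against fixed returns and counts how often a random alignment does at least as well as the observed one. The function name, parameters, and the add-one correction are illustrative choices.

```python
import random


def permutation_p_value(positions, returns, n_perm=999, seed=0):
    """One-sided p-value for mean strategy return under shuffled positions.

    positions: per-period exposure (e.g. +1 long, -1 short, 0 flat)
    returns:   per-period asset returns, same length as positions
    """
    if len(positions) != len(returns):
        raise ValueError("positions and returns must have the same length")

    def mean_pnl(pos):
        return sum(p * r for p, r in zip(pos, returns)) / len(returns)

    observed = mean_pnl(positions)
    rng = random.Random(seed)      # fixed seed keeps the result reproducible
    shuffled = list(positions)
    at_least_as_good = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)      # break any timing link between signal and returns
        if mean_pnl(shuffled) >= observed:
            at_least_as_good += 1
    # Add-one correction so the p-value is never exactly zero.
    return (at_least_as_good + 1) / (n_perm + 1)
```

A signal with no timing information, such as a constant long position, returns a p-value of 1.0 under this test, because every shuffle reproduces the same P&L.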

Prediction markets — Polymarket overlays (#225-#228)

The four Polymarket families sit on top of Polymarket question

resolutions and probability prints. They extend four separate papers in

the prediction-market and event-study literature.

#225 polymarket_iv_skew_spread extends Wolfers-Zitzewitz 2004 *JEP*

"Prediction Markets" — the foundational paper. Where their study compared

prediction-market probabilities to econometric forecasts on macro events,

this family compares Polymarket binary-outcome probabilities to single-

name option-implied probabilities (derived from the put-call skew). When

the two disagree by more than 2 standard deviations, the family takes a

position. The single-name option-implied-probability arb is novel — we

couldn't find a published precedent. #226 polymarket_ma_close_drift

extends Mitchell-Pulvino 2001 *JF* "Characteristics of Risk and Return in

Risk Arbitrage" by adding a Polymarket close-probability overlay to the

classical M&A risk-arb spread. #227 polymarket_resolution_drift extends

Berg-Nelson-Rietz 2008 *IJF* on prediction-market accuracy with a

pre-resolution mean-reversion specification.

**#228 polymarket_executive_departure_short** extends Mehran-Yermack 2010 *JFE* on CEO turnover by

gating short entries on the Polymarket "will X be CEO at year-end?"

probability dropping below 50%.
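
The 2-standard-deviation disagreement rule for #225 can be sketched as follows. This is a minimal illustration under stated assumptions: the option-implied probability is taken as a given input (deriving it from the put-call skew is outside the sketch), the z-score is measured against the spread's own history, and the function names are hypothetical. The post does not say which side the family takes, so the sketch only reports whether the trigger fires.

```python
from statistics import mean, stdev


def disagreement_z(poly_probs, option_probs):
    """Z-score of the latest probability gap against its own history.

    poly_probs / option_probs: equal-length series of probabilities in [0, 1],
    oldest first. The last element of each is the current observation.
    """
    if len(poly_probs) != len(option_probs):
        raise ValueError("series must be the same length")
    spreads = [p - q for p, q in zip(poly_probs, option_probs)]
    history, latest = spreads[:-1], spreads[-1]
    if len(history) < 2:
        raise ValueError("need at least two historical observations")
    sigma = stdev(history)
    if sigma == 0:
        return 0.0  # flat history: no meaningful z-score
    return (latest - mean(history)) / sigma


def is_triggered(z, threshold=2.0):
    """True when the two sources disagree by more than `threshold` sigmas."""
    return abs(z) > threshold
```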

Conflict events — ACLED (#229-#231)

ACLED (Armed Conflict Location & Event Data) ships per-incident geocoded

conflict events with fatality counts. The three ACLED families each route

event-density spikes to different equity baskets via different supply-

chain mechanisms. #229 acled_oil_supply_shock_long extends Hamilton

2003 *JoE* "What is an oil shock?" and Kilian 2009 *AER* "Not All Oil

Price Shocks Are Alike" — when MENA oil-region conflict density jumps

above 1.5σ, the family goes long XOM/CVX/COP/EOG/MPC/VLO on the expected

upstream-disruption premium. #230 acled_mining_disruption extends

Berman-Couttenier-Rohner-Thoenig 2017 *AER* "This Mine Is Mine!" — DRC,

Zambia, Peru, Mongolia conflict density triggers SHORT FCX/SCCO/GOLD/RIO/

BHP on the downstream-equity passthrough.

**#231 acled_red_sea_freight_premium** extends Adland-Cariou-Wolff 2017 by specifying Bab-el-Mandeb /

Suez / Yemen-Saudi-Egypt conflict density as the trigger for LONG SBLK/

ZIM/GOGL/STNG on the freight-route premium.

Geopolitical risk — GDELT (#232-#233)

The two GDELT families bet that GDELT's country-tone and event-density

indices, computed at 15-minute resolution from millions of news articles

per day, lead the monthly geopolitical-risk indices the academic

literature has been pricing into stocks.

**#232 gdelt_geopolitical_tone_short** extends Caldara-Iacoviello 2022 *AER* "Measuring Geopolitical Risk"

plus Engle-Giglio-Kelly-Lee-Stroebel 2020 *RFS* on hedging climate news

— when a single country's GDELT tone drops below z = −1.5, the family

shorts US multinationals with disclosed revenue exposure to that country.

#233 gdelt_event_density_volatility extends Bali-Brown-Tang 2017 *JFQA*

and Bloom 2009 *Econometrica* on economic uncertainty pricing by using

global event-density z ≥ +2 as a proxy for the Bloom-uncertainty shock,

shorting cyclical/high-beta names.

CFTC CIT positioning (#234-#235)

Two families ride the CFTC's commodity-index-trader (CIT) supplement, a

weekly report on net positions across 13 ag contracts (corn, soy, wheat,

sugar, cocoa, coffee, cattle, hogs, and friends).

**#234 cit_extreme_positioning_reversion** extends Yang-Du 2018 *JF* and Tang-Xiong 2012 *FAJ*

on commodity index-trader effects: when CIT net z ≥ +2 (index traders

overweight by historical norms), the family goes long the equity basket

that absorbs ag-input cost (DE, AGCO, MOS, CF, NTR, ADM, BG).

**#235 cit_unwind_velocity_vol_regime** extends Cheng-Kirilenko-Xiong 2015 *RF*

"Convective Risk Flows in Commodity Futures Markets" by measuring week-

over-week |Δnet| velocity. When velocity z ≥ +1.5, the family takes a

−0.5 ag-basket short, on the theory that fast unwinds correlate with

broader cyclical risk-off episodes.
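
The velocity measure in #235 is a first difference followed by a z-score, which a short sketch makes concrete. The names below are hypothetical and the normalization window (all prior weeks) is an assumption; the post only specifies the week-over-week |Δnet| input, the +1.5 threshold, and the −0.5 basket weight.

```python
from statistics import mean, stdev


def unwind_velocity_z(net_positions):
    """Z-score of the latest week-over-week |change in net position|.

    net_positions: weekly CIT net positions, oldest first.
    """
    if len(net_positions) < 4:
        raise ValueError("need at least four weekly observations")
    velocity = [abs(b - a) for a, b in zip(net_positions, net_positions[1:])]
    history, latest = velocity[:-1], velocity[-1]
    sigma = stdev(history)
    return 0.0 if sigma == 0 else (latest - mean(history)) / sigma


def ag_basket_weight(z, threshold=1.5, short_weight=-0.5):
    """Half-weight short when velocity is unusually high, otherwise flat."""
    return short_weight if z >= threshold else 0.0
```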

EIA energy stats (#236-#238)

Three families on weekly EIA petroleum and natgas releases.

**#236 eia_degree_days_weather_anomaly** extends Linn-Muehlenbachs 2018 *JAERE* and

Mu 2007 *Energy Economics* on weather-driven natgas pricing. Same-day-of-

year HDD/CDD anomalies route to natgas E&P (on cold-anomaly) or utility

ETFs (on hot-anomaly). #237 eia_crude_storage_surprise_v2 extends

Hong-Yogo 2012 *JFE* and Symeonidis-Prokopczuk 2012 *EE* — a bullish-draw

surprise (storage drop > consensus) lights up a deeper E&P basket

(XOM/CVX/COP/EOG/MRO/OXY/DVN/PXD/HES/FANG/APA) than the v1 implementation.

#238 eia_refinery_utilization_drift extends Considine 2002 *EE*. WCRFPUS2

refinery utilization with stocks-2nd-diff fallback drives a novel refiner-

equity passthrough on VLO/MPC/PSX/DK/HFC/INT/PBF/PARR.
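
For readers unfamiliar with degree days, the same-day-of-year anomaly behind #236 can be sketched like this. The 65°F base is the conventional US degree-day definition; the `min_anomaly` cutoff, the basket labels, and the function names are illustrative assumptions, since the post does not give the family's actual thresholds.

```python
from statistics import mean

BASE_TEMP_F = 65.0  # conventional US degree-day base temperature


def degree_days(mean_temp_f):
    """Return (HDD, CDD) for one day's mean temperature in Fahrenheit."""
    hdd = max(0.0, BASE_TEMP_F - mean_temp_f)
    cdd = max(0.0, mean_temp_f - BASE_TEMP_F)
    return hdd, cdd


def same_day_anomaly(today_value, prior_years_same_day):
    """Today's degree days minus the average for the same calendar day in prior years."""
    return today_value - mean(prior_years_same_day)


def route(hdd_anomaly, cdd_anomaly, min_anomaly=5.0):
    """Map anomalies to a basket label; `min_anomaly` is an illustrative cutoff."""
    if hdd_anomaly >= min_anomaly:
        return "natgas_ep"       # colder than normal: heating demand
    if cdd_anomaly >= min_anomaly:
        return "utility_etfs"    # hotter than normal: cooling load
    return None
```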

CME futures + USDA grain basis (#239-#241)

Three commodity-curve families. #239 cme_basis_curve_steepening

extends Erb-Harvey 2006 *FAJ* "The Strategic and Tactical Value of

Commodity Futures" and Gorton-Rouwenhorst 2006 *FAJ* by using

log(CL=F[t]) − log(CL=F[t-63d]) as a backwardation proxy, going long the

cyclical basket on a steepening signal.

**#240 cme_silver_gold_ratio_regime** extends Baur-Lucey 2010 *FR* "Is Gold a Hedge or a Safe Haven?"

— the SI/GC 90-day rolling z bifurcates into an "industrial regime"

(rising ratio → industrial miners) and a "monetary regime" (falling ratio

→ monetary gold ETFs). #241 usda_basis_inversion_ag_long extends

Working 1949 *American Economic Review* "The Theory of Price of Storage" and

McNew-Fackler 1996 *AJAE* by tracking mean basis inversion across corn,

soybeans, and wheat; an inverted basis is a stress signal that triggers

LONG ag-equipment (DE, AGCO) plus fertilizer (MOS, CF, NTR).
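
The two CME signals in this section reduce to short formulas. The sketch below computes the 63-day log change used by #239 and a rolling-z regime label in the spirit of #240. It rests on assumptions: the post gives no cutoff for the regime split, so the sign of the z-score stands in for "rising" versus "falling", and the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def log_change(closes, lookback=63):
    """log(P[t]) - log(P[t - lookback]) for the most recent close."""
    if len(closes) <= lookback:
        raise ValueError("not enough history for the lookback")
    return math.log(closes[-1]) - math.log(closes[-1 - lookback])


def ratio_regime(silver, gold, window=90):
    """Label the silver/gold regime from the rolling z-score of the ratio.

    Assumption: a positive z is read as a "rising" ratio and a negative z as
    "falling"; the post does not state the exact cutoff.
    """
    ratios = [s / g for s, g in zip(silver, gold)][-window:]
    if len(ratios) < 3:
        raise ValueError("need at least three observations")
    sigma = stdev(ratios)
    if sigma == 0:
        return "neutral"
    z = (ratios[-1] - mean(ratios)) / sigma
    if z > 0:
        return "industrial"
    if z < 0:
        return "monetary"
    return "neutral"
```

Note that `log_change` on a single continuous contract is a trailing return, which is why the post calls it a proxy for backwardation, not a direct measure of curve slope.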

Consumer gaming — Steam (#242-#244)

Three families on Steam concurrent-user (CCU) telemetry, scraped hourly

from SteamSpy. #242 steam_release_window_drift extends Hennig-Thurau-

Houston-Sridhar 2006 *J. Marketing* on new-product launches in established

categories with a novel CCU-event-day application.

**#243 steam_genre_rotation** extends Wu-Liu-Yang 2019 *J. Business Research* on genre

dynamics in PC gaming with a novel Steam-top-100 share-rotation

specification. #244 steam_player_decay_30d is novel research — we

found no academic precedent for using a retention curve as a short signal

in equity research. The thesis: a publisher whose flagship title decays

faster than peer launches is signaling a content-pipeline weakness that

will hit revenue before sell-side picks it up.

Box office (#245-#246)

Two families on theatrical-release grosses.

**#245 box_office_long_tail_drift** keys off De Vany-Walls 1999 *Journal of Cultural Economics* on

opening-weekend uncertainty and Einav 2007 *RAND J. Econ.* on seasonality.

#246 box_office_holiday_window_alpha extends Krider-Weinberg 1998 *JMR*

"Competitive Dynamics and the Introduction of New Products" with a novel

holiday-window-to-equity passthrough application. Both target distributor

equities (DIS, CMCSA, AMC, IMAX, CNK).

NFT (#247)

#247 nft_volume_crypto_proxy_lead extends Dowling 2022 *Finance Research Letters*

"Is Non-Fungible Token Pricing Driven by

Cryptocurrencies?" with a novel application: NFT marketplace volume as a

leading indicator for crypto-adjacent equities (COIN, MSTR, MARA, RIOT).

The framing is that NFT volume often leads BTC by a few days, and BTC

leads the equity proxies by a few more.

Weather + storm track — NOAA (#248-#249)

Two families on NOAA HURDAT2 Atlantic/EPAC storm tracks.

**#248 noaa_landfall_insurance_short** extends Born-Viscusi 1994 *Journal of Risk and Insurance*

on insurance-market responses to catastrophe and Klein 1998 *J. Insurance Regulation*

with a novel track-cone-trigger application: when

the 72-hour NHC cone narrows over a population center, short ALL/PGR/

TRV. #249 noaa_storm_refiner_disruption uses Considine 2002 *EE* on

inventory and market power in crude, plus Cohen-Garcia-Khan-Pinchuk 2024

on hurricanes and equity returns, to route Gulf-of-Mexico landfall risk

to refiner equities on the supply-disruption side.

Wildfire — NIFC (#250)

#250 nifc_wildfire_utility_short is novel research. The Camp Fire

(2018, PCG) and SDG&E (2007-08, SRE) inverse-condemnation precedents

motivate a per-utility wildfire-perimeter overlap cohort. When a NIFC

active-fire polygon overlaps a utility's service territory and the fire

is human-caused-near-power-infrastructure, the family shorts the

utility. The thesis: California IOUs are uniquely exposed to inverse-

condemnation liability that the market reprices slowly.

Port throughput (#251)

#251 port_inbound_retail_inventory_build extends Wagner-Tay-Gilliam

2024 *International Journal of Production Economics* "Port-Throughput as

a Leading Indicator of Retail Inventory Cycles" and Hummels-Klenow 2005

*AER*, with a novel per-retailer passthrough specification. Inbound TEU

counts at LA/LB/NY/SAV/SEA, derived from MarineCadastre AIS, lead

retailer same-quarter inventory builds. The family goes long retailers that are

running tight on inventory and short retailers already in a glut.

Trade flows — Census FT900 (#252)

#252 census_ft900_tariff_passthrough sits on the monthly Census FT900

trade balance per HS commodity. Extends Amiti-Redding-Weinstein 2019 *JEP*

on the 2018 trade-war price-and-welfare impact and Fajgelbaum-Goldberg-

Kennedy-Khandelwal 2020 *QJE* "The Return to Protectionism." The signal

is a passthrough z-score from tariff-impacted HS lines to the US-listed

firms with disclosed exposure to those product categories.

Terror events (#253-#254)

Two families on GTD (Global Terrorism Database).

**#253 terror_defense_premium** uses Karolyi-Martell 2010

*International Review of Applied Financial Issues and Economics* on terrorism and the stock market, plus

Eldor-Melnick 2004 *Journal of Banking & Finance*. Major-attack events

trigger LONG defense names (LMT, RTX, NOC, GD, BA-defense).

**#254 terror_consumer_discretionary_short** uses Drakos-Kutan 2003 *Journal of Conflict Resolution*

on tourism-region effects and Becker-Rubinstein

2011 *Economic Journal* on the consumer response. Same trigger, different

basket — SHORT DRI, MCD, CCL, RCL on the consumer-discretionary fade.

Corporate jet (#255-#256)

Two families on OpenSky / ADS-B tail-number tracks, after a v2 migration

off the deprecated `corporate_jet_flights` feed (frozen 2022-12-30).

#255 corp_jet_cluster_ma_leak uses Akey-Heimer 2020 *JFE* "Why Do

Executives Have So Many Flight Benefits?" and Jiang-Habib-Hou-Liu 2020

*RFS* on executive private jets and corporate fraud. When ≥ 2 executive

jets land at the same airport within 48 hours, the family goes LONG the

suspected target's equity on a 10-20d hold.

**#256 corp_jet_dc_lobby_intensity** extends Akey-Lewellen 2017 *JF* on policy uncertainty and

political capital, plus Hill-Kelly-Lockhart 2014 *JFE* on lobbying

determinants and effects, with a novel ADS-B-derived flight-frequency

proxy. DCA/IAD-area airport visits ≥ 1.5σ vs trailing 2y trigger

LONG the regulated firm on a 20-60d hold.
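
The #255 trigger (two or more executive jets at the same airport within 48 hours) is a sliding-window grouping problem. The sketch below is one way to express it, assuming landings arrive as `(tail_number, airport_code, landed_at)` tuples; that record shape and the function name are hypothetical. Mapping a cluster to a suspected M&A target is outside the sketch.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_clusters(landings, min_jets=2, window=timedelta(hours=48)):
    """Return airports where >= min_jets distinct tail numbers land within `window`.

    landings: iterable of (tail_number, airport_code, landed_at) tuples.
    Result maps airport -> sorted tail numbers of the first qualifying cluster.
    """
    by_airport = defaultdict(list)
    for tail, airport, landed_at in landings:
        by_airport[airport].append((landed_at, tail))

    clusters = {}
    for airport, events in by_airport.items():
        events.sort()
        start = 0
        for end in range(len(events)):
            # Advance the left edge until the span fits inside `window`.
            while events[end][0] - events[start][0] > window:
                start += 1
            tails = {tail for _, tail in events[start:end + 1]}
            if len(tails) >= min_jets:
                clusters[airport] = sorted(tails)
                break
    return clusters
```

Counting distinct tail numbers matters here: the same jet landing twice at one airport should not register as a cluster.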

Patent (#257-#258)

#257 patent_npe_attack_short uses Bessen-Meurer 2008 *JEP* "Of

Patents and Property" plus Cohen-Gurun-Kominers 2019 *RFS* "Patent Trolls"

and Tucker 2014 *Management Science*. Non-practicing-entity suit-density

above a threshold, identified via a CourtListener-docket NPE classifier,

triggers SHORT the tech defendant 20-40d.

**#258 patent_continuation_burst_long** extends Hall-Jaffe-Trajtenberg 2005 *RAND J. Econ.* "Market

Value and Patent Citations" and Lerner 1994 with a novel continuation-

burst event specification: ≥ 5 USPTO continuation filings on a single

parent in 90 days signals strategic prosecution depth on a high-value

invention, triggering LONG the assignee 60-90d.
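
The continuation-burst event in #258 (at least 5 continuation filings on one parent within 90 days) can be detected with a trailing window per parent. This is a minimal sketch assuming filings are available as `(parent_id, filing_date)` pairs; the record shape and function name are hypothetical, and resolving a parent patent to its assignee's ticker is not shown.

```python
from collections import defaultdict, deque
from datetime import date, timedelta


def continuation_bursts(filings, min_filings=5, window_days=90):
    """Return parent patents with >= min_filings continuations inside any window.

    filings: iterable of (parent_id, filing_date) tuples.
    Result maps parent_id -> date on which the burst threshold was first met.
    """
    by_parent = defaultdict(list)
    for parent_id, filed in filings:
        by_parent[parent_id].append(filed)

    window = timedelta(days=window_days)
    bursts = {}
    for parent_id, dates in by_parent.items():
        recent = deque()
        for filed in sorted(dates):
            recent.append(filed)
            # Drop filings that fall outside the trailing window.
            while filed - recent[0] > window:
                recent.popleft()
            if len(recent) >= min_filings:
                bursts[parent_id] = filed
                break
    return bursts
```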

Drug safety — FAERS (#259)

#259 faers_drug_launch_safety_curve extends Olfson-Marcus 2009 on

national patterns in antidepressant treatment with a novel biotech-equity

short specification. The trigger: a newly-launched drug's first-90d

FAERS severity score is ≥ 2× the peer-baseline for the same indication.

Short the sponsor 30-60d on the expected labeling-update risk.

Cross-source composites (#260-#263)

The last four wave-6 families don't extend a single paper — they compose

signals from three or four upstream feeds into a single decision.

#260 geopolitical_supply_chain_risk composes GDELT country-tone with

ACLED event density, port-throughput AIS, and Baltic Dry into a single

"is global supply chain stressed?" score. When the composite z ≥ +1.5,

short cyclical industrials 10-20d. Extends Caldara-Iacoviello 2022 *AER*

and Carvalho-Nirei-Saito-Tahbaz-Salehi 2021 *QJE* on supply-chain

disruptions. #261 macro_event_sentiment_composite composes GDELT

volatility with the FRED yield curve and Polymarket macro questions to

build a gate that suppresses long-trend signals during high-stress

macro regimes. Extends Manela-Moreira 2017 *JFE* "News Implied Volatility

and Disaster Concerns." #262 corporate_lobbying_x_polymarket is novel

research, inheriting both Akey-Lewellen 2017 *JF* and Wolfers-Zitzewitz

2004 *JEP*. When a firm is in the top quartile of lobbying spend *and*

the Polymarket probability of its favored regulatory outcome is ≥ 60%,

the family goes long 60-90d. #263 weather_x_eia_x_utilities is a

pair-trade composite: cold-anomaly HDD plus tight EIA natgas storage

goes LONG natgas E&P and SHORT regulated electric utility on the same

day, on the theory that utilities eat the demand shock while E&P

captures the price spike. Extends Linn-Muehlenbachs 2018 *JAERE* and Mu

2007 *EE*.
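
Of the composites, #262 has the most explicit rule: both conditions must hold at once. A hedged sketch of that conjunction follows. The function name and input dictionaries are hypothetical, and "top quartile" is approximated as the top quarter of firms by rank, which is an assumption about how the cutoff is drawn.

```python
def lobbying_polymarket_longs(lobbying_spend, outcome_prob, min_prob=0.60):
    """Tickers in the top spend quartile whose favored outcome is priced >= min_prob.

    lobbying_spend: dict ticker -> lobbying spend (any consistent currency unit)
    outcome_prob:   dict ticker -> market probability of the firm's favored outcome
    Assumption: "top quartile" is the top len/4 firms by spend, rounded down,
    with a minimum of one.
    """
    ranked = sorted(lobbying_spend, key=lobbying_spend.get, reverse=True)
    top_n = max(1, len(ranked) // 4)
    top_quartile = ranked[:top_n]
    # Firms without a matching market question are treated as not qualifying.
    return [t for t in top_quartile if outcome_prob.get(t, 0.0) >= min_prob]
```

A firm with a high outcome probability but modest lobbying spend is excluded, and so is a heavy spender whose favored outcome the market prices below 60%.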

What happens next

Each of these 39 families is now in the discovery harness's rotation. The

discovery worker will fire them against the production universe ticker-

by-ticker over the next two to three weeks. Most candidates will fail

the DSR / PBO / CPCV / Monte Carlo gates; some will pass. Those that

pass get promoted to `GENERATOR_REGISTRY`-eligible status and become

available to the picker as production-champion candidates. Each ticker

still has exactly one champion at a time, and the picker re-ranks weekly.

The honest framing: the academic-canon families (the first 218) are the

front-page deck — institutional LPs, pension consultants, fund-of-funds

expect to see them. The Moonshot A/B combo and wave-6 alt-data families

are exploration capital. Some will land. Some won't. The validation

harness is what separates the two. We will publish a follow-up in 4-6

weeks with the pass-rates per wave-6 family, the candidates that survived

the full Deflated Sharpe Ratio plus Combinatorial Purged CV plus walk-

forward gauntlet, and the data-table row counts as the alt-data feeds

backfill.

If you want to inspect the families today,

[the full menu lives in the alpha-families catalog](https://alphactor.ai/alpha-families); wave-6

families carry a `recently_added` tag and an "extends" paper citation.

The methodology page explains the

nine-layer validation harness each family runs through before it can

become a champion.

FAQ

Are all 39 wave-6 families live trading strategies?

No. They are registered alpha families in the discovery harness. A family only becomes production-relevant after candidate rows pass the validation gates and the picker ranks it against existing champion strategies.

Why add alternative data instead of more academic factors?

The academic canon is still the foundation, but many published mechanisms were measured with coarse or delayed data. Wave-6 tests whether faster feeds like GDELT, ACLED, EIA, Polymarket, and ADS-B can make those mechanisms more tradable.

What happens when a candidate fails the harness?

It stays in the audit trail with the failed gate and diagnostic context. Failed candidates are useful because they keep the registry honest and prevent a visually appealing backtest from being promoted without evidence.

When will the next results be published?

The plan is to publish follow-up pass rates after the discovery worker has run the wave-6 families across the production universe for several weeks.

For informational and educational purposes only. Not financial advice.