call transcript jaccard similarity
What it checks
When management uses much the same words on this quarter's earnings call as on the last one, that's a good sign. Big shifts in language tend to precede trouble.
Mechanism
Compute the Jaccard similarity between the token set of the current earnings-call transcript and that of the prior call. High similarity (lazy/stable management) → long. Low similarity (a jolt: a strategy shift, or a guidance reversal hidden in the language change) → short.
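A minimal sketch of the mechanism, assuming each transcript arrives as one raw string (prepared remarks and Q&A pooled) and is tokenised into a set of lowercase words with a simple regex. The source does not specify its tokeniser, stop-word handling, or stemming, so `tokenize` and `jaccard_similarity` are hypothetical helpers, not the production code. Jaccard similarity is |A ∩ B| / |A ∪ B| over the two token sets.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word tokens as a set (assumed tokeniser; the source does not specify one)."""
    return set(re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()))


def jaccard_similarity(current: str, prior: str) -> float:
    """|A ∩ B| / |A ∪ B| over the token sets of the current and prior call."""
    a, b = tokenize(current), tokenize(prior)
    union = a | b
    if not union:
        # Both transcripts empty: treat as 0.0 so no long signal can fire.
        return 0.0
    return len(a & b) / len(union)


prior_call = "Demand remained strong and margins expanded this quarter."
current_call = "Demand remained strong and margins expanded again."
print(round(jaccard_similarity(current_call, prior_call), 3))  # → 0.667 (6 shared / 9 total)
```

Because it works on sets, Jaccard ignores word frequency: a call that says "headwinds" twenty times scores the same as one that says it once. That makes it a blunt measure of vocabulary turnover, which is what this family is after.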
Signal rule
Long when Jaccard similarity >= a threshold in the 0.70-0.85 range; short when it is <= a threshold in the 0.20-0.40 range; hold 63 trading days (one earnings cycle).
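The rule can be sketched as a mapping from score to side plus a fixed 63-day exit. The defaults 0.80 and 0.30 are assumed values picked from inside the quoted ranges, entering at a given index in the ticker's trading-day series (e.g. the first session after the call) is an assumption, and `signal_side`, `open_position`, and `Position` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

HOLD_DAYS = 63  # trading days, roughly one earnings cycle


@dataclass(frozen=True)
class Position:
    ticker: str
    side: int         # +1 long, -1 short
    entry_index: int  # index into the ticker's trading-day price series
    exit_index: int


def signal_side(jaccard: float, long_threshold: float = 0.80,
                short_threshold: float = 0.30) -> int:
    """Map a Jaccard score to +1 (long), -1 (short) or 0 (no trade).

    Default thresholds are assumptions chosen inside the source's ranges
    (long 0.70-0.85, short 0.20-0.40); treat them as tuning parameters.
    """
    if not 0.0 <= short_threshold < long_threshold <= 1.0:
        raise ValueError("need 0 <= short_threshold < long_threshold <= 1")
    if jaccard >= long_threshold:
        return 1
    if jaccard <= short_threshold:
        return -1
    return 0


def open_position(ticker: str, jaccard: float, entry_index: int,
                  **thresholds: float) -> Optional[Position]:
    """Return a position held for HOLD_DAYS, or None when the score is in the dead zone."""
    side = signal_side(jaccard, **thresholds)
    if side == 0:
        return None
    return Position(ticker, side, entry_index, entry_index + HOLD_DAYS)


print(open_position("AAA", 0.83, entry_index=10))
# → Position(ticker='AAA', side=1, entry_index=10, exit_index=73)
print(open_position("BBB", 0.55, entry_index=10))  # → None
```

The gap between the two thresholds is a deliberate dead zone: most calls land in the middle and generate no trade.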
Data dependencies
earnings_call_transcripts: Full earnings-call transcripts (prepared remarks + Q&A), tokenised.
daily_prices: Adjusted-close OHLCV for every US-listed ticker; primary price feed.
Expected edge
- Paper alpha: ~3% per quarter (top decile)
- Paper Sharpe: ~0.3
- Paper window: Q/Q
Brochet et al. (2015): ~3% per-quarter abnormal return for the top-minus-bottom decile spread; replicates at ~0.3 Sharpe over 2018-2024.
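To show how figures like these are typically built, here is a sketch of a top-minus-bottom decile spread for one quarter and a Sharpe ratio over a series of quarterly spreads. It assumes forward abnormal returns are already computed (the benchmark behind "abnormal" is not stated in the source) and that Sharpe is annualised with sqrt(4) from quarterly data, which the source also does not specify. `decile_spread` and `annualized_sharpe` are hypothetical helpers, and the inputs are synthetic; they do not reproduce the paper's numbers.

```python
import math
import statistics


def decile_spread(scores, returns):
    """Mean forward return of the top similarity decile minus the bottom decile, one quarter."""
    if len(scores) != len(returns) or len(scores) < 10:
        raise ValueError("need matching inputs with at least 10 names")
    ranked = sorted(zip(scores, returns), key=lambda pair: pair[0])
    k = len(ranked) // 10  # names per end decile; any remainder stays in the unused middle
    bottom = [r for _, r in ranked[:k]]
    top = [r for _, r in ranked[-k:]]
    return statistics.fmean(top) - statistics.fmean(bottom)


def annualized_sharpe(quarterly_spreads):
    """Mean / sample stdev of quarterly spreads, scaled by sqrt(4) (assumed annualisation)."""
    sd = statistics.stdev(quarterly_spreads)  # needs at least two quarters
    return statistics.fmean(quarterly_spreads) / sd * math.sqrt(4)


# Synthetic cross-section: 20 names, return rises with similarity.
scores = [i / 20 for i in range(20)]
rets = [0.001 * i for i in range(20)]
print(round(decile_spread(scores, rets), 4))                      # → 0.018
print(round(annualized_sharpe([0.03, 0.01, 0.02, 0.04]), 3))      # → 3.873
```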
Example tickers where this is likely to fire
Illustrative only; the signal fires based on live data, not a fixed list.
Related families
lazy prices (Event): Stocks whose 10-K (or 10-Q) text barely changes year-over-year *outperform* those with big language shifts. The intuition: boring filings ≈ stable business ≈ slow-and-steady cash flow. Big text changes signal management hiding bad news with new boilerplate. Effect size: ~0.4 Sharpe on its own, still replicating in 2020-2024. Long when the most recent 10-K's cosine similarity to the prior year's filing is in the top quartile (≥ 0.85); hold ~12 months until the next filing.
filing text delta (Text-NLP): Year-over-year change in uncertainty/risk language in 10-K Item 1 (the 'Business' section). A spike in 'may', 'could', 'uncertain', 'challenging', 'risk' tokens per 10,000 words → management is privately more cautious → forward earnings miss / underperformance. Stable or decreasing uncertainty language → quietly confident outlook → outperformance.
mdna tone delta (Filings): Loughran and McDonald (2011, Journal of Finance) show that the *tone* of 10-K filings (positive vs negative finance-specific lexicon) predicts forward returns. Distinct from filing_text_delta's *uncertainty* lexicon. Effect: the bottom tone-delta decile underperforms the top decile by ~2-3% over 4 weeks, persisting ~12 weeks. Currently consumes the cached Item 1 (Business) text; switches to Item 7 (MD&A) once the EDGAR pipeline emits it (tracked in docs/alpha-research/proposals/).
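The three related families rest on simple text statistics. A sketch of each, with hypothetical helper names: cosine similarity of raw term-frequency vectors (lazy prices; the weighting scheme is assumed), uncertainty tokens per 10,000 words using the five tokens listed for filing text delta, and a net tone score (mdna tone delta). `POSITIVE` and `NEGATIVE` are tiny placeholder word lists, not the Loughran-McDonald dictionary, and the (pos - neg) / (pos + neg) tone formula is one common choice assumed here. A family's delta is the current filing's metric minus the prior filing's.

```python
import math
import re
from collections import Counter

# The five uncertainty tokens named in the filing text delta description.
UNCERTAINTY = {"may", "could", "uncertain", "challenging", "risk"}
# Placeholder lists only; the real Loughran-McDonald lexicon is far larger.
POSITIVE = {"strong", "improved", "gain"}
NEGATIVE = {"loss", "decline", "impairment"}


def words(text):
    """Lowercase alphabetic tokens, in order (assumed tokeniser)."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity of raw term-frequency vectors (lazy prices style)."""
    a, b = Counter(words(text_a)), Counter(words(text_b))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def uncertainty_per_10k(text):
    """Count of uncertainty tokens per 10,000 words."""
    toks = words(text)
    if not toks:
        return 0.0
    return 10_000 * sum(t in UNCERTAINTY for t in toks) / len(toks)


def tone(text):
    """(positive - negative) / (positive + negative); 0.0 when no list word appears."""
    toks = words(text)
    pos = sum(t in POSITIVE for t in toks)
    neg = sum(t in NEGATIVE for t in toks)
    return (pos - neg) / (pos + neg) if pos + neg else 0.0


prior = "Revenue growth was strong and margins improved."
current = "Revenue may decline and results could be uncertain."
print(uncertainty_per_10k(current) - uncertainty_per_10k(prior))  # → 3750.0
print(tone(current) - tone(prior))                                # → -2.0
print(round(cosine_similarity(prior, current), 3))                # → 0.267
```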
Explore call transcript jaccard similarity on alphactor.ai
See which tickers this family is currently firing on, with live signals and rankings.