# Within-call sentiment variance as a market-neutral signal
```
earnings_call_tone_research/
│
├─ run_backtest.py             # one-shot: build → neutralise → weights → PnL
├─ src/                        # pure-Python research pipeline
│  ├─ load.py                  # reads the three parquet inputs
│  ├─ factor_build.py          # z-scores tone dispersion by trade-date
│  ├─ neutralise.py            # FF5+UMD regression (daily)
│  ├─ portfolio.py             # weight construction & PnL
│  └─ report.py                # quick tear-sheet
│
├─ data/                       # exactly three parquet files
│  ├─ tone_dispersion.parquet  # call-level dispersion
│  ├─ stock_prices.parquet     # wide adj-close matrix, index=date
│  └─ ff5_daily.parquet        # daily MKT, SMB, HML, RMW, CMA, UMD, RF
│
└─ tests/                      # pytest suite (unit + integration)
```
| file | expected schema |
|---|---|
| `tone_dispersion.parquet` | Columns: `symbol`, `date`, `tone_dispersion` |
| `stock_prices.parquet` | Wide table; index=`date`, columns=tickers, values=`adjClose` |
| `ff5_daily.parquet` | index=`date`; columns `mktrf`, `smb`, `hml`, `rmw`, `cma`, `umd`, `rf` |
No other data sources or credentials are required.
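The schema table above can be checked with a few assertions. The following is an illustrative sketch using tiny in-memory stand-ins for the three parquet files; the actual validation logic in `src/load.py` may differ:

```python
import pandas as pd

# Tiny stand-ins for the three parquet inputs (schemas from the table above).
tone = pd.DataFrame({
    "symbol": ["AAPL"],
    "date": pd.to_datetime(["2024-10-24 16:30:00"]),
    "tone_dispersion": [0.0134],
})
prices = pd.DataFrame(
    {"AAPL": [230.1], "MSFT": [428.9]},
    index=pd.DatetimeIndex(["2024-10-25"], name="date"),
)
ff5 = pd.DataFrame(
    [[0.001, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0002]],
    columns=["mktrf", "smb", "hml", "rmw", "cma", "umd", "rf"],
    index=pd.DatetimeIndex(["2024-10-25"], name="date"),
)

# The kind of checks load.py is expected to enforce (assumption):
assert {"symbol", "date", "tone_dispersion"} <= set(tone.columns)
assert prices.index.name == "date"              # wide matrix: one column per ticker
assert list(ff5.columns) == ["mktrf", "smb", "hml", "rmw", "cma", "umd", "rf"]
```

With real data the frames would come from `pd.read_parquet("data/…")` instead of being built inline.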
The sentiment analysis component of this research relies on earnings call transcripts. The specific dataset used for building or applying the sentiment models can be found at:

- **Granularity:** each speaker turn (a continuous block of speech by one speaker) inside the earnings-call transcript is scored.
- **Sentiment score:** each turn receives a scalar in [-1, 1] from a local language-model sentiment head (or any deterministic sentiment function): `sentiment(turn_i) → s_i`.
- **Dispersion metric:** the population variance of all sentiment scores within the call (including both executive and analyst turns):

$$ \text{tone\_dispersion} = \frac{1}{N} \sum_{i=1}^{N} (s_i - \bar{s})^2, \qquad \bar{s} = \frac{1}{N} \sum_{i=1}^{N} s_i $$

A call must contain at least two non-empty turns; otherwise the value is NaN.

Note: the formula above requires a Markdown viewer that supports LaTeX math rendering. For reference, in plain code:

```python
mean = sum(scores) / len(scores)
variance = sum((s - mean) ** 2 for s in scores) / len(scores)
```
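Putting the two pieces together, the metric (with its minimum-turns guard) can be sketched as a small function. The name `tone_dispersion` here is illustrative; the repo's actual helper may be named and organised differently:

```python
import math

def tone_dispersion(scores: list[float]) -> float:
    """Population variance of per-turn sentiment scores for one call.

    Returns NaN when fewer than two scored turns are available,
    matching the rule stated above.
    """
    if len(scores) < 2:
        return math.nan
    mean = sum(scores) / len(scores)
    return sum((s - mean) ** 2 for s in scores) / len(scores)

# A call with mixed executive/analyst tone: mean 0.3, variance ≈ 0.145
print(tone_dispersion([0.8, -0.2, 0.5, 0.1]))
# A call with a single scored turn yields NaN
print(tone_dispersion([0.5]))
```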
The resulting table written to `data/tone_dispersion.parquet` therefore holds one row per call:

| symbol | date (call timestamp) | tone_dispersion |
|---|---|---|
| AAPL | 2024-10-24 16:30:00 | 0.0134 |
| MSFT | 2024-10-26 17:00:00 | 0.0027 |
| ... | ... | ... |
During factor construction the script:

1. Shifts `date` to the next NYSE trading day (`trade_date`).
2. Z-scores `tone_dispersion` cross-sectionally each `trade_date` to remove level shifts.
3. Feeds the standardized series to the FF5+UMD neutralisation step.
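The cross-sectional z-score in step 2 can be sketched with a pandas group-wise transform. This is illustrative data and illustrative column names (`tone_z` is not necessarily what `factor_build.py` calls its output):

```python
import pandas as pd

# Two trade dates, two symbols each: standardise tone_dispersion
# across symbols within each trade_date.
df = pd.DataFrame({
    "trade_date": ["2024-10-25", "2024-10-25", "2024-10-28", "2024-10-28"],
    "symbol": ["AAPL", "MSFT", "AAPL", "MSFT"],
    "tone_dispersion": [0.0134, 0.0027, 0.0100, 0.0060],
})

grouped = df.groupby("trade_date")["tone_dispersion"]
df["tone_z"] = (df["tone_dispersion"] - grouped.transform("mean")) / grouped.transform("std")

print(df)  # each trade_date's tone_z now has zero mean
```

Because the score is standardised per date, a level shift in transcript tone (e.g. a model or data-vendor change) cannot masquerade as a time-series signal.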
```bash
# 1. create environment
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# 2. drop the three parquet files into data/

# 3. run the pipeline
python run_backtest.py   # writes outputs/weights.parquet & tear-sheet

# 4. run tests
pytest -q                # sanity-checks factor, alignment and PnL
```
| Task | Where |
|---|---|
| Change holding horizon | `portfolio.pnl(horizon=N)` |
| Different risk model | `src/neutralise.py` |
| Custom weighting scheme | `src/portfolio.py` |
| Add transaction costs | modify logic inside `portfolio.pnl` |
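If you swap in a different risk model, the core of the FF5+UMD neutralisation step is an OLS regression on the factor returns followed by taking the residual. The sketch below uses synthetic data and plain NumPy least squares; the actual code in `src/neutralise.py` may use a different library or estimation scheme:

```python
import numpy as np

rng = np.random.default_rng(0)
n_days = 250

# Hypothetical daily data: six factor return series (mktrf, smb, hml,
# rmw, cma, umd) and a portfolio return with known factor loadings.
factors = rng.normal(0.0, 0.01, size=(n_days, 6))
true_betas = np.array([0.5, 0.1, -0.2, 0.0, 0.1, 0.3])
port_ret = factors @ true_betas + rng.normal(0.0, 0.005, size=n_days)

# Regress portfolio returns on the six factors (with intercept)
# and keep the residual as the factor-neutral return stream.
X = np.column_stack([np.ones(n_days), factors])
betas, *_ = np.linalg.lstsq(X, port_ret, rcond=None)
residual = port_ret - X @ betas

print(residual.std())  # close to the idiosyncratic vol, below port_ret.std()
```

By construction the residual is orthogonal to every regressor, so its remaining variance is (in-sample) uncorrelated with the six factors.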
You can host the backtest results and tear sheet using GitHub Pages:

1. Run `python run_backtest.py` to generate the `outputs/tearsheet.html` file.
2. Copy the generated tear sheet to the `docs/` directory or reference it from there.
3. Commit and push the `docs/` directory to your repository.
4. In your GitHub repository settings, enable Pages with:
   - Source: main branch
   - Folder: `/docs`
5. Navigate to `https://kurry.github.io/earnings_call_tone_research/` to view the documentation site.
This project is licensed under the MIT License. See the LICENSE file for details.