# Within-call sentiment variance as a market-neutral signal
```
earnings_call_tone_research/
│
├─ run_backtest.py             # one-shot: build → neutralise → weights → PnL
├─ src/                        # pure-Python research pipeline
│  ├─ load.py                  # reads the three parquet inputs
│  ├─ factor_build.py          # z-scores tone dispersion by trade-date
│  ├─ neutralise.py            # FF5+UMD regression (daily)
│  ├─ portfolio.py             # weight construction & PnL
│  └─ report.py                # quick tear-sheet
│
├─ data/                       # exactly three parquet files
│  ├─ tone_dispersion.parquet  # call-level dispersion
│  ├─ stock_prices.parquet     # wide adj-close matrix, index=date
│  └─ ff5_daily.parquet        # daily MKT, SMB, HML, RMW, CMA, UMD, RF
│
└─ tests/                      # pytest suite (unit + integration)
```
| file | expected schema |
|---|---|
| `tone_dispersion.parquet` | Columns: `symbol`, `date`, `tone_dispersion` |
| `stock_prices.parquet` | Wide table; index=`date`, columns=tickers, values=`adjClose` |
| `ff5_daily.parquet` | index=`date`; columns `mktrf`, `smb`, `hml`, `rmw`, `cma`, `umd`, `rf` |
No other data sources or credentials are required.
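The schema table above can be checked with a few assertions. The following is an illustrative sketch using tiny in-memory stand-ins for the three parquet files; the actual validation logic in `src/load.py` may differ:

```python
import pandas as pd

# Tiny stand-ins for the three parquet inputs (schemas from the table above).
tone = pd.DataFrame({
    "symbol": ["AAPL"],
    "date": pd.to_datetime(["2024-10-24 16:30:00"]),
    "tone_dispersion": [0.0134],
})
prices = pd.DataFrame(
    {"AAPL": [230.1], "MSFT": [428.9]},
    index=pd.DatetimeIndex(["2024-10-25"], name="date"),
)
ff5 = pd.DataFrame(
    [[0.001, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0002]],
    columns=["mktrf", "smb", "hml", "rmw", "cma", "umd", "rf"],
    index=pd.DatetimeIndex(["2024-10-25"], name="date"),
)

# The kind of checks load.py is expected to enforce (assumption):
assert {"symbol", "date", "tone_dispersion"} <= set(tone.columns)
assert prices.index.name == "date"              # wide matrix: one column per ticker
assert list(ff5.columns) == ["mktrf", "smb", "hml", "rmw", "cma", "umd", "rf"]
```

With real data the frames would come from `pd.read_parquet("data/…")` instead of being built inline.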
The sentiment analysis component of this research relies on earnings call transcripts. The specific dataset used for building or applying the sentiment models can be found at:

- **Granularity:** each speaker turn (a continuous block of speech by one speaker) inside the earnings-call transcript is scored.
- **Sentiment score:** each turn receives a scalar in [-1, 1] from a local language-model sentiment head (or any deterministic sentiment function): `sentiment(turn_i) → s_i`.
- **Dispersion metric:** the population variance of all sentiment scores within the call (including both executive and analyst turns):

$$ \text{tone\_dispersion} = \frac{1}{N} \sum_{i=1}^{N} (s_i - \bar{s})^2, \qquad \bar{s} = \frac{1}{N} \sum_{i=1}^{N} s_i $$

A call must contain at least two non-empty turns; otherwise the value is NaN.

Note: the formula above requires a Markdown viewer that supports LaTeX math rendering. For reference, in plain code:

```python
mean = sum(scores) / len(scores)
variance = sum((s - mean) ** 2 for s in scores) / len(scores)
```
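Putting the two pieces together, the metric (with its minimum-turns guard) can be sketched as a small function. The name `tone_dispersion` here is illustrative; the repo's actual helper may be named and organised differently:

```python
import math

def tone_dispersion(scores: list[float]) -> float:
    """Population variance of per-turn sentiment scores for one call.

    Returns NaN when fewer than two scored turns are available,
    matching the rule stated above.
    """
    if len(scores) < 2:
        return math.nan
    mean = sum(scores) / len(scores)
    return sum((s - mean) ** 2 for s in scores) / len(scores)

# A call with mixed executive/analyst tone: mean 0.3, variance ≈ 0.145
print(tone_dispersion([0.8, -0.2, 0.5, 0.1]))
# A call with a single scored turn yields NaN
print(tone_dispersion([0.5]))
```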
The resulting table written to `data/tone_dispersion.parquet` therefore holds one row per call:

| symbol | date (call timestamp) | tone_dispersion |
|---|---|---|
| AAPL | 2024-10-24 16:30:00 | 0.0134 |
| MSFT | 2024-10-26 17:00:00 | 0.0027 |
| ... | ... | ... |
During factor construction the script:

1. Shifts `date` to the next NYSE trading day (`trade_date`).
2. Z-scores `tone_dispersion` cross-sectionally each `trade_date` to remove level shifts.
3. Feeds the standardized series to the FF5+UMD neutralisation step.
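The cross-sectional z-score in step 2 can be sketched with a pandas group-wise transform. This is illustrative data and illustrative column names (`tone_z` is not necessarily what `factor_build.py` calls its output):

```python
import pandas as pd

# Two trade dates, two symbols each: standardise tone_dispersion
# across symbols within each trade_date.
df = pd.DataFrame({
    "trade_date": ["2024-10-25", "2024-10-25", "2024-10-28", "2024-10-28"],
    "symbol": ["AAPL", "MSFT", "AAPL", "MSFT"],
    "tone_dispersion": [0.0134, 0.0027, 0.0100, 0.0060],
})

grouped = df.groupby("trade_date")["tone_dispersion"]
df["tone_z"] = (df["tone_dispersion"] - grouped.transform("mean")) / grouped.transform("std")

print(df)  # each trade_date's tone_z now has zero mean
```

Because the score is standardised per date, a level shift in transcript tone (e.g. a model or data-vendor change) cannot masquerade as a time-series signal.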
```bash
# 1. create environment
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# 2. drop the three parquet files into data/

# 3. run the pipeline
python run_backtest.py   # writes outputs/weights.parquet & tear-sheet

# 4. run tests
pytest -q                # sanity-checks factor, alignment and PnL
```
| Task | Where |
|---|---|
| Change holding horizon | `portfolio.pnl(horizon=N)` |
| Different risk model | `src/neutralise.py` |
| Custom weighting scheme | `src/portfolio.py` |
| Add transaction costs | modify logic inside `portfolio.pnl` |
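If you swap in a different risk model, the core of the FF5+UMD neutralisation step is an OLS regression on the factor returns followed by taking the residual. The sketch below uses synthetic data and plain NumPy least squares; the actual code in `src/neutralise.py` may use a different library or estimation scheme:

```python
import numpy as np

rng = np.random.default_rng(0)
n_days = 250

# Hypothetical daily data: six factor return series (mktrf, smb, hml,
# rmw, cma, umd) and a portfolio return with known factor loadings.
factors = rng.normal(0.0, 0.01, size=(n_days, 6))
true_betas = np.array([0.5, 0.1, -0.2, 0.0, 0.1, 0.3])
port_ret = factors @ true_betas + rng.normal(0.0, 0.005, size=n_days)

# Regress portfolio returns on the six factors (with intercept)
# and keep the residual as the factor-neutral return stream.
X = np.column_stack([np.ones(n_days), factors])
betas, *_ = np.linalg.lstsq(X, port_ret, rcond=None)
residual = port_ret - X @ betas

print(residual.std())  # close to the idiosyncratic vol, below port_ret.std()
```

By construction the residual is orthogonal to every regressor, so its remaining variance is (in-sample) uncorrelated with the six factors.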
You can host the backtest results and tear sheet using GitHub Pages:

1. Run `python run_backtest.py` to generate the `outputs/tearsheet.html` file.
2. Copy the generated tear sheet to the `docs/` directory or reference it from there.
3. Commit and push the `docs/` directory to your repository.
4. In your GitHub repository settings, enable Pages with:
   - Source: main branch
   - Folder: `/docs`
5. Navigate to `https://kurry.github.io/earnings_call_tone_research/` to view the documentation site.
This project is licensed under the MIT License. See the LICENSE file for details.