
Commit 2d98162

feat: add optional Rust acceleration, benchmarks, and CI integration
Introduce an acceleration layer (gsppy/accelerate.py) that uses a PyO3 Rust extension for support counting and gracefully falls back to pure Python; runtime backend selection via GSPPY_BACKEND (rust/python/auto).

- Add bench_support.py (Click CLI) to compare Python vs Rust backends, with options like --max_k and --warmup for reproducible micro-benchmarks.
- Extend Makefile with Rust helpers: rust-setup, rust-build (idempotent: skips when up-to-date by checking the installed .so against the Rust sources), bench-small, and bench-big; update help output with a Rust acceleration section.
- Update README.md and CONTRIBUTING.md with instructions to build the Rust extension, choose a backend at runtime, and run benchmarks of various sizes.
- CI: update the codecov workflow to optionally install Rust and build the extension in tests, and add a dedicated test-rust job that builds the extension, runs tests with GSPPY_BACKEND=rust, and uploads coverage/test results flagged as "rust".

Motivation: improve performance of hot support-counting loops while preserving a zero-dependency Python fallback; provide tooling and CI coverage to ensure stability and reproducibility.
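The backend selection described in the commit message can be illustrated with a short sketch. gsppy/accelerate.py is not shown in this diff, so everything below is an assumption rather than the commit's code: the function names `rust_extension_available` and `resolve_backend` are hypothetical, the module name `_gsppy_rust` is taken from the `find_spec` call in the Makefile, and raising an error when `rust` is requested but unavailable is a guess based on the benchmark script catching exceptions from the Rust run.

```python
import importlib.util
import os
from typing import Callable, Mapping, Optional

VALID_BACKENDS = ("auto", "rust", "python")


def rust_extension_available(module_name: str = "_gsppy_rust") -> bool:
    """Return True if the compiled extension can be located (it is not imported)."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


def resolve_backend(
    env: Optional[Mapping[str, str]] = None,
    is_available: Callable[[], bool] = rust_extension_available,
) -> str:
    """Map GSPPY_BACKEND (rust/python/auto) to the backend that should run."""
    env = os.environ if env is None else env
    requested = env.get("GSPPY_BACKEND", "auto").strip().lower() or "auto"
    if requested not in VALID_BACKENDS:
        raise ValueError(f"Unknown GSPPY_BACKEND value: {requested!r}")
    if requested == "python":
        return "python"
    if is_available():
        return "rust"
    if requested == "rust":
        # Assumption: an explicit request for Rust fails loudly when it is missing.
        raise RuntimeError("GSPPY_BACKEND=rust but the Rust extension is not installed")
    # "auto" falls back to pure Python when the extension is absent.
    return "python"
```

Passing `env` and `is_available` as parameters keeps the sketch testable without touching the real environment or requiring the extension to be built.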
1 parent f9939e6 commit 2d98162

11 files changed: +766 -17 lines changed


.github/workflows/codecov.yml

Lines changed: 70 additions & 1 deletion
@@ -32,7 +32,24 @@ jobs:
           uv sync --frozen --extra dev
           uv pip install -e .
 
-      - name: Run tests with coverage
+      - name: Install Rust toolchain (optional)
+        continue-on-error: true
+        run: |
+          curl -Ls https://sh.rustup.rs | bash -s -- -y || true
+          echo "$HOME/.cargo/bin" >> $GITHUB_PATH || true
+          source $HOME/.cargo/env || true
+          rustc --version || true
+
+      - name: Build Rust extension (optional)
+        continue-on-error: true
+        run: |
+          source $HOME/.cargo/env || true
+          uv run --python .venv/bin/python --no-project pip install maturin==1.6.0 || true
+          cd rust && uv run --python .venv/bin/python --no-project python -m maturin develop --release || true
+
+      - name: Run tests with coverage (Rust backend)
+        env:
+          GSPPY_BACKEND: rust
         run: |
           uv run pytest --cov --cov-branch --junitxml=junit.xml -o junit_family=legacy
 
@@ -49,3 +66,55 @@ jobs:
         uses: codecov/test-results-action@v1
         with:
           token: ${{ secrets.CODECOV_TOKEN }}
+
+  test-rust:
+    name: Run tests with Rust backend (Python ${{ matrix.python-version }})
+    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        python-version: ["3.13"]
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v5
+        with:
+          fetch-depth: 0
+
+      - name: Install uv
+        run: |
+          curl -Ls https://astral.sh/uv/install.sh | bash
+          echo "${HOME}/.local/bin" >> $GITHUB_PATH
+
+      - name: Create venv (Python ${{ matrix.python-version }})
+        run: |
+          uv python install ${{ matrix.python-version }}
+          uv venv .venv --python ${{ matrix.python-version }}
+
+      - name: Install dependencies
+        run: |
+          uv sync --frozen --extra dev
+          uv pip install -e .
+
+      - name: Build Rust extension
+        run: |
+          make rust-build
+
+      - name: Run tests with coverage (Rust backend)
+        env:
+          GSPPY_BACKEND: rust
+        run: |
+          uv run pytest --cov=gsppy --cov-branch --cov-report=term-missing:skip-covered --cov-report=xml --junitxml=junit-rust.xml -o junit_family=legacy
+
+      - name: Upload coverage to Codecov (rust)
+        uses: codecov/codecov-action@v5
+        with:
+          files: coverage.xml
+          flags: rust
+          name: codecov-coverage-report-rust
+
+      - name: Upload test results to Codecov (rust)
+        if: ${{ !cancelled() }}
+        uses: codecov/test-results-action@v1
+        with:
+          flags: rust
+          files: junit-rust.xml
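
The workflow pins the backend for a whole job through the `GSPPY_BACKEND` environment variable. For local tests that want to exercise both backends inside one process, a scoped override may be handy. The sketch below is not part of this commit; `temporary_backend` is a hypothetical helper, and it only has an effect if the library reads the variable at call time rather than at import time (the benchmark script in this commit relies on the same assumption when it sets `os.environ` before each run).

```python
import os
from contextlib import contextmanager
from typing import Iterator, Optional

ENV_KEY = "GSPPY_BACKEND"


@contextmanager
def temporary_backend(name: Optional[str]) -> Iterator[None]:
    """Set GSPPY_BACKEND inside the with-block, then restore the previous state.

    Passing None removes the variable for the duration of the block.
    """
    previous = os.environ.get(ENV_KEY)
    try:
        if name is None:
            os.environ.pop(ENV_KEY, None)
        else:
            os.environ[ENV_KEY] = name
        yield
    finally:
        # Restore exactly what was there before, including "unset".
        if previous is None:
            os.environ.pop(ENV_KEY, None)
        else:
            os.environ[ENV_KEY] = previous
```

Restoring in `finally` means a failing test body cannot leak a backend choice into later tests.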

CONTRIBUTING.md

Lines changed: 27 additions & 0 deletions
@@ -128,6 +128,33 @@ To get familiar with the existing code, follow these steps:
 
    Note: This project integrates the "tox-uv" plugin. When running `tox` locally (or `make tox`), missing Python interpreters can be provisioned automatically via uv, so you don't need to have all versions installed ahead of time.
 
+4. **Optional: Rust Acceleration**
+
+   Some hot loops can be accelerated with Rust via PyO3. This is entirely optional: the library will fall back to pure Python if the extension is not present.
+
+   - Install Rust and build the extension:
+     ```bash
+     make rust-build
+     ```
+
+   - Choose backend at runtime (defaults to auto):
+     ```bash
+     export GSPPY_BACKEND=rust # or python, or unset for auto
+     ```
+
+   - Run a benchmark (small):
+     ```bash
+     make bench-small
+     ```
+
+   - Run a larger benchmark (adjust to your machine):
+     ```bash
+     make bench-big
+     # or customize:
+     GSPPY_BACKEND=auto uv run --python .venv/bin/python --no-project \
+       python benchmarks/bench_support.py --n_tx 1000000 --tx_len 8 --vocab 50000 --min_support 0.2 --warmup
+     ```
+
 3. **Explore the Code**:
    The main entry point for the GSP algorithm is in the `gsppy` module. The libraries for support counting, candidate generation, and additional utility functions are also within this module.
 

Makefile

Lines changed: 59 additions & 0 deletions
@@ -30,6 +30,11 @@ help:
 	@echo " check - lint + typecheck + test"
 	@echo "\nEnvironment:"
 	@echo " UV_LINK_MODE=$(UV_LINK_MODE) (default: copy)"
+	@echo "\nRust acceleration:"
+	@echo " rust-setup - Install Rust toolchain (rustup)"
+	@echo " rust-build - Build and develop-install Rust extension via maturin (skips if up-to-date)"
+	@echo " bench-small - Run small benchmark (default sizes)"
+	@echo " bench-big - Run large benchmark (e.g., 1M tx; beware memory/CPU)"
 
 # Ensure uv is installed; install if missing
 ensure-uv:
@@ -132,3 +137,57 @@ check: lint typecheck test
 check-precommit: ensure-uv install
 	$(MAKE) test
 	$(MAKE) typecheck
+
+# --- Rust acceleration helpers ---
+.PHONY: rust-setup rust-build bench-small bench-big
+
+rust-setup:
+	@if ! command -v rustc >/dev/null 2>&1; then \
+	  echo "Installing Rust toolchain..."; \
+	  curl -Ls https://sh.rustup.rs | bash -s -- -y; \
+	  source $$HOME/.cargo/env; \
+	  rustc --version; \
+	else \
+	  rustc --version; \
+	fi
+
+rust-build: ensure-uv rust-setup
+	@# Optionally force rebuild: make rust-build FORCE_RUST_BUILD=1
+	@force_build="$(FORCE_RUST_BUILD)"; \
+	if [ "$$force_build" = "1" ]; then \
+	  echo "FORCE_RUST_BUILD=1 set; rebuilding Rust extension"; \
+	  source $$HOME/.cargo/env; \
+	  $(UV) pip install --python $(PYTHON) --upgrade pip setuptools wheel maturin==1.6.0 >/dev/null; \
+	  $(UV) run --python $(PYTHON) --no-project python -m maturin develop --release -m rust/Cargo.toml; \
+	  exit $$?; \
+	fi; \
+	# Determine if extension is already installed and up-to-date (resolve the .so path) \
+	so_path="$$( \
+	  $(UV) run --python $(PYTHON) --no-project python -c 'import importlib.util as u; s=u.find_spec("_gsppy_rust._gsppy_rust"); print(s.origin if s and s.origin else(""))' \
+	)"; \
+	if [ -n "$$so_path" ] && [ -f "$$so_path" ]; then \
+	  up_to_date=1; \
+	  for src in rust/Cargo.toml $$(find rust/src -type f -name '*.rs'); do \
+	    if [ "$$src" -nt "$$so_path" ]; then up_to_date=0; break; fi; \
+	  done; \
+	  if [ "$$up_to_date" -eq 1 ]; then \
+	    echo "Rust extension is up-to-date at $$so_path; skipping build"; \
+	    exit 0; \
+	  else \
+	    echo "Rust sources changed; rebuilding"; \
+	  fi; \
+	else \
+	  echo "Rust extension missing; building"; \
+	fi; \
+	source $$HOME/.cargo/env; \
+	$(UV) pip install --python $(PYTHON) --upgrade pip setuptools wheel maturin==1.6.0 >/dev/null; \
+	$(UV) run --python $(PYTHON) --no-project python -m maturin develop --release -m rust/Cargo.toml
+
+bench-small: rust-build
+	@$(UV) run --python $(PYTHON) --no-project pip install -e . >/dev/null; \
+	GSPPY_BACKEND=auto $(UV) run --python $(PYTHON) --no-project python benchmarks/bench_support.py --n_tx 100000 --tx_len 8 --vocab 10 --min_support 0.2 --warmup
+
+bench-big: rust-build
+	@echo "WARNING: This may take significant memory/CPU. Adjust sizes to your machine."; \
+	$(UV) run --python $(PYTHON) --no-project pip install -e . >/dev/null; \
+	GSPPY_BACKEND=auto $(UV) run --python $(PYTHON) --no-project python benchmarks/bench_support.py --n_tx 1000000 --tx_len 8 --vocab 50000 --min_support 0.2
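
The skip logic in `rust-build` is a modification-time comparison: rebuild if the installed `.so` is missing or if `rust/Cargo.toml` or any `rust/src/**/*.rs` file is newer than it (the shell `-nt` test). The same idea can be sketched in Python. This is illustrative only and not part of the commit; `needs_rebuild` and `rust_sources` are hypothetical names, and the directory layout is assumed from the `find` call in the recipe.

```python
from pathlib import Path
from typing import Iterable, List


def rust_sources(rust_dir: Path) -> List[Path]:
    """Collect the files whose changes should trigger a rebuild."""
    # Mirrors: rust/Cargo.toml plus `find rust/src -type f -name '*.rs'`
    return [rust_dir / "Cargo.toml", *sorted((rust_dir / "src").rglob("*.rs"))]


def needs_rebuild(artifact: Path, sources: Iterable[Path]) -> bool:
    """Return True if the built artifact is missing or older than any source."""
    if not artifact.is_file():
        return True
    built_at = artifact.stat().st_mtime
    # Strictly newer, like the shell's `-nt`; missing sources are ignored.
    return any(src.is_file() and src.stat().st_mtime > built_at for src in sources)
```

A timestamp check is cheap but approximate: it does not notice changes to dependencies resolved outside `rust/`, which is presumably why the recipe also offers `FORCE_RUST_BUILD=1`.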

README.md

Lines changed: 31 additions & 2 deletions
@@ -115,7 +115,30 @@ uv sync --frozen --extra dev # uses uv.lock
 uv pip install -e .
 ```
 
-#### 3. Common development tasks
+#### 3. Optional: Enable Rust acceleration
+
+Rust acceleration is optional and provides faster support counting using a PyO3 extension. Python fallback remains available.
+
+Build the extension locally:
+```bash
+make rust-build
+```
+
+Select backend at runtime (auto tries Rust, then falls back to Python):
+```bash
+export GSPPY_BACKEND=rust # or python, or unset for auto
+```
+
+Run benchmarks (adjust to your machine):
+```bash
+make bench-small
+make bench-big # may use significant memory/CPU
+# or customize:
+GSPPY_BACKEND=auto uv run --python .venv/bin/python --no-project \
+python benchmarks/bench_support.py --n_tx 1000000 --tx_len 8 --vocab 50000 --min_support 0.2 --warmup
+```
+
+#### 4. Common development tasks
 After the environment is ready, activate it and run tasks with standard tools:
 
 ```bash
@@ -133,7 +156,7 @@ uv run ruff check .
 uv run pyright
 ```
 
-#### 4. Makefile (shortcuts)
+#### 5. Makefile (shortcuts)
 You can use the Makefile to automate common tasks:
 
 ```bash
@@ -145,6 +168,12 @@ make format # ruff --fix
 make typecheck # pyright (and mypy if configured)
 make pre-commit-install # install the pre-commit hook
 make pre-commit-run # run pre-commit on all files
+
+# Rust-specific shortcuts
+make rust-setup # install rustup toolchain
+make rust-build # build PyO3 extension with maturin
+make bench-small # run small benchmark
+make bench-big # run large benchmark
 ```
 
 > [!NOTE]

benchmarks/bench_support.py

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
+import os
+import time
+import random
+from typing import List, Optional
+
+import click
+
+from gsppy.gsp import GSP
+
+random.seed(0)
+
+
+def gen_data(n_tx: int, tx_len: int, vocab_size: int) -> List[List[str]]:
+    vocab = [f"I{i}" for i in range(vocab_size)]
+    # Using comprehension keeps it simple; for very large n, consider streaming
+    return [random.sample(vocab, tx_len) for _ in range(n_tx)]
+
+
+def run_once(backend: str, transactions: List[List[str]], min_support: float, max_k: Optional[int]) -> float:
+    os.environ["GSPPY_BACKEND"] = backend
+    t0 = time.perf_counter()
+    GSP(transactions).search(min_support=min_support, max_k=max_k)
+    return time.perf_counter() - t0
+
+
+@click.command()
+@click.option("--n_tx", default=10_000, show_default=True, type=int, help="Number of transactions (e.g., 1_000_000)")
+@click.option("--tx_len", default=8, show_default=True, type=int, help="Items per transaction")
+@click.option("--vocab", default=10_000, show_default=True, type=int, help="Vocabulary size")
+@click.option("--min_support", default=0.2, show_default=True, type=float, help="Minimum fractional support (0,1]")
+@click.option(
+    "--max_k",
+    default=1,
+    show_default=True,
+    type=int,
+    help="Limit maximum sequence length (1 focuses on singleton support)",
+)
+@click.option("--warmup", is_flag=True, help="Do a Python warmup run before timing")
+def main(n_tx: int, tx_len: int, vocab: int, min_support: float, max_k: int, warmup: bool) -> None:
+    click.echo(f"Generating data: n_tx={n_tx:,}, tx_len={tx_len}, vocab={vocab:,}")
+    transactions = gen_data(n_tx=n_tx, tx_len=tx_len, vocab_size=vocab)
+
+    if warmup:
+        try:
+            run_once("python", transactions, min_support, max_k)
+        except Exception:
+            pass
+
+    click.echo("Running Python backend...")
+    t_py = run_once("python", transactions, min_support, max_k)
+
+    click.echo("Running Rust backend...")
+    try:
+        t_rs = run_once("rust", transactions, min_support, max_k)
+        speedup = t_py / t_rs if t_rs > 0 else float("inf")
+        improvement = (t_py - t_rs) / t_py * 100.0
+        click.echo(f"Python: {t_py:.3f}s\nRust: {t_rs:.3f}s\nSpeedup: {speedup:.2f}x (+{improvement:.1f}%)")
+    except Exception as e:
+        click.echo(f"Rust backend not available or failed: {e}\nPython time: {t_py:.3f}s")
+
+
+if __name__ == "__main__":
+    main()
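
The report at the end of `main` boils down to two ratios: speedup is `t_py / t_rs`, and improvement is the share of Python time saved, `(t_py - t_rs) / t_py * 100`. A standalone sketch of that arithmetic follows; it is not part of the commit, and `compare_timings` and `format_report` are hypothetical names. It differs from the script in two small, deliberate ways: it guards against a zero Python time, and it lets the format specifier choose the sign, so a slower Rust run prints as a negative percentage instead of `+-`.

```python
from typing import Tuple


def compare_timings(t_py: float, t_rs: float) -> Tuple[float, float]:
    """Return (speedup factor, percent improvement) of Rust over Python."""
    speedup = t_py / t_rs if t_rs > 0 else float("inf")
    # Guard the division; the benchmark script divides by t_py unconditionally.
    improvement = (t_py - t_rs) / t_py * 100.0 if t_py > 0 else 0.0
    return speedup, improvement


def format_report(t_py: float, t_rs: float) -> str:
    """Format both timings; the '+' flag prints an explicit sign either way."""
    speedup, improvement = compare_timings(t_py, t_rs)
    return (
        f"Python: {t_py:.3f}s\n"
        f"Rust: {t_rs:.3f}s\n"
        f"Speedup: {speedup:.2f}x ({improvement:+.1f}%)"
    )
```

For example, timings of 2.0 s (Python) and 0.5 s (Rust) give a 4.00x speedup and a +75.0% improvement.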
