One‑line pitch: Drop a meeting video in your terminal and get back a clean, speaker‑labelled transcript plus Markdown notes — all with free, open‑source models you can run locally in under an hour.
Feature | Status | Notes |
---|---|---|
High‑quality multilingual transcription (OpenAI Whisper) | ✅ | Runs on CPU or GPU |
Automatic speaker diarization (pyannote.audio) | ✅ | Distinguishes who spoke |
Markdown export with timestamps | ✅ | Saves to `results/transcript.md` |
One‑command CLI (`python main.py <video>`) | ✅ | Creates a `results/` folder |
Key‑frame extraction for slide changes (LMSKE) | ⏳ | Planned for v0.2 |
Topic summarisation & action items (LLM) | ⏳ | Planned for v0.3 |
Prereqs: Python 3.10+ and `ffmpeg` (≥ 4.2).
# 1. Clone & enter
git clone https://github.com/JuanLara18/Meeting-Scribe.git
cd meetingscribe
# 2. Create env & install deps
python -m venv .venv && source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r config/requirements.txt
# 3. Run on a sample video
python main.py path/to/meeting.mp4
Output:

```
results/
├── transcript.md    # speaker‑segmented Markdown
└── transcript.json  # raw structured data
```
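For reference, `transcript.md` looks roughly like this (timestamps, labels, and dialogue below are illustrative, not real output):

```markdown
## Transcript

**[00:00:04] SPEAKER_00:** Good morning everyone, let's get started.

**[00:00:12] SPEAKER_01:** Thanks. First item on the agenda: the Q3 roadmap.
```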
```
usage: python main.py [-h] [--lang en] [--model base] VIDEO_PATH
```

- `VIDEO_PATH` – any local `.mp4` / `.mkv` / `.mov` file.
- `--lang` – ISO‑639‑1 code to force a language (default: auto‑detect).
- `--model` – Whisper model size (`tiny`, `base`, `small`, `medium`, `large-v3`).
Example with Spanish audio and the medium model:

```bash
python main.py reunión.mp4 --lang es --model medium
```
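These flags map more or less directly onto the openai-whisper Python API, which is presumably what `processing/transcribe.py` wraps. A minimal sketch:

```python
import whisper

# Load the requested model size ("tiny" … "large-v3");
# the weights are downloaded on first use.
model = whisper.load_model("medium")

# language=None lets Whisper auto-detect; an ISO-639-1 code forces it.
result = model.transcribe("audio.wav", language="es")

# Each segment carries start/end times in seconds plus the recognised text.
for seg in result["segments"]:
    print(f"[{seg['start']:6.1f}s – {seg['end']:6.1f}s] {seg['text']}")
```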
```
Meeting-Scribe
├── main.py                  # orchestrates the end‑to‑end pipeline
├── processing/
│   ├── audio.py             # audio extraction (ffmpeg)
│   ├── transcribe.py        # whisper wrapper
│   ├── diarize.py           # pyannote wrapper
│   └── merge.py             # align speakers + text
├── utils/
│   └── markdown.py          # export helpers
├── config/
│   └── requirements.txt
├── README.md
└── results/                 # auto‑created
```
Minimal today, but every component lives in its own module, so we can swap models or add GPU acceleration without touching `main.py`.
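To make that concrete, here is an illustrative version of the pipeline `main.py` drives. The function names are assumptions for the sketch, not the repo's actual signatures:

```python
# Illustrative orchestration - the real module APIs may differ.
from processing.audio import extract_audio        # ffmpeg wrapper (assumed name)
from processing.transcribe import transcribe      # Whisper wrapper (assumed name)
from processing.diarize import diarize            # pyannote wrapper (assumed name)
from processing.merge import merge_segments       # aligner (assumed name)
from utils.markdown import export_markdown        # exporter (assumed name)

def run(video_path: str) -> None:
    wav = extract_audio(video_path)              # 1. video.mp4 → audio.wav
    segments = transcribe(wav)                   # 2. timed text segments
    turns = diarize(wav)                         # 3. timed speaker turns
    labelled = merge_segments(segments, turns)   # 4. attach speakers to text
    export_markdown(labelled, "results/transcript.md")  # 5. write output
```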
Layer | Tool | Why |
---|---|---|
ASR | OpenAI Whisper | State‑of‑the‑art, MIT license, offline |
Diarization | pyannote.audio | SOTA pretrained pipelines |
Media | ffmpeg | Battle‑tested extraction |
Future vision | LMSKE (key‑frames), Llama‑3 local LLMs | Keep everything free & private |
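For the diarization layer, `diarize.py` presumably wraps the standard pyannote.audio pretrained pipeline, which looks roughly like this (note that downloading the pretrained weights requires a free Hugging Face access token):

```python
from pyannote.audio import Pipeline

# Accept the model's terms on Hugging Face first, then pass your token.
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="YOUR_HF_TOKEN",
)

# Returns timed speaker turns over the whole recording.
diarization = pipeline("audio.wav")
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:6.1f}s – {turn.end:6.1f}s  {speaker}")
```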
```
          video.mp4
              │
    ┌─────────▼─────────┐
    │ 1. ffmpeg extract │──► audio.wav
    └─────────┬─────────┘
              │
┌─────────────▼─────────────┐
│ 2. Whisper ASR → segments │
└─────────────┬─────────────┘
              │
┌─────────────▼─────────────┐
│ 3. pyannote diarize audio │
└─────────────┬─────────────┘
              │
┌─────────────▼─────────────┐
│ 4. Merge text + speakers  │
└─────────────┬─────────────┘
              │
┌─────────────▼─────────────┐
│ 5. Export Markdown & JSON │
└───────────────────────────┘
```
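Step 4 is the only non‑obvious part: Whisper's segments and pyannote's speaker turns sit on independent timelines, so each text segment has to be assigned the speaker whose turn overlaps it most. A minimal sketch of that alignment (an assumption about `merge.py`, not its actual code):

```python
def pick_speaker(seg_start: float, seg_end: float,
                 turns: list[tuple[float, float, str]]) -> str:
    """Return the speaker whose diarization turn overlaps the segment most."""
    best, best_overlap = "UNKNOWN", 0.0
    for t_start, t_end, speaker in turns:
        overlap = min(seg_end, t_end) - max(seg_start, t_start)
        if overlap > best_overlap:
            best, best_overlap = speaker, overlap
    return best

# Example: a segment spanning 2.0–5.5 s against two speaker turns.
turns = [(0.0, 3.0, "SPEAKER_00"), (3.0, 9.0, "SPEAKER_01")]
print(pick_speaker(2.0, 5.5, turns))  # SPEAKER_01 (2.5 s overlap vs 1.0 s)
```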
- v0.2 – Key‑frame extraction ➜ embed screenshots in Markdown.
- v0.3 – Local LLM summariser ➜ bullet‑point summaries & action items.
- v1.0 – Real‑time streaming mode & simple web UI (FastAPI + React).
MIT — free for personal & commercial projects. Attribution welcome but not required.