A real-time transcript recorder for ChatGPT voice conversations with dual audio capture, whisper.cpp streaming, and intelligent LLM deduplication. Built with a fully modular frontend architecture and clean service-oriented backend design.
- Real-time transcription using whisper.cpp streaming
- Dual audio capture - microphone and system audio simultaneously
- Intelligent deduplication using LLM processing to clean overlapping transcripts
- Dual-panel interface - compare raw vs processed transcripts
- Database storage with SQLite backend for persistent transcript storage
- Session export - download transcripts in JSON, TXT, or CSV formats
- Session browser - view and manage historical recording sessions
- Dark mode interface with real-time status indicators
- Modular Frontend - 8 focused JavaScript modules with event-driven communication
- Service-Oriented Backend - Clean separation of concerns with dedicated service layer
- Event Bus System - Loose coupling between frontend modules
- State Management - Centralized state store with reactive updates
- Error Handling - Comprehensive error handling and recovery mechanisms
- Python 3.9+ with uv package manager
- macOS/Linux (Windows support via WSL)
- Git for cloning repositories
- Lambda Labs API key (for LLM processing)
- BlackHole 2ch (for system audio capture)
git clone https://github.com/jwt625/Voice_Mode_transcript.git
cd Voice_Mode_transcript
Create virtual environment:
uv venv
Install dependencies:
uv sync
Activate virtual environment (optional, uv handles this automatically):
source .venv/bin/activate
Prerequisites:
- macOS with Xcode command line tools
- Homebrew
- SDL2 (required for streaming functionality)
# Install SDL2 for streaming support
brew install sdl2 cmake
# Clone and build whisper.cpp with SDL2 support
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
# Build with SDL2 support (required for whisper-stream)
rm -rf build
mkdir build
cd build
cmake .. -DWHISPER_SDL2=ON
make -j8
# Download a model
cd ..
./models/download-ggml-model.sh base.en
# Test installation (non-streaming)
./build/bin/whisper-cli -f samples/jfk.wav -m models/ggml-base.en.bin
# Test streaming functionality (required for the app)
./build/bin/whisper-stream -m models/ggml-base.en.bin -t 6 --step 0 --length 30000 -vth 0.6
cd ..
Important: The app requires whisper-stream
binary which is only built when SDL2 support is enabled. If you see "whisper-stream not found" errors, make sure you built with -DWHISPER_SDL2=ON
.
# Install BlackHole for system audio capture
brew install blackhole-2ch
Configure Multi-Output Device:
- Open Audio MIDI Setup (Applications β Utilities)
- Click + β Create Multi-Output Device
- Check both your speakers/headphones and BlackHole 2ch
- Set this Multi-Output Device as your system audio output
- In the app, select BlackHole 2ch for system audio capture
# Create .env file
echo 'LLM_API_KEY="your_lambda_labs_api_key_here"' > .env
echo 'LLM_BASE_URL="https://api.lambda.ai/v1"' >> .env
echo 'LLM_MODEL="llama-4-maverick-17b-128e-instruct-fp8"' >> .env
echo 'WHISPER_MODEL_PATH="./whisper.cpp/models/ggml-base.en.bin"' >> .env
echo 'WHISPER_STREAM_BINARY="./whisper.cpp/build/bin/whisper-stream"' >> .env
uv run python app.py
The server will automatically find an available port starting from 5001. Check the console output for the actual URL, typically: http://localhost:5001 (or 5002, 5003, etc. if 5001 is in use)
- Open the URL shown in the console (e.g., http://localhost:5001) in your browser
- Select microphone and system audio devices
- Click "Start Recording"
- Speak into your microphone β raw transcripts appear in left panel
- Press Enter or click "Process with LLM" β cleaned transcripts appear in right panel
- Click "Stop Recording" when done
- Enter - Process accumulated transcripts with LLM
- Spacebar - Manual LLM processing trigger
- Left Panel: Real-time whisper.cpp output with overlapping segments
- Right Panel: Clean, deduplicated text from LLM processing
- Database Inspector: Browse historical sessions and export transcripts
- Export Options: Download transcripts in multiple formats (JSON, TXT, CSV)
- Click ποΈ Database Inspector button
- Browse tabs: Raw Transcripts, Processed Transcripts, Recent Sessions, Session Browser
- View database statistics and recent activity
- Go to Session Browser tab in Database Inspector
- Select a session from the list
- Click π€ Export Selected Session
- Choose export options:
- Format: JSON (structured), TXT (readable), CSV (tabular)
- Content: Raw only, Processed only, or Both
- Click β¬οΈ Download to save the file
- JSON: Complete structured data with metadata, timestamps, and all transcript details
- TXT: Human-readable format with timestamps and source information
- CSV: Spreadsheet-compatible format for data analysis
GET /api/sessions/<session_id>/export?format=<json|txt|csv>&content=<raw|processed|both>
Example:
curl "http://localhost:5001/api/sessions/session_20250704_233045/export?format=json&content=both" -o export.json
src/
βββ config/ # Configuration management
β βββ __init__.py
β βββ settings.py # AppConfig with environment variables
βββ models/ # Database models and repositories
β βββ __init__.py
β βββ database.py # Database connection and initialization
β βββ repositories.py # Repository pattern for data access
β βββ session.py # Session model
β βββ transcript.py # Transcript models
βββ services/ # Business logic layer
β βββ __init__.py
β βββ app_service.py # Main coordinator service
β βββ audio_service.py # Audio capture and monitoring
β βββ device_service.py # Audio device management
β βββ llm_service.py # LLM processing coordination
β βββ session_service.py # Session management
β βββ transcript_service.py # Transcript processing
βββ audio_capture.py # Audio capture utilities
βββ llm_processor.py # LLM processing engine
βββ sdl_device_mapper.py # SDL/PyAudio device mapping
βββ whisper_stream_processor.py # Whisper.cpp integration
static/
βββ css/
β βββ style.css # Application styles and dark mode theme
βββ js/
βββ app-modular.js # Main application orchestrator
βββ app-monolithic-backup.js # Legacy monolithic version (backup)
βββ core/ # Core framework modules
β βββ event-bus.js # Event communication system
β βββ state-store.js # Centralized state management
β βββ module-base.js # Base class for all modules
βββ config/
β βββ module-config.js # Module configuration and dependencies
βββ modules/ # Feature modules
βββ recording.js # Recording control and state management
βββ transcript.js # Transcript display and management
βββ llm.js # LLM processing coordination
βββ database.js # Database operations and session browser
βββ ui.js # User interface controls and notifications
βββ device.js # Audio device management
βββ sse.js # Server-Sent Events handling
βββ utils.js # Shared utilities and helpers
templates/
βββ index.html # Main application template
The app uses SQLite with three main tables:
sessions
- Session metadata with quality metricsraw_transcripts
- Direct whisper.cpp output with timestamps and confidence scoresprocessed_transcripts
- LLM-cleaned transcripts with original transcript references
- Raw transcripts:
id
,session_id
,text
,timestamp
,sequence_number
,confidence
,audio_source
- Processed transcripts:
id
,session_id
,processed_text
,original_transcript_ids
,llm_model
,processing_time
GET /api/sessions
- List all recording sessionsGET /api/sessions/<session_id>/export
- Export session transcriptsPOST /api/sessions/<session_id>/calculate-metrics
- Calculate session quality metrics
GET /api/raw-transcripts/<session_id>
- Get raw transcripts for session (paginated)GET /api/processed-transcripts/<session_id>
- Get processed transcripts for session (paginated)
GET /api/database/stats
- Database statistics and recent sessionsGET /api/database/raw-transcripts
- All raw transcripts (paginated)GET /api/database/processed-transcripts
- All processed transcripts (paginated)
POST /api/start
- Start recording sessionPOST /api/stop
- Stop recording sessionPOST /api/process-llm
- Trigger LLM processingGET /api/status
- Get current recording status
The application features a fully modular frontend that replaced a 3,397-line monolithic JavaScript file with 8 focused modules:
static/js/
βββ app-modular.js # Main application orchestrator
βββ core/
β βββ event-bus.js # Event communication system
β βββ state-store.js # Centralized state management
β βββ module-base.js # Base class for all modules
βββ config/
β βββ module-config.js # Module configuration and dependencies
βββ modules/
βββ recording.js # Recording control and state management
βββ transcript.js # Transcript display and management
βββ llm.js # LLM processing coordination
βββ database.js # Database operations and session browser
βββ ui.js # User interface controls and notifications
βββ device.js # Audio device management
βββ sse.js # Server-Sent Events handling
βββ utils.js # Shared utilities and helpers
- Maintainability: Each module has a single, focused responsibility
- Testability: Modules can be tested in isolation
- Scalability: Easy to add new features without affecting existing code
- Debugging: Issues can be isolated to specific modules
- Reusability: Modules can be reused in other contexts
- EventBus: Centralized event system for loose coupling between modules
- State Store: Reactive state management with subscription-based updates
- Module Lifecycle: Proper initialization, cleanup, and error handling
For detailed implementation information, see docs/014-frontend_modularization_plan.md
.
The project uses modern Python tooling for code quality:
- ruff - Fast linting and formatting
- mypy - Type checking (enabled in production)
- pre-commit - Automated quality checks
# Install development dependencies
uv sync --dev
# Install pre-commit hooks
uv run pre-commit install
# Run linting and formatting
uv run ruff check src/ app.py
uv run ruff format src/ app.py
- Configuration: Centralized in
src/config/settings.py
with environment variable support - Database: Repository pattern in
src/models/
for clean data access - Business Logic: Service layer in
src/services/
with focused responsibilities - Controllers: Flask routes (to be refactored from
app.py
) - Audio Processing: whisper.cpp integration with streaming support
- Real-time Communication: Server-Sent Events (SSE) for live updates
- Export button disabled: Make sure a session is selected in the Session Browser
- Empty export file: Check if the session has transcripts in the database
- Download not starting: Verify the Flask server is running and accessible
- Port 5001 in use: Kill existing processes with
lsof -ti:5001 | xargs kill -9
- No sessions visible: Check if any recording sessions have been completed
- Database errors: Ensure SQLite database file has proper permissions
- Duplicate transcripts: Fixed in July 2025 - ensure you're using the latest modular frontend
- Recording not stopping: Fixed in July 2025 - backend now properly responds to stop commands
- Database:
transcripts.db
in project root - Exports: Downloaded to browser's default download folder
- Logs: Console output in terminal running the Flask app
MIT License
Copyright (c) 2024
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
For detailed setup guides, troubleshooting, and development information, see the docs/
directory.