Version: 1.0.0
Description: A comprehensive AI-powered platform combining advanced document processing, video manipulation, live streaming, and transcription services.
MosaicMaster & KΓΆnigstiger is a sophisticated multimedia processing platform that integrates multiple AI-powered services into a unified web application. The platform provides document translation, video processing, live stream management, transcription services, and advanced video playback capabilities.
main-integrated.py: Primary FastAPI application integrating all servicesconfig.py: Central configuration, logging, and utility functions
static/index.html: Main MosaicMaster homepage with multi-functional widgetvideo-player.html: Advanced video player with HLS support and GPU accelerationlive-streams.html: Live stream management interface (KΓΆnigstiger)static/videocutter.html: Video editing interface- JavaScript Modules:
static/js/videoeditor/: Modular video editing componentsstatic/subtitles.js: Subtitle handlingstatic/tryitnow.js: Main interface functionality
- Purpose: Multi-format document conversion and OCR
- Features:
- Document conversion between formats (PDF, DOCX, DOC, ODT, TXT, RTF, PPT, PPTX, EPUB, MOBI)
- OCR for images (JPG, JPEG, PNG, GIF)
- Batch processing capabilities
- External tool integration (Calibre, LibreOffice)
- API Endpoints:
/api/convert,/api/ocr
- Purpose: AI-powered document and text translation
- Features:
- File-based translation with format preservation
- URL content fetching and translation
- Text chunking for large documents
- AI summarization
- 98+ language support
- API Endpoint:
/api/translate
- Purpose: Comprehensive video processing and conversion
- Features:
- Multi-platform video downloads (YouTube, Twitter, Facebook, TikTok, Instagram, Twitch, Vimeo)
- Format conversion and transcoding
- Subtitle generation and translation
- Audio extraction
- Video effects and watermarking
- Video merging and trimming
- API Endpoints:
/api/video/*
- Purpose: Platform-specific video downloading
- Features:
- Multi-platform support with yt-dlp
- Cookie management for authenticated access
- Progress tracking
- Audio-only downloads
- Bot detection circumvention
- Integration: Used by video processor and live stream handler
- Purpose: Audio and video transcription with AI
- Features:
- Whisper-based transcription
- Speaker identification
- Multiple output formats (SRT, VTT, TXT, DOCX, PDF)
- Fast mode for real-time processing
- Post-processing for accuracy improvement
- API Endpoints:
/api/transcribe
- Purpose: Live stream recording and processing (KΓΆnigstiger)
- Features:
- Live stream detection and validation
- Recording with FFmpeg
- HLS proxy streaming
- Real-time transcription
- Stream monitoring and automatic recovery
- API Endpoints:
/api/streams/*
- Purpose: Text-to-speech conversion with AI-powered audio generation
- Features:
- Document text extraction (PDF, DOCX, DOC, ODT, TXT, RTF)
- OCR for image-based text (JPG, JPEG, PNG, GIF)
- URL content fetching and text extraction
- Automatic language detection (Hungarian, English)
- Text chunking for long documents
- Google Text-to-Speech (gTTS) integration
- Multi-part audio generation
- Background processing with job tracking
- API Endpoints:
/api/text-reader/*
- Purpose: GPU-accelerated video playback optimization (LEOPARD engine)
- Features:
- GPU acceleration with NVIDIA hardware
- HLS stream generation
- Multi-format support (AVI, MKV, MP4, WebM)
- Stream copy for compatible formats
- Thumbnail generation
- Media stream analysis
- Integration: Optimized video processing for web playback
- Purpose: Hardware acceleration detection and optimization
- Features:
- NVIDIA GPU detection and configuration
- Hardware encoder selection (h264_nvenc)
- Performance optimization for video processing
branding.py: Application branding and stylingsubtitle_module.py: Subtitle processing and conversionvideocutter.py: Video editing and trimming functionalityvideoeditor.py: Advanced video editing featuresfixed_endpoint.py: API endpoint utilities
-
Document Translation & Conversion
- Translate documents into 98+ languages
- Convert between multiple formats
- OCR text extraction from images
- Batch processing support
-
Video Processing
- Download videos from major platforms
- Convert between video formats
- Extract audio (MP3, WAV, OGG)
- Generate and translate subtitles
- Apply video effects and watermarks
-
Transcription Services
- Audio/video transcription with high accuracy
- Speaker identification and separation
- Multiple output formats
- Real-time transcription capabilities
-
Text-to-Speech Services
- Convert text documents to audio
- Support for multiple document formats (PDF, DOCX, DOC, ODT, TXT, RTF)
- Automatic language detection (Hungarian, English)
- URL content extraction and conversion
- OCR-based text extraction from images
-
Advanced Video Player
- GPU-accelerated playback
- HLS streaming support
- Multiple audio track and subtitle support
- Hardware-optimized encoding
-
Live Stream Management
- YouTube, Twitch, Facebook live stream support
- Real-time recording
- HLS proxy streaming
- Automatic transcription of live content
- Stream monitoring and recovery
- Translation:
POST /api/translate/ - Document Conversion:
POST /api/convert/ - OCR Processing:
POST /api/ocr/ - Video Processing:
POST /api/video/ - Transcription:
POST /api/transcribe/ - Text-to-Speech:
POST /api/text-reader/generate,GET /api/text-reader/status/{job_id},GET /api/text-reader/download/{job_id} - Live Streams:
GET|POST|DELETE /api/streams/ - Video Player:
GET /api/videos/,POST /api/upload/
- Progress Updates:
WS /ws/{connection_id} - Real-time progress tracking for long-running operations
- Processed Files:
GET /download/{filename} - Static Assets:
GET /static/{path} - HLS Streams:
GET /hls/{stream_id}/
mosaicmasternew/
βββ main-integrated.py # Main application entry point
βββ config.py # Configuration and utilities
βββ static/ # Frontend assets
β βββ index.html # Main homepage
β βββ js/videoeditor/ # Video editing modules
β βββ css/ # Stylesheets
β βββ images/ # Static images
βββ templates/ # HTML templates
β βββ video-player.html # Video player interface
β βββ live-streams.html # Live streams interface
βββ Backend Services/
β βββ translator.py # Translation service
β βββ video_processor.py # Video processing
β βββ transcriber.py # Transcription service
β βββ document_processor.py # Document conversion
β βββ videodownloader.py # Video downloading
β βββ live_stream_handler.py # Live streaming
β βββ video_player.py # Video player service
β βββ gpu_acceleration.py # Hardware acceleration
βββ temp/ # Temporary processing files
βββ uploads/ # User uploaded files
βββ converted/ # Processed files and thumbnails
βββ hls/ # HLS streaming content
βββ recordings/ # Live stream recordings
βββ requirements.txt # Python dependencies
- FastAPI: Modern Python web framework
- FFmpeg: Video/audio processing engine
- yt-dlp: Video downloading library
- OpenAI API: AI services (Whisper, GPT)
- WebSocket: Real-time communication
- GPU Acceleration: NVIDIA hardware support
- HLS Streaming: HTTP Live Streaming
- Multi-format Support: Extensive codec support
- Calibre: E-book conversion
- LibreOffice: Office document processing
- Tesseract: OCR engine
- PyMuPDF: PDF processing
-
AI Services Integration
- OpenAI Whisper for transcription
- OpenAI GPT for translation and summarization
- Speaker identification with AI
-
Hardware Optimization
- NVIDIA GPU acceleration (LEOPARD engine)
- Hardware-optimized video encoding
- Performance monitoring and adjustment
-
Multi-Platform Support
- YouTube, Twitter, Facebook, TikTok, Instagram
- Twitch, Vimeo, and generic stream URLs
- Automated platform detection
-
Real-time Processing
- WebSocket progress updates
- Live stream recording and transcription
- Concurrent processing capabilities
- Input validation and sanitization
- Temporary file cleanup
- Rate limiting for API calls
- Secure file handling
- Process isolation for external tools
- Asynchronous Processing: Non-blocking operations
- Background Tasks: Long-running processes
- GPU Acceleration: Hardware-optimized video processing
- Streaming: Efficient large file handling
- Caching: Optimized resource usage
-
Ensure Python 3.8+ is installed:
python3 --version
-
Create and activate a virtual environment:
python3 -m venv venv source venv/bin/activate # Linux/Mac venv\Scripts\activate.bat # Windows
-
Install dependencies:
pip install -r requirements.txt
-
Start the application:
./start.sh # or python main-integrated.py -
Open your browser to:
http://localhost:8080
- Homepage:
/- Main MosaicMaster interface - Video Player:
/video-player.html- Advanced video player - Live Streams:
/live-streams.html- Stream management - Video Editor:
/static/videocutter.html- Video editing tools
- FFmpeg: Video and audio processing
- LibreOffice: Document conversion
- Python 3.8+: Runtime environment
- Calibre: E-book conversion
- NVIDIA GPU: Hardware acceleration
- Tesseract: OCR processing
- OpenAI API Key: For transcription and translation
- Configure in environment variables or
.envfile
Β© 2025 MosaicMaster & KΓΆnigstiger. All rights reserved.