A Model Context Protocol (MCP) server that provides AI assistants with intelligent screen monitoring, UI interaction, and real-time visual analysis capabilities.
- Intelligent Triggers: Event-driven monitoring with 6 smart triggers (significant_change, error_detected, new_window, application_change, code_change, text_appears)
- Adaptive Performance: Automatically adjusts FPS based on screen activity
- Event Classification: Intelligent categorization of screen events
- Smart Click: Natural language UI interaction ("Click the save button")
- OCR Text Extraction: Extract text from screen regions with coordinates
- Element Detection: Identify and interact with UI elements
- 75% Success Rate: Enhanced fuzzy matching and position-based scoring
- Context Awareness: Track active applications and window changes
- Event Broadcasting: Real-time application event notifications
- Multi-Application Support: Works with any desktop application
- Screen Recording: Capture screen activity with configurable FPS
- AI Analysis: Multiple analysis types (summary, frame-by-frame, key moments)
- Format Support: Save recordings in various formats
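The fuzzy matching and position-based scoring mentioned for Smart Click can be illustrated with a small standard-library sketch. This is not the project's implementation: the element dictionaries (`text`, `x`, `y`), the noise-word list, and the `min_score` threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical filler words that describe the action, not the target label.
NOISE_WORDS = {"click", "the", "button", "menu", "on"}


def normalize(query):
    """Lowercase the request and drop filler words ("Click the save button" -> "save")."""
    words = [w for w in query.lower().split() if w not in NOISE_WORDS]
    return " ".join(words) or query.lower()


def match_score(query, label):
    """Similarity in [0, 1] between the normalized request and a detected label."""
    return SequenceMatcher(None, normalize(query), label.lower()).ratio()


def best_element(query, elements, min_score=0.6):
    """Return the best-matching element, or None if nothing scores high enough.

    `elements` is a list of dicts with assumed keys: text, x, y.
    Equal text scores are broken by position, preferring the element
    closest to the top-left corner.
    """
    scored = [(match_score(query, e["text"]), e) for e in elements]
    scored = [(s, e) for s, e in scored if s >= min_score]
    if not scored:
        return None
    return max(scored, key=lambda p: (p[0], -p[1]["y"], -p[1]["x"]))[1]
```

In a real pipeline the `elements` list would come from OCR or element detection; here it is simply passed in by the caller.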
- start_smart_monitoring() - Begin intelligent trigger-based monitoring
- stop_smart_monitoring() - Stop smart monitoring
- get_monitoring_insights() - Get AI-powered analysis insights
- get_recent_events() - Retrieve recent smart events with details
- get_monitoring_summary() - Get comprehensive monitoring session report

- smart_click() - Natural language UI interaction ("Click the save button")
- extract_text_from_screen() - OCR text extraction with coordinates
- analyze_ui_elements() - Detect and analyze UI elements

- capture_and_analyze() - Screen capture with AI analysis
- record_and_analyze() - Video recording with AI analysis
- get_active_application() - Get current application context
- list_tools() - List all available tools
- Clone and install dependencies
  git clone https://github.com/inkbytefo/ScreenMonitorMCP.git
  cd ScreenMonitorMCP
  pip install -r requirements.txt
- Configure environment
  cp .env.example .env
  # Edit .env with your API keys
- Run the server
  python main.py
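Before running the server, it can help to confirm that the .env file no longer contains placeholder values. The following is a minimal sketch, assuming the file uses plain KEY=VALUE lines and that untouched placeholders start with `your_`; the helper names `parse_env` and `unset_keys` are hypothetical and not part of the project.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def unset_keys(values, required):
    """Return required keys that are missing, empty, or still placeholders."""
    return [
        key for key in required
        if not values.get(key) or values[key].startswith("your_")
    ]
```

A typical use would be to read the file with `pathlib.Path(".env").read_text(encoding="utf-8")`, pass the text to `parse_env`, and stop early if `unset_keys(values, ["OPENAI_API_KEY"])` returns anything.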
Add to your MCP client configuration (e.g., Claude Desktop):
{
"mcpServers": {
"screenMonitorMCP": {
"command": "python",
"args": ["/path/to/ScreenMonitorMCP/main.py"],
"cwd": "/path/to/ScreenMonitorMCP"
}
}
}

Smart Monitoring
# Start intelligent monitoring
await start_smart_monitoring(
triggers=["significant_change", "error_detected", "new_window"],
analysis_prompt="What changed on screen and why is it important?",
fps=2,
sensitivity="medium"
)
# Get insights
insights = await get_monitoring_insights()

UI Interaction
# Natural language clicking
await smart_click("Save button")
await smart_click("File menu")
# Extract text from regions
text_data = await extract_text_from_screen(
region={"x": 100, "y": 100, "width": 300, "height": 200}
)

Video Analysis
# Record and analyze screen activity
video_result = await record_and_analyze(
duration=15,
fps=2,
analysis_type="summary",
analysis_prompt="What happened in this recording?",
save_video=True
)

Create a .env file with the following configuration:
# Server Configuration
HOST=127.0.0.1
PORT=7777
API_KEY=your_secret_key
# AI Configuration
OPENAI_API_KEY=your_openai_api_key
OPENAI_BASE_URL=https://api.openai.com/v1
DEFAULT_OPENAI_MODEL=gpt-4o
DEFAULT_MAX_TOKENS=1000

With Environment Variables
{
"mcpServers": {
"screenMonitorMCP": {
"command": "python",
"args": ["/path/to/ScreenMonitorMCP/main.py"],
"cwd": "/path/to/ScreenMonitorMCP",
"env": {
"OPENAI_API_KEY": "your-api-key-here"
}
}
}
}

With API Key Security
{
"mcpServers": {
"screenMonitorMCP": {
"command": "python",
"args": [
"/path/to/ScreenMonitorMCP/main.py",
"--api-key", "your-secret-key"
],
"cwd": "/path/to/ScreenMonitorMCP"
}
}
}

Windows Configuration
{
"mcpServers": {
"screenMonitorMCP": {
"command": "C:/Python311/python.exe",
"args": ["C:/path/to/ScreenMonitorMCP/main.py"],
"cwd": "C:/path/to/ScreenMonitorMCP"
}
}
}

Monitor application changes and events in real-time, perfect for development workflows and debugging.
Analyze screen content with AI to understand context, detect errors, and provide intelligent insights.
Enable AI assistants to interact with desktop applications using natural language commands.
Record and analyze user interactions for automated testing and quality assurance workflows.
- Python 3.9+
- OpenAI API key (for AI analysis)
- Windows/macOS/Linux support
Key dependencies include:
- fastmcp - MCP server framework
- pillow - Image processing
- easyocr - Text extraction
- opencv-python - Video recording
- openai - AI analysis
- psutil - System monitoring
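A quick way to confirm that the requirements above are met is to check the interpreter version and whether each dependency is importable. This is a sketch only, and the helper names are hypothetical. Note that some pip package names differ from their import names (pillow is imported as `PIL`, opencv-python as `cv2`).

```python
import sys
from importlib.util import find_spec

# pip package name -> importable top-level module name
MODULES = {
    "fastmcp": "fastmcp",
    "pillow": "PIL",
    "easyocr": "easyocr",
    "opencv-python": "cv2",
    "openai": "openai",
    "psutil": "psutil",
}


def python_ok(minimum=(3, 9)):
    """True if the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


def missing_packages(modules=MODULES):
    """Return pip package names whose module cannot be found."""
    return [pkg for pkg, mod in modules.items() if find_spec(mod) is None]
```

If `missing_packages()` returns a non-empty list, rerunning `pip install -r requirements.txt` in the project directory is the usual fix.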
- Smart Triggers: Only analyzes when meaningful events occur
- Adaptive FPS: Automatically adjusts monitoring speed (1-5 FPS)
- 75% Success Rate: Enhanced UI element detection and interaction
- Memory Efficient: Event-driven architecture minimizes resource usage
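The combination of smart triggers and adaptive FPS can be sketched as follows. This is an illustration under stated assumptions, not the server's actual logic: frames are represented as flat lists of grayscale values (0-255), and the `pixel_delta` and `threshold` defaults are invented for the example. Only the 1-5 FPS range comes from the list above.

```python
def change_ratio(prev, curr, pixel_delta=16):
    """Fraction of pixels whose grayscale value moved by more than pixel_delta."""
    if len(prev) != len(curr) or not prev:
        raise ValueError("frames must be non-empty and the same size")
    changed = sum(1 for a, b in zip(prev, curr) if abs(a - b) > pixel_delta)
    return changed / len(prev)


def is_significant(ratio, threshold=0.1):
    """Decide whether a frame pair is worth sending for analysis."""
    return ratio >= threshold


def adaptive_fps(ratio, min_fps=1, max_fps=5):
    """Scale the capture rate linearly with activity, clamped to [min_fps, max_fps]."""
    ratio = max(0.0, min(1.0, ratio))
    return round(min_fps + (max_fps - min_fps) * ratio)
```

With this scheme an idle screen is sampled at the minimum rate and never triggers analysis, while a busy screen is sampled faster and analyzed only when the change ratio crosses the threshold.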
Unicode/Encoding Error (Windows)
UnicodeEncodeError: 'charmap' codec can't encode character
Solution: Fixed automatically - server uses UTF-8 encoding.
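For reference, one common way to avoid this error in a Python program is to switch the standard streams to UTF-8 at startup. The sketch below shows that general technique; it is an assumption that the server does something similar, and the function name is hypothetical.

```python
import sys


def force_utf8(streams=None):
    """Reconfigure text streams to UTF-8, replacing unencodable characters."""
    if streams is None:
        streams = (sys.stdout, sys.stderr)
    for stream in streams:
        # TextIOWrapper.reconfigure exists on Python 3.7+; skip other stream types.
        reconfigure = getattr(stream, "reconfigure", None)
        if reconfigure is not None:
            reconfigure(encoding="utf-8", errors="replace")
```

Setting the environment variable `PYTHONUTF8=1` before launching Python has a similar effect without code changes.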
JSON Configuration Error
// Wrong: trailing comma
{
"command": "python",
"args": ["path/to/main.py",]
}
// Correct
{
"command": "python",
"args": ["path/to/main.py"]
}

Python Path Issue
Use the full Python path if needed:
{
"command": "C:/Python311/python.exe",
"args": ["C:/path/to/ScreenMonitorMCP/main.py"]
}

Missing Dependencies
cd ScreenMonitorMCP
pip install -r requirements.txt

Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
See CHANGELOG.md for version history and updates.
ScreenMonitorMCP - Giving AI assistants intelligent vision and interaction capabilities through the Model Context Protocol.