Euraxluo/browser-mcp

Session-Based Browser-Use FastMCP Server

English

A Model Context Protocol (MCP) server, built on the FastMCP framework, that provides browser automation. It features session-based instance management, TTL-based cleanup, PDF generation, file downloads, cookie management, and configurable browser options. All browser operations are implemented via browser-use.

🎯 Key Features

  • Session-Based Management: Each MCP session gets its own isolated browser instance automatically
  • Advanced Browser Control: Full browser automation with Playwright backend (via browser-use)
  • PDF Generation: Convert web pages to PDF with custom formatting options
  • File Operations: Download/upload files, manage file system, and access all temp files
  • Cookie Management: Set, get, and manage browser cookies for authentication
  • Screenshot Capture: Take full-page, viewport, or element screenshots
  • Tab Management: Create, switch, and close browser tabs
  • Content Extraction: Extract and search page content
  • Session Persistence: Automatic cleanup with configurable TTL
  • Multi-Instance Support: Run multiple isolated browser sessions
  • Configurable Security: All browser security settings are configurable via API

🚀 Quick Start

  1. Install Dependencies:

    Using uv (recommended):

    uv sync --all-extras
  2. Install the Browser:

    uv run playwright install --with-deps chromium
  3. Start the Server:

    Using uv (recommended):

    uv run main.py
  4. Basic Usage (Direct SessionBrowserManager):

    # Direct usage without MCP protocol (for testing/development)
    from browser_fastmcp_server import SessionBrowserManager, BrowserConfig
    import asyncio
    
    async def main():
        # Create session manager
        manager = SessionBrowserManager(max_instances=5, default_ttl=300)
        await manager.start_cleanup_task()
        
        # Create a new browser session
        session_id = "test_session_123"
        instance = await manager.get_or_create_session_instance(
            session_id, 
            BrowserConfig(headless=True)
        )
        
        # Navigate to a website
        browser_session = instance.browser_session
        await browser_session.navigate("https://example.com")
        
        # Get page elements
        state_summary = await browser_session.get_state_summary(cache_clickable_elements_hashes=True)
        print(f"Interactive elements: {len(state_summary.selector_map)}")
        
        # Take a screenshot
        page = await browser_session.get_current_page()
        screenshot_bytes = await page.screenshot(full_page=True)
        
        # Close session when done
        await manager.close_session(session_id)
        await manager.shutdown()
    
    if __name__ == "__main__":
        asyncio.run(main())

🛠️ Run Tests

Install test dependencies and run all tests:

uv run python -m pytest test_browser_workflow_test.py test_browser_fastmcp_client.py test_browser_test.py -v

🛠️ Core Tools (API)

Session Management

  • create_chrome_instance(headless, viewport_width, viewport_height) → Create a new browser session, returns session_id
  • close_instance(session_id) → Close a specific session
  • get_instance_info(session_id) → Get info for a session
  • check_browser_health(session_id) → Check the health status of a browser session and provide recovery suggestions
  • get_browser_status() → List all sessions
  • close_all_instances() → Close all sessions
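
The create/close pair above lends itself to a context manager, so that a session is closed even when a step in between fails. The following is a minimal sketch, not part of the project: `browser_session` and `ToolCaller` are hypothetical names, `call_tool` stands for whatever coroutine forwards a tool name and arguments to the server, and the result of `create_chrome_instance` is assumed to have been decoded into a dict with a `session_id` key.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import Any, Awaitable, Callable

# Hypothetical alias: a coroutine function that sends (tool_name, arguments) to the server.
ToolCaller = Callable[[str, dict], Awaitable[Any]]


@asynccontextmanager
async def browser_session(call_tool: ToolCaller, headless: bool = True):
    """Create a browser session, yield its id, and always close it afterwards."""
    result = await call_tool("create_chrome_instance", {"headless": headless})
    # Assumption: the result is a dict that contains "session_id".
    session_id = result["session_id"]
    try:
        yield session_id
    finally:
        # Runs on normal exit and on exceptions alike.
        await call_tool("close_instance", {"session_id": session_id})


async def demo() -> list:
    """Exercise the helper against a stand-in for a real client."""
    calls = []

    async def fake_call_tool(name: str, arguments: dict):
        calls.append((name, arguments))
        if name == "create_chrome_instance":
            return {"session_id": "demo-session"}
        return {"ok": True}

    async with browser_session(fake_call_tool) as session_id:
        await fake_call_tool(
            "navigate_to", {"session_id": session_id, "url": "https://example.com"}
        )
    return calls


if __name__ == "__main__":
    for name, arguments in asyncio.run(demo()):
        print(name, arguments)
```

With a real client, `call_tool` would be a thin wrapper that also unpacks the tool result into a dict.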

Browser Configuration

  • set_browser_config(session_id, headless, no_sandbox, user_agent, viewport_width, viewport_height, disable_web_security) → Update the browser config (the browser restarts automatically if the change requires it)
  • get_browser_config(session_id) → Get current config

Navigation & Page Control

  • navigate_to(session_id, url, new_tab=False) → Go to any URL (optionally in new tab)
  • navigate_back(session_id) / navigate_forward(session_id) → History navigation
  • refresh_page(session_id) → Refresh the current page
  • get_page_state(session_id) → List interactive elements with indices

Tab Management

  • get_tabs_info(session_id) → List all open tabs
  • switch_tab(session_id, page_id) → Switch between tabs
  • close_tab(session_id, page_id) → Close specific tab

Element Interaction

  • click_element(session_id, index) → Click element by index
  • click_element_by_xpath(session_id, xpath) → Click element by XPath
  • input_text(session_id, index, text) → Type into form fields
  • set_element_value(session_id, index, value) → Set input/select value directly
  • get_element_info(session_id, index=None, xpath=None) → Get element info (by index or xpath)
  • send_keys(session_id, keys) → Send keyboard shortcuts
  • upload_file(session_id, index, file_path) → Upload files to forms
  • get_dropdown_options(session_id, index) → Inspect select elements
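
Index-based interaction usually follows one pattern: read the indices from `get_page_state`, type into the fields, then click the submit control. The sketch below sequences those tool calls. `fill_and_submit` is a hypothetical helper, `call_tool` is an injected coroutine that forwards calls to the server, and the indices are assumed to come from a fresh `get_page_state` result.

```python
import asyncio
from typing import Any, Awaitable, Callable, Mapping

# Hypothetical alias: a coroutine function that sends (tool_name, arguments) to the server.
ToolCaller = Callable[[str, dict], Awaitable[Any]]


async def fill_and_submit(
    call_tool: ToolCaller,
    session_id: str,
    fields: Mapping[int, str],
    submit_index: int,
) -> int:
    """Type each value into the element at its index, then click the submit element.

    Returns the number of tool calls that were made.
    """
    made = 0
    for index, text in fields.items():
        await call_tool(
            "input_text", {"session_id": session_id, "index": index, "text": text}
        )
        made += 1
    await call_tool("click_element", {"session_id": session_id, "index": submit_index})
    return made + 1


async def demo() -> list:
    """Record the calls instead of talking to a real server."""
    calls = []

    async def fake_call_tool(name: str, arguments: dict):
        calls.append((name, arguments))

    await fill_and_submit(fake_call_tool, "demo-session", {3: "alice", 4: "secret"}, 7)
    return calls


if __name__ == "__main__":
    for name, arguments in asyncio.run(demo()):
        print(name, arguments)
```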

Media & Files

  • take_screenshot(session_id, target=None, width=None, height=None, full_page=True, quality=90, format="png") → Capture screenshots
  • generate_pdf(session_id, url=None, html_content=None, output_filename=None, ...) → Save page as PDF
  • download_file(session_id, url, output_filename=None, timeout=30) → Download files from URLs
  • download_image(session_id, image_url, output_filename=None, timeout=30) → Download images specifically

Cookie & Session Management

  • set_cookie(session_id, name, value, domain, path, http_only, secure, same_site, expires, max_age) → Set browser cookies
  • get_cookies(session_id, domain=None) → Retrieve current cookies
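
Since `set_cookie` takes many parameters, it can help to build and validate the argument dict before sending it. The sketch below is one way to do that. `cookie_arguments` is a hypothetical helper, and the accepted `same_site` spellings ("Strict", "Lax", "None") are an assumption about the tool. The rule that `SameSite=None` cookies must also be `Secure` reflects how current browsers treat such cookies.

```python
from typing import Optional

_SAME_SITE_VALUES = {"Strict", "Lax", "None"}  # assumed spellings


def cookie_arguments(
    session_id: str,
    name: str,
    value: str,
    domain: str,
    path: str = "/",
    http_only: bool = True,
    secure: bool = True,
    same_site: str = "Lax",
    max_age: Optional[int] = None,
) -> dict:
    """Return an argument dict for the set_cookie tool after basic validation."""
    if not name:
        raise ValueError("cookie name must not be empty")
    if same_site not in _SAME_SITE_VALUES:
        raise ValueError(f"same_site must be one of {sorted(_SAME_SITE_VALUES)}")
    if same_site == "None" and not secure:
        raise ValueError("SameSite=None cookies must also be Secure")

    arguments = {
        "session_id": session_id,
        "name": name,
        "value": value,
        "domain": domain,
        "path": path,
        "http_only": http_only,
        "secure": secure,
        "same_site": same_site,
    }
    if max_age is not None:
        # Only send max_age when the caller asked for a bounded lifetime.
        arguments["max_age"] = max_age
    return arguments


if __name__ == "__main__":
    print(cookie_arguments("demo-session", "auth", "token123", "example.com", max_age=3600))
```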

Utilities

  • scroll_page(session_id, direction="down") → Scroll up/down
  • extract_content(session_id, query) → Extract text content
  • wait(seconds) → Pause execution
  • browser_tips() → Get automation best practices
  • search_bing(session_id, query) → Bing search

📚 Resources (REST-style)

  • browser://status → Manager and sessions status
  • browser://instances → All sessions info
  • browser://instance/{id}/page → Session page info
  • browser://instance/{id}/tabs → Session tabs
  • browser://instance/{id}/screenshots → Session screenshots
  • browser://instance/{id}/status → Session status (detailed)
  • browser://instance/{id}/files → Session temp files
  • browser://instance/{id}/cookies → Session cookies
  • browser://instance/{id}/file/{relative_path} → Read a file in session temp
  • browser://help → This help
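
The per-session resource URIs follow a regular pattern, so a small helper can build them. This is a sketch under assumptions: `instance_resource` is a hypothetical name, and percent-encoding the session id and file path is a guess at what the server accepts.

```python
from typing import Optional
from urllib.parse import quote

# Resource kinds listed above that take no extra path segment.
_SIMPLE_KINDS = {"page", "tabs", "screenshots", "status", "files", "cookies"}


def instance_resource(
    session_id: str, kind: str, relative_path: Optional[str] = None
) -> str:
    """Build a browser://instance/{id}/... resource URI for one session."""
    base = f"browser://instance/{quote(session_id, safe='')}"
    if kind == "file":
        if not relative_path:
            raise ValueError("kind 'file' needs a relative_path")
        # quote() keeps "/" by default, so sub-directories survive.
        return f"{base}/file/{quote(relative_path)}"
    if kind not in _SIMPLE_KINDS:
        raise ValueError(f"unknown resource kind: {kind!r}")
    return f"{base}/{kind}"


if __name__ == "__main__":
    print(instance_resource("demo-session", "tabs"))
    print(instance_resource("demo-session", "file", "out/report 1.pdf"))
```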

🔧 Configuration

Configure the server using environment variables:

# Maximum number of concurrent browser instances
BROWSER_MAXIMUM_INSTANCES=10

# Session TTL in seconds (default: 30 minutes)
BROWSER_INSTANCE_TTL=1800

# Command execution timeout in seconds
BROWSER_EXECUTE_TIMEOUT=30

# Cleanup interval in seconds
BROWSER_CLEANUP_INTERVAL=60
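
For reference, reading these variables in Python could look like the sketch below. It is not the server's actual loader. `BrowserServerConfig` and `load_config` are hypothetical names, and only the 30-minute TTL is documented as a default, so using the other listed values as fallbacks is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Fallbacks mirror the values listed above. Only the TTL is stated to be a
# default; treating the other three as defaults is an assumption.
_FALLBACKS = {
    "BROWSER_MAXIMUM_INSTANCES": 10,
    "BROWSER_INSTANCE_TTL": 1800,
    "BROWSER_EXECUTE_TIMEOUT": 30,
    "BROWSER_CLEANUP_INTERVAL": 60,
}


@dataclass(frozen=True)
class BrowserServerConfig:  # hypothetical name, not part of the project
    maximum_instances: int
    instance_ttl: int
    execute_timeout: int
    cleanup_interval: int


def load_config(env: Mapping[str, str] = os.environ) -> BrowserServerConfig:
    """Read the four settings from env, falling back when a variable is unset."""

    def read(name: str) -> int:
        raw = env.get(name, "").strip()
        value = int(raw) if raw else _FALLBACKS[name]  # int() raises ValueError on junk
        if value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {value}")
        return value

    return BrowserServerConfig(
        maximum_instances=read("BROWSER_MAXIMUM_INSTANCES"),
        instance_ttl=read("BROWSER_INSTANCE_TTL"),
        execute_timeout=read("BROWSER_EXECUTE_TIMEOUT"),
        cleanup_interval=read("BROWSER_CLEANUP_INTERVAL"),
    )


if __name__ == "__main__":
    print(load_config())
```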

📝 Prompts

Built-in prompts for common automation scenarios:

  • web_testing(url, test_scenario) → Web testing workflows
  • data_extraction(url, data_type) → Data extraction strategies
  • form_filling(url, form_data) → Automated form filling (returns conversation)
  • automation_troubleshooting() → Debugging help

🔌 MCP Integration

Using with Claude Desktop

  1. Add to Claude Desktop Configuration:

    Edit your Claude Desktop configuration file (usually at ~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

    {
      "mcpServers": {
        "browser-mcp": {
          "command": "uv",
          "args": ["run", "fastmcp", "run", "/path/to/browser-mcp/browser_fastmcp_server.py"],
          "env": {
            "BROWSER_MAXIMUM_INSTANCES": "5",
            "BROWSER_INSTANCE_TTL": "1800"
          }
        }
      }
    }
  2. Restart Claude Desktop to load the MCP server

  3. Start Using: The browser automation tools will now be available in your Claude conversations
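
If you prefer not to edit the JSON by hand, the entry from step 1 can be merged into an existing configuration with a short script. The sketch below is an illustration. `add_mcp_server` is a hypothetical helper, and the demo writes to a temporary file rather than the real Claude Desktop configuration.

```python
import json
import tempfile
from pathlib import Path


def add_mcp_server(config_path: Path, name: str, entry: dict) -> dict:
    """Insert or replace one server entry, leaving all other settings untouched."""
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    config.setdefault("mcpServers", {})[name] = entry
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Same entry as in step 1; the script path is a placeholder to replace.
BROWSER_MCP_ENTRY = {
    "command": "uv",
    "args": ["run", "fastmcp", "run", "/path/to/browser-mcp/browser_fastmcp_server.py"],
    "env": {"BROWSER_MAXIMUM_INSTANCES": "5", "BROWSER_INSTANCE_TTL": "1800"},
}

if __name__ == "__main__":
    # Demonstrate against a throwaway file instead of the real configuration.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "claude_desktop_config.json"
        path.write_text('{"mcpServers": {"other": {"command": "node"}}}', encoding="utf-8")
        merged = add_mcp_server(path, "browser-mcp", BROWSER_MCP_ENTRY)
        print(sorted(merged["mcpServers"]))
```

Back up the real file before pointing the helper at it, since the script rewrites the whole document.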

Using with MCP Client (Two Ways)

Method 1: Network-based MCP Client (via HTTP/SSE)

import asyncio
import json

from mcp import ClientSession
from mcp.client.sse import sse_client

async def main():
    # Connect to the running server over SSE; sse_client yields the read/write streams
    async with sse_client("http://localhost:8000/sse") as (read, write):
        async with ClientSession(read, write) as session:
            # Initialize session
            await session.initialize()

            # Start browser
            result = await session.call_tool("create_chrome_instance", {"headless": True})
            # Assumption: the tool result arrives as JSON text in the first content item
            session_id = json.loads(result.content[0].text)["session_id"]

            # Navigate to website
            await session.call_tool("navigate_to", {"session_id": session_id, "url": "https://example.com"})

            # Take screenshot
            await session.call_tool("take_screenshot", {"session_id": session_id})

            # Close session
            await session.call_tool("close_instance", {"session_id": session_id})

if __name__ == "__main__":
    asyncio.run(main())

Method 2: Direct Client (No Network)

import asyncio
from fastmcp import Client
from browser_fastmcp_server import mcp as browsers_mcp

async def main():
    # Direct client connection (no network)
    client = Client(browsers_mcp)
    
    async with client:
        # Start browser
        session = await client.call_tool("create_chrome_instance", {"headless": True})
        session_id = session.data.session_id
        
        # Navigate to website
        await client.call_tool("navigate_to", {"session_id": session_id, "url": "https://example.com"})
        
        # Take screenshot
        await client.call_tool("take_screenshot", {"session_id": session_id})
        
        # Close session
        await client.call_tool("close_instance", {"session_id": session_id})

if __name__ == "__main__":
    asyncio.run(main())

🔒 Authentication

For server deployments requiring authentication, modify main.py to set an AuthProvider before startup:

Basic Authentication:

from fastmcp.auth import BasicAuth

# Add this before mcp.run()
mcp.auth = BasicAuth(username="admin", password="password")

JWT Authentication (Recommended for Production):

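For more advanced authentication, we recommend FastMCP's bearer-token support, which validates JWTs against a JWKS endpoint. The import below assumes a FastMCP 2.x release that exposes BearerAuthProvider in fastmcp.server.auth; check the name against your installed version:

from fastmcp.server.auth import BearerAuthProvider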

JWKS_URI = "http://localhost:8080/.well-known/jwks.json"
auth = BearerAuthProvider(
    jwks_uri=JWKS_URI,
    issuer="http://localhost:8080",
    audience="localhost:8080",
    algorithm="RS256"
)

mcp.auth = auth

💡 Use Cases

  • Web Testing: Automated functional, security, and performance testing
  • Data Scraping: Extract structured data from websites
  • Form Automation: Fill and submit web forms programmatically
  • Content Monitoring: Track changes in web content
  • Screenshot Documentation: Capture visual evidence for reports
  • PDF Generation: Convert web pages to PDF documents
  • Session Management: Handle authenticated workflows

🔒 Security Features

  • Session isolation between MCP clients
  • Secure cookie management with HttpOnly and Secure flags
  • Configurable browser security settings (CORS, sandbox, etc.)
  • Automatic cleanup of temporary files
  • TTL-based session expiration
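
To make the TTL and instance-limit behavior concrete, here is a small stdlib-only model of a session registry. It is not the server's implementation. `TTLSessionRegistry` is a hypothetical class, and it assumes that a session expires after being idle for the TTL, which may differ from how SessionBrowserManager measures age.

```python
import time
from typing import Callable, Dict, List


class TTLSessionRegistry:
    """Track session ids, cap their number, and expire idle ones."""

    def __init__(
        self,
        max_instances: int,
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max = max_instances
        self._ttl = ttl_seconds
        self._clock = clock
        self._last_seen: Dict[str, float] = {}

    def touch(self, session_id: str) -> None:
        """Register a new session or refresh the idle timer of a known one."""
        self.expire()  # free slots held by stale sessions first
        if session_id not in self._last_seen and len(self._last_seen) >= self._max:
            raise RuntimeError("maximum number of browser instances reached")
        self._last_seen[session_id] = self._clock()

    def expire(self) -> List[str]:
        """Drop sessions idle for at least the TTL and return their ids."""
        now = self._clock()
        stale = [sid for sid, seen in self._last_seen.items() if now - seen >= self._ttl]
        for sid in stale:
            del self._last_seen[sid]
        return stale

    def active(self) -> List[str]:
        return sorted(self._last_seen)


if __name__ == "__main__":
    fake_now = [0.0]
    registry = TTLSessionRegistry(max_instances=2, ttl_seconds=10, clock=lambda: fake_now[0])
    registry.touch("a")
    fake_now[0] = 5.0
    registry.touch("b")
    fake_now[0] = 12.0
    print("expired:", registry.expire(), "active:", registry.active())
```

Injecting the clock keeps the example deterministic; a real cleanup task would call `expire()` on a timer such as the cleanup interval.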

🐳 Docker Usage

Build the image:

docker build -t browser-mcp .

Run the server (default: port 8000, SSE transport):

docker run -p 8000:8000 browser-mcp

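You can override startup parameters via environment variables. Inside a container the server should listen on 0.0.0.0, because a server bound to 127.0.0.1 cannot be reached through the published port:

docker run -e MCP_PORT=9000 -e MCP_TRANSPORT=http -e MCP_HOST=0.0.0.0 -p 9000:9000 browser-mcp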
