A TypeScript MCP (Model Context Protocol) server that provides comprehensive web search capabilities with multiple tools for different use cases.
- Multi-Engine Web Search: Prioritizes Bing > Brave > DuckDuckGo for optimal reliability and performance
- Full Page Content Extraction: Fetches and extracts complete page content from search results
- Multiple Search Tools: Three specialised tools for different use cases
- Smart Request Strategy: Uses fast axios requests first, then falls back to browser-based extraction if bot detection is encountered
- Concurrent Processing: Extracts content from multiple pages simultaneously
The server provides three specialised tools for different web search needs:
When a comprehensive search is requested, the server uses an optimized search strategy:
1. Browser-based Bing Search: primary method, using a dedicated Chromium instance
2. Browser-based Brave Search: secondary option, using a dedicated Firefox instance
3. Axios DuckDuckGo Search: final fallback, using traditional HTTP requests
- Dedicated browser isolation: Each search engine gets its own browser instance with automatic cleanup
- Content extraction: Tries axios first, then falls back to browser with human behavior simulation
- Concurrent processing: Extracts content from multiple pages simultaneously with timeout protection
- HTTP/2 error recovery: Automatically falls back to HTTP/1.1 when protocol errors occur
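The engine fallback order above can be pictured with a short sketch. This is an illustrative assumption about the shape of the logic, not the server's actual internals — the function name, engine wiring, and result shape are all hypothetical:

```typescript
// Illustrative sketch of the Bing -> Brave -> DuckDuckGo fallback chain.
// The names and shapes here are assumptions, not the server's real API.
type SearchFn = (query: string) => Promise<string[]>;

async function searchWithFallback(
  query: string,
  engines: Array<{ name: string; search: SearchFn }>,
): Promise<{ engine: string; results: string[] }> {
  for (const engine of engines) {
    try {
      const results = await engine.search(query);
      // An engine that returns nothing is treated as a miss, so we fall through.
      if (results.length > 0) return { engine: engine.name, results };
    } catch {
      // Timeout, bot detection, or protocol error: try the next engine.
    }
  }
  return { engine: "none", results: [] };
}
```

The point of the try/catch-per-engine structure is that a single blocked or failing engine never fails the whole search; only the final fallback exhausting produces an empty result.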
For quick search results without full content extraction:
- Performs the same optimized multi-engine search as `full-web-search`
- Returns only the search result snippets/descriptions
- Does not follow links to extract full page content
For extracting content from a specific webpage:
- Takes a single URL as input
- Follows the URL and extracts the main page content
- Removes navigation, ads, and other non-content elements
This MCP server has been developed and tested with LM Studio. It has not been tested with other MCP clients.
Important: Prioritise using more recent models designated for tool use.
Older models (even those with tool use specified) may not work or may work erratically. This seems to be the case with Llama and Deepseek. Qwen3 and Gemma 3 currently give the best results.
- ✅ Works well with: Qwen3
- ✅ Works well with: Gemma 3
- ✅ Works with: Llama 3.2
- ✅ Works with: Recent Llama 3.1 (e.g. Llama 3.1 Swallow 8B)
- ✅ Works with: Recent Deepseek R1 (e.g. the 0528 release)
- ⚠️ May have issues with: Some versions of Llama and Deepseek R1
- ❌ May not work with: Older versions of Llama and Deepseek R1
Requirements:
- Node.js 18.0.0 or higher
- npm 8.0.0 or higher
- Download the latest release zip file from the Releases page
- Extract the zip file to a location on your system (e.g., `~/mcp-servers/web-search-mcp/`)
- Open a terminal in the extracted folder and run:

```bash
npm install
npx playwright install
npm run build
```

This will create a `node_modules` folder with all required dependencies, install Playwright browsers, and build the project.
- Configure your `mcp.json` to point to the extracted `dist/index.js` file:
```json
{
  "mcpServers": {
    "web-search": {
      "command": "node",
      "args": ["/path/to/extracted/web-search-mcp/dist/index.js"]
    }
  }
}
```
Example paths:
- macOS/Linux: `~/mcp-servers/web-search-mcp/dist/index.js`
- Windows: `C:\mcp-servers\web-search-mcp\dist\index.js`

Note: You must run `npm install` in the root of the extracted folder (not in `dist/`).
Troubleshooting:
- If `npm install` fails, try updating Node.js to version 18+ and npm to version 8+
- If `npm run build` fails, ensure you have the latest Node.js version installed
- For older Node.js versions, you may need to use an older release of this project
- Content Length Issues: If you experience odd behavior due to content length limits, try setting `"MAX_CONTENT_LENGTH": "10000"`, or another value, in your `mcp.json` environment variables:
```json
{
  "mcpServers": {
    "web-search": {
      "command": "node",
      "args": ["/path/to/web-search-mcp/dist/index.js"],
      "env": {
        "MAX_CONTENT_LENGTH": "10000",
        "BROWSER_HEADLESS": "true",
        "MAX_BROWSERS": "3",
        "BROWSER_FALLBACK_THRESHOLD": "3"
      }
    }
  }
}
```
The server supports several environment variables for configuration:
- `MAX_CONTENT_LENGTH`: Maximum content length in characters (default: 500000)
- `DEFAULT_TIMEOUT`: Default timeout for requests in milliseconds (default: 6000)
- `MAX_BROWSERS`: Maximum number of browser instances to maintain (default: 3)
- `BROWSER_TYPES`: Comma-separated list of browser types to use (default: `chromium,firefox`; options: chromium, firefox, webkit)
- `BROWSER_HEADLESS`: Run browsers in headless mode (default: true)
- `BROWSER_FALLBACK_THRESHOLD`: Number of axios failures before using the browser fallback (default: 3)
- `ENABLE_RELEVANCE_CHECKING`: Enable/disable search result quality validation (default: true)
- `RELEVANCE_THRESHOLD`: Minimum quality score for search results (0.0-1.0, default: 0.3)
- `FORCE_MULTI_ENGINE_SEARCH`: Try all search engines and return the best results (default: false)
- `DEBUG_BROWSER_LIFECYCLE`: Enable detailed browser lifecycle logging for debugging (default: false)
- Optimized timeouts: The default timeout is reduced to 6 seconds, with concurrent processing for faster results
- Concurrent extraction: Content is extracted from multiple pages simultaneously
- Reduce timeouts further: Set `DEFAULT_TIMEOUT=4000` for even faster responses (may reduce success rate)
- Use fewer browsers: Set `MAX_BROWSERS=1` to reduce memory usage
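Concurrent extraction with per-page timeout protection can be sketched like this. The sketch is illustrative only; `extract` stands in for whatever fetch-and-parse step the server actually uses:

```typescript
// Illustrative sketch: extract several pages at once, replacing any page
// that fails or exceeds the timeout with null instead of failing the batch.
async function extractAll(
  urls: string[],
  extract: (url: string) => Promise<string>,
  timeoutMs = 6000,
): Promise<Array<string | null>> {
  const withTimeout = (p: Promise<string>): Promise<string> =>
    Promise.race([
      p,
      new Promise<string>((_, reject) =>
        setTimeout(() => reject(new Error("timeout")), timeoutMs),
      ),
    ]);
  const settled = await Promise.allSettled(
    urls.map((url) => withTimeout(extract(url))),
  );
  return settled.map((s) => (s.status === "fulfilled" ? s.value : null));
}
```

Using `Promise.allSettled` rather than `Promise.all` is what lets one slow or blocked page degrade to `null` without discarding the other pages' content.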
- Check browser installation: Run `npx playwright install` to ensure browsers are available
- Try headless mode: Ensure `BROWSER_HEADLESS=true` (the default) for server environments
- Network restrictions: Some networks block browser automation; try a different network or a VPN
- HTTP/2 issues: The server automatically handles HTTP/2 protocol errors by falling back to HTTP/1.1
- Enable quality checking: Set `ENABLE_RELEVANCE_CHECKING=true` (enabled by default)
- Adjust quality threshold: Set `RELEVANCE_THRESHOLD=0.5` for stricter quality requirements
- Force multi-engine search: Set `FORCE_MULTI_ENGINE_SEARCH=true` to try all engines and return the best results
- Automatic cleanup: Browsers are automatically cleaned up after each operation to prevent memory leaks
- Limit browsers: Reduce `MAX_BROWSERS` (default: 3)
- EventEmitter warnings: Fixed; browsers are now properly closed to prevent listener accumulation
```bash
git clone https://github.com/mrkrsl/web-search-mcp.git
cd web-search-mcp
npm install
npx playwright install
npm run build
```
```bash
npm run dev    # Development with hot reload
npm run build  # Build TypeScript to JavaScript
npm run lint   # Run ESLint
npm run format # Run Prettier
```
Add to your `mcp.json`:
```json
{
  "mcpServers": {
    "web-search": {
      "command": "node",
      "args": ["/path/to/web-search-mcp/dist/index.js"]
    }
  }
}
```
This server provides three specialised tools for different web search needs:
The most comprehensive web search tool that:
- Takes a search query and optional number of results (1-10, default 5)
- Performs a web search (tries Bing, then Brave, then DuckDuckGo if needed)
- Fetches full page content from each result URL with concurrent processing
- Returns structured data with search results and extracted content
- Enhanced reliability: HTTP/2 error recovery, reduced timeouts, and better error handling
Example Usage:
```json
{
  "name": "full-web-search",
  "arguments": {
    "query": "TypeScript MCP server",
    "limit": 3,
    "includeContent": true
  }
}
```
A lightweight alternative for quick search results:
- Takes a search query and optional number of results (1-10, default 5)
- Performs the same optimized multi-engine search as `full-web-search`
- Returns only search result snippets/descriptions (no content extraction)
- Faster and more efficient for quick research
Example Usage:
```json
{
  "name": "get-web-search-summaries",
  "arguments": {
    "query": "TypeScript MCP server",
    "limit": 5
  }
}
```
A utility tool for extracting content from a specific webpage:
- Takes a single URL as input
- Follows the URL and extracts the main page content
- Removes navigation, ads, and other non-content elements
- Useful for getting detailed content from a known webpage
Example Usage:
```json
{
  "name": "get-single-web-page-content",
  "arguments": {
    "url": "https://example.com/article",
    "maxContentLength": 5000
  }
}
```
You can also run the server directly:

```bash
# If running from source
npm start
```
See API.md for complete technical details.
MIT License - see LICENSE for details.
This is an open source project and we welcome feedback! If you encounter any issues or have suggestions for improvements, please:
- Open an issue on GitHub
- Submit a pull request