|  | 1 | +# MCPMark Community Experiments | 
|  | 2 | + | 
|  | 3 | +A community-driven repository for evaluating and benchmarking MCP (Model Context Protocol) servers and agent frameworks using the MCPMark framework. This project has two main purposes: | 
|  | 4 | + | 
|  | 5 | +1. **Benchmark different MCP Server implementations** under the same model to compare their performance and capabilities | 
|  | 6 | +2. **Benchmark different agent frameworks** to evaluate their effectiveness in working with MCP servers | 
|  | 7 | + | 
|  | 8 | +All evaluations are run with MCPMark to ensure consistent, comparable results. The project aggregates these results into performance metrics that cover both dimensions. | 
|  | 9 | + | 
|  | 10 | +## 📊 Current Results | 
|  | 11 | + | 
|  | 12 | +The evaluation results are automatically aggregated and available in `mcp_servers.json`. This file contains: | 
|  | 13 | + | 
|  | 14 | +- Performance metrics (pass@1, pass@k rates) | 
|  | 15 | +- Token usage and cost analysis | 
|  | 16 | +- Execution time statistics | 
|  | 17 | +- Metadata for each MCP server implementation | 
|  | 18 | + | 
|  | 19 | +## 🏗️ Repository Structure | 
|  | 20 | + | 
|  | 21 | +``` | 
|  | 22 | +mcp_servers/ | 
|  | 23 | +├── github/ | 
|  | 24 | +│   ├── your-mcp-server/          # Your GitHub MCP Server | 
|  | 25 | +│   └── official/                 # GitHub's Official MCP Server | 
|  | 26 | +└── notion/ | 
|  | 27 | +    └── official/                 # Notion's Official MCP Server | 
|  | 28 | +``` | 
|  | 29 | + | 
|  | 30 | +Each server directory contains: | 
|  | 31 | +- `meta.json` - Server metadata (author, description, homepage, etc.) | 
|  | 32 | +- `run-1/`, `run-2/`, etc. - Evaluation run results | 
|  | 33 | + | 
|  | 34 | +## 🤝 Contributing | 
|  | 35 | + | 
|  | 36 | +We welcome contributions from the MCP community! You can help by: | 
|  | 37 | + | 
|  | 38 | +### Adding New MCP Servers | 
|  | 39 | + | 
|  | 40 | +1. **Fork this repository** | 
|  | 41 | + | 
|  | 42 | +2. **Add your MCP server directory structure:** | 
|  | 43 | +   ``` | 
|  | 44 | +   mcp_servers/ | 
|  | 45 | +   └── your-server-name/ | 
|  | 46 | +       └── your-implementation/ | 
|  | 47 | +           ├── meta.json | 
|  | 48 | +           └── run-*/ | 
|  | 49 | +               ├── summary.json | 
|  | 50 | +               └── task-results/ | 
|  | 51 | +   ``` | 
|  | 52 | + | 
|  | 53 | +3. **Create `meta.json` with your server information:** | 
|  | 54 | +   ```json | 
|  | 55 | +   { | 
|  | 56 | +     "author": { | 
|  | 57 | +       "name": "Your Name/Organization", | 
|  | 58 | +       "url": "https://your-website.com" | 
|  | 59 | +     }, | 
|  | 60 | +     "avatar": "https://your-avatar-url.com/avatar.png", | 
|  | 61 | +     "description": "Description of your MCP server and its capabilities", | 
|  | 62 | +     "homepage": "https://your-server-homepage.com", | 
|  | 63 | +     "name": "Your MCP Server Name" | 
|  | 64 | +   } | 
|  | 65 | +   ``` | 
|  | 66 | + | 
|  | 67 | +4. **Include evaluation results** following the established format | 
|  | 68 | + | 
|  | 69 | +5. **Submit a Pull Request** with: | 
|  | 70 | +   - Clear description of your MCP server | 
|  | 71 | +   - Link to the server's repository/documentation | 
|  | 72 | +   - Brief explanation of evaluation methodology used | 
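The `meta.json` layout from step 3 can be sanity-checked locally before opening a pull request. The following is a minimal sketch, not part of this repository's tooling: the field names come from the example above, but treating every field as required is an assumption, and `check_meta` and `check_meta_file` are hypothetical helper names.

```python
import json
from typing import List

# Field names mirror the meta.json example in step 3. Treating all of them as
# required is an assumption, not a rule stated by this repository.
REQUIRED_TOP_LEVEL = ("author", "avatar", "description", "homepage", "name")
REQUIRED_AUTHOR = ("name", "url")


def check_meta(meta) -> List[str]:
    """Return a list of problems found in a parsed meta.json; empty means OK."""
    if not isinstance(meta, dict):
        return ["top level must be a JSON object"]

    problems = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in meta:
            problems.append(f"missing field: {key}")

    # Only inspect the nested author object when it is present.
    author = meta.get("author")
    if isinstance(author, dict):
        for key in REQUIRED_AUTHOR:
            if key not in author:
                problems.append(f"missing field: author.{key}")
    elif author is not None:
        problems.append("author must be a JSON object")
    return problems


def check_meta_file(path: str) -> List[str]:
    """Load a meta.json file from disk and run the same checks on it."""
    with open(path, encoding="utf-8") as fh:
        return check_meta(json.load(fh))
```

Calling `check_meta_file("mcp_servers/your-server-name/your-implementation/meta.json")` would return an empty list when every field from the example is present.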
|  | 73 | + | 
|  | 74 | +### Improving Evaluation Methods | 
|  | 75 | + | 
|  | 76 | +- Suggest new evaluation metrics or benchmarks | 
|  | 77 | +- Improve the aggregation scripts | 
|  | 78 | +- Add analysis tools and visualizations | 
|  | 79 | +- Report issues or suggest improvements | 
|  | 80 | + | 
|  | 81 | +### Guidelines | 
|  | 82 | + | 
|  | 83 | +- Ensure your evaluation data is reproducible | 
|  | 84 | +- Follow the existing directory structure | 
|  | 85 | +- Include comprehensive metadata | 
|  | 86 | +- Test that your additions work with the aggregation script | 
|  | 87 | + | 
|  | 88 | +## 📋 Evaluation Metrics | 
|  | 89 | + | 
|  | 90 | +The aggregated results include: | 
|  | 91 | + | 
|  | 92 | +- **Pass@k rates**: Success rate across multiple evaluation runs | 
|  | 93 | +- **Token usage**: Input/output token consumption and costs | 
|  | 94 | +- **Execution time**: Agent performance timing | 
|  | 95 | +- **Task completion**: Success rates for different task types | 
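To make the pass-rate metrics concrete, here is a small sketch of one common way to compute them from per-task outcomes. It rests on assumptions: pass@1 is taken as the mean success rate over all task attempts, and pass@k as the share of tasks solved in at least one of their k runs. The aggregation script in this repository may define them differently, and the function names and task names below are hypothetical.

```python
from typing import Dict, List


def pass_at_1(runs: Dict[str, List[bool]]) -> float:
    """Mean success rate over every (task, run) pair."""
    outcomes = [ok for results in runs.values() for ok in results]
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def pass_at_k(runs: Dict[str, List[bool]]) -> float:
    """Fraction of tasks that succeeded in at least one of their runs."""
    if not runs:
        return 0.0
    solved = sum(1 for results in runs.values() if any(results))
    return solved / len(runs)


# Hypothetical outcomes: three tasks, two runs each (True means the task passed).
example = {
    "task-a": [True, True],
    "task-b": [False, True],
    "task-c": [False, False],
}

print(f"pass@1 = {pass_at_1(example):.2f}")  # 3 of 6 attempts passed
print(f"pass@2 = {pass_at_k(example):.2f}")  # 2 of 3 tasks passed at least once
```

Under these definitions pass@k can never be lower than pass@1, which is why the two are usually reported side by side.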
|  | 96 | + | 
|  | 97 | +## 🚀 Getting Started | 
|  | 98 | + | 
|  | 99 | +1. Clone the repository | 
|  | 100 | +2. Explore existing MCP server results in `mcp_servers.json` | 
|  | 101 | +3. Check individual server directories for detailed results | 
|  | 102 | +4. Consider contributing your own MCP server evaluations! | 