Pitting AI models against each other in real-world coding challenges.
This repository hosts a collection of benchmarks designed to evaluate how well different AI models perform on practical programming tasks.
| Benchmark | Category | Description |
|---|---|---|
| 1 Billion Row Challenge | Performance | Process 1B temperature readings as fast as possible |
| Project Euler | Reasoning/Algorithm | Solve mathematical and programming problems |
agentic-benchmarks/
├── README.md # This file
├── 1brc/ # 1 Billion Row Challenge
├── projecteuler/ # Project Euler Challenge
└── ...
Each benchmark has its own directory with setup instructions, prompts, implementations, and results.