Many people are now interested in optimizing existing PyTorch kernels. This audience includes both systems researchers experimenting with new DSLs and LLM researchers looking to fully automate kernel authoring. But many existing efforts have struggled to ensure correctness.
Our take is that a kernel that can replace an existing PyTorch operator and be merged into PyTorch's official codebase is far more likely to be correct. Historically, though, hacking on PyTorch's kernels has been challenging.
BackendBench is an evaluation suite that tests how well LLMs and humans can write a full-fledged PyTorch backend. Developers can add their custom kernels in a well-organized directory structure and dynamically override the core PyTorch ATen operators at runtime. The outcome is a fully functional, readable PyTorch backend that you can pip install and run real models on with no model changes!
We provide both:
- Comprehensive operator level correctness checks using the PyTorch OpInfo test suite
- Performance checks using the ops that show up in the most popular Hugging Face models with realistic tensor shapes
Install using uv (recommended):
uv add backendbench
Or install in development mode:
uv sync --dev
Run a simple smoke test (relu) with the default ATen backend:
uv run python scripts/main.py --suite smoke --backend aten
Run the smoke test with FlagGems:
uv run python scripts/main.py --suite smoke --backend flag_gems
Run opinfo tests (correctness only) with ATen:
uv run python scripts/main.py --suite opinfo --backend aten
Run a filtered set of opinfo tests with FlagGems:
uv run python scripts/main.py --suite opinfo --backend flag_gems --ops "add,sub"
Run all the opinfo tests with FlagGems (takes a few minutes):
uv run python scripts/main.py --suite opinfo --backend flag_gems
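When scripting several of these runs, it can help to build the command line programmatically. The following is a minimal sketch that uses only the `--suite`, `--backend`, and `--ops` flags shown above. `build_command` is a hypothetical helper for illustration and is not part of BackendBench.

```python
import shlex


def build_command(suite, backend, ops=None):
    """Return the argv list for one run of scripts/main.py (hypothetical helper)."""
    argv = ["uv", "run", "python", "scripts/main.py",
            "--suite", suite, "--backend", backend]
    if ops:
        # --ops takes a comma-separated list, as in the filtered example above
        argv += ["--ops", ",".join(ops)]
    return argv


if __name__ == "__main__":
    # Print the commands for the same filtered opinfo run on two backends
    for backend in ("aten", "flag_gems"):
        print(shlex.join(build_command("opinfo", backend, ops=["add", "sub"])))
```

The returned list can be passed directly to `subprocess.run` if you want to execute the runs rather than print them.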
Generate and evaluate PyTorch kernels using the Claude API:
Run LLM evaluation on smoke test (relu operation):
export ANTHROPIC_API_KEY=your_api_key_here
uv run python scripts/main.py --suite smoke --backend llm
Generate and evaluate PyTorch kernels using KernelAgent's advanced system with parallel workers and iterative refinement:
Prerequisites: Initialize the KernelAgent submodule:
git submodule update --init --recursive
Run KernelAgent evaluation on smoke test (relu operation):
export OPENAI_API_KEY=your_api_key_here
uv run python scripts/main.py --suite smoke --backend kernel_agent
Run KernelAgent with custom configuration:
export OPENAI_API_KEY=your_api_key_here
uv run python scripts/main.py --suite smoke --backend kernel_agent --kernel-agent-workers 6 --kernel-agent-max-rounds 15
Run KernelAgent on opinfo tests with a specific operation:
export OPENAI_API_KEY=your_api_key_here
uv run python scripts/main.py --suite opinfo --backend kernel_agent --ops "add"
BackendBench supports a simple directory structure for manually adding kernel implementations. This is perfect for researchers who want to contribute optimized kernels without dealing with complex generation systems.
Create kernels in the following structure:
generated_kernels/
├── relu/
│ └── relu_implementation_1.py
├── add/
│ └── add_implementation_1.py
├── mul/
│ └── mul_implementation_1.py
└── ...
- Create the operation directory:
mkdir generated_kernels/{op_name}
- Create your implementation file:
# Example: generated_kernels/relu/relu_implementation_1.py
- Write your kernel following this template:
import torch

def {op_name}_kernel_impl(*args, **kwargs):
    """
    Your kernel implementation.
    Must match the PyTorch operation signature exactly.
    """
    # Your implementation here
    return result

# Optional: Add a test
if __name__ == "__main__":
    pass
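The three steps above can also be scripted. This is a minimal sketch, assuming the `generated_kernels/{op_name}/{op_name}_implementation_1.py` layout described in this section. `scaffold_kernel` and `TEMPLATE` are hypothetical names introduced for illustration, not part of BackendBench, and the stub body raises `NotImplementedError` until you fill it in.

```python
from pathlib import Path

# Stub written into each new implementation file (illustrative only)
TEMPLATE = '''import torch

def {op_name}_kernel_impl(*args, **kwargs):
    """Kernel for {op_name}; must match the PyTorch operation signature."""
    raise NotImplementedError("write the {op_name} kernel here")
'''


def scaffold_kernel(op_name, root="generated_kernels", index=1):
    """Create <root>/<op_name>/<op_name>_implementation_<index>.py (hypothetical helper)."""
    op_dir = Path(root) / op_name
    op_dir.mkdir(parents=True, exist_ok=True)
    path = op_dir / f"{op_name}_implementation_{index}.py"
    if not path.exists():  # never overwrite an existing implementation
        path.write_text(TEMPLATE.format(op_name=op_name))
    return path


if __name__ == "__main__":
    print(scaffold_kernel("relu"))
```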
Use these exact directory names for common operations:
- relu → torch.ops.aten.relu.default
- add → torch.ops.aten.add.Tensor
- mul → torch.ops.aten.mul.Tensor
- div → torch.ops.aten.div.Tensor
To find the correct name for other operations:
# Find operation name
import torch
op = torch.ops.aten.some_op.some_variant
print(str(op).split('aten.')[-1].split('.')[0]) # Use this as directory name
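To make the string handling in that snippet concrete, here is a small sketch that applies the same split logic to plain strings, so it runs without PyTorch installed. It assumes `str(op)` has the form `aten.<name>.<overload>`, as in the mappings listed above. `directory_name` is a hypothetical helper name.

```python
def directory_name(op_string):
    """Map a string such as 'aten.add.Tensor' to the directory name 'add'."""
    # Same split logic as the snippet above, applied to an already-formatted string:
    # drop everything up to 'aten.', then keep the part before the overload name.
    return op_string.split("aten.")[-1].split(".")[0]


if __name__ == "__main__":
    for name in ("aten.relu.default", "aten.add.Tensor", "aten.mul.Tensor"):
        print(name, "->", directory_name(name))
```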
Here's a complete example for ReLU:
# generated_kernels/relu/relu_implementation_1.py
import torch

def relu_kernel_impl(input_tensor):
    return torch.maximum(input_tensor, torch.zeros_like(input_tensor))

if __name__ == "__main__":
    # Test on CPU
    x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
    result = relu_kernel_impl(x)
    expected = torch.tensor([0.0, 0.0, 0.0, 1.0, 2.0])
    print(f"Test passed: {torch.allclose(result, expected)}")
Test individual implementations:
uv run python generated_kernels/relu/relu_implementation_1.py
Test with BackendBench:
uv run python scripts/main.py --suite smoke --backend directory
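For intuition about what a directory-based backend has to do, the sketch below walks the layout described above and collects each `{op_name}_kernel_impl` function. It is an illustration under stated assumptions, not BackendBench's actual loader: `load_kernels` is a hypothetical name, and the rule that a later file replaces an earlier one for the same operation is a choice made here for simplicity.

```python
import importlib.util
from pathlib import Path


def load_kernels(root="generated_kernels"):
    """Return {op_name: callable} for every <op>/<op>_implementation_*.py under root."""
    kernels = {}
    for op_dir in sorted(Path(root).iterdir()):
        if not op_dir.is_dir():
            continue  # skip stray files next to the operation directories
        op_name = op_dir.name
        for path in sorted(op_dir.glob(f"{op_name}_implementation_*.py")):
            # Import the file as a module without requiring it to be on sys.path
            spec = importlib.util.spec_from_file_location(path.stem, path)
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)
            impl = getattr(module, f"{op_name}_kernel_impl", None)
            if callable(impl):
                kernels[op_name] = impl  # a later file replaces an earlier one
    return kernels
```

Because each file is imported under its own module name rather than `__main__`, any `if __name__ == "__main__":` test block inside an implementation file does not run during loading.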
Source code is made available under a BSD 3-Clause license.