TuRTLe


TuRTLe is a framework for systematically assessing LLMs across key RTL generation tasks. It integrates multiple existing benchmarks and automates the evaluation process, enabling a comprehensive assessment of LLM performance in syntax correctness, functional correctness, synthesis, PPA optimization, and exact line completion.

This work extends the functionality and flexibility of bigcode-evaluation-harness with open-source EDA tools to run Specification-to-RTL and RTL Code Completion benchmarks. It is also inspired by vllm-code-harness, enabling efficient inference with vLLM.

Benchmarks implemented so far are:

  • VerilogEval (Code Completion and Specification-to-RTL)
  • RTLLM
  • VeriGen
  • RTL-Repo

Open-source EDA tools integrated:

  • Icarus Verilog
  • Verilator
  • Yosys
  • OpenLane

For more details about our work, refer to our arXiv paper. The diagram below shows the high-level structure of the framework: [TuRTLe diagram]

News

  • [2025-07-03] TuRTLe now supports Verilator as a simulator to check syntax and functionality
  • [2025-06-12] We added support for multi-node inference with Ray, along with configurations for larger models
  • [2025-05-19] The project's source code is now publicly released. We'd love to hear your feedback, so give it a try!
  • [2025-03-31] Our paper "TuRTLe: A Unified Evaluation of LLMs for RTL Generation" is now available on arXiv!
  • [2025-03-20] The leaderboard is now live! Check it out on our Hugging Face Space

Road Map

  • [In progress] Release a version of the repository compatible with local execution

Leaderboard 🥇

Check the TuRTLe Leaderboard to find the best-performing open-source models for each task. [Leaderboard screenshot]

Usage

Warning

Dependencies Notice
vLLM currently supports up to Python 3.12. Ensure that your Python version does not exceed this limit to avoid compatibility issues.
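
For example, you can confirm the interpreter version before installing (a quick check, not part of the project's tooling):

    python3 --version   # should report Python 3.12 or lower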

HPC Environment Requirements

Most execution modes must run in an HPC environment. For this reason, TuRTLe currently relies on Slurm and Singularity for its execution.
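
A quick way to confirm both are available on your cluster (assuming they are on your PATH):

    sinfo --version
    singularity --version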

Installation

  1. Clone the repository:

    git clone --recursive https://github.com/HPAI-BSC/TuRTLe.git
  2. (Optional) Create and activate a virtual environment:

    python3 -m venv venv
    source venv/bin/activate
  3. Install Python dependencies:

    pip install -r requirements.txt

    On non-Linux devices the above command will raise:

    AssertionError: vLLM only supports Linux platform (including WSL).

    In this case, vLLM has to be installed from source (see their installation page for details).

  4. Install bigcode-evaluation-harness as a PyPI package:

    cd TuRTLe/bigcode-evaluation-harness/
    pip install -e .
  5. Install EDA tools (not required for single-line completion benchmarks)

    To install OpenLane, follow the instructions provided in the OpenLane Installation Guide.

    To install Icarus Verilog on Windows, check the Icarus Verilog Windows download page. To install it on Linux, run:

    sudo apt-get update
    sudo apt-get install iverilog
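
    To confirm the installation, you can print the compiler version (assuming iverilog is now on your PATH):

    iverilog -V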

Finally, we recommend using Singularity for containerization in HPC environments. TuRTLe can dynamically create and submit Slurm job scripts. To enable this, include the following settings in your benchmark configuration file (see the sketch after this list):

  • singularity_image: path to your singularity image.
  • For each model, specify a slurm_config from turtle/configs/slurm.yml with the slurm directives to run the benchmark.
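
Below is a minimal sketch of how these settings might look in a benchmark configuration file. Aside from singularity_image and slurm_config, every key and value shown (the models list, the model name, the paths) is an illustrative assumption, not the project's actual schema:

    # Illustrative sketch; only singularity_image and slurm_config are documented keys.
    singularity_image: /path/to/turtle.sif    # hypothetical image path
    models:
      - name: my-model                        # hypothetical model entry
        slurm_config: default_gpu             # must match an entry in turtle/configs/slurm.yml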

Running the Project

To execute the project, use the turtle/run.py script with the appropriate arguments. Below are the details of the available parameters:

python turtle/run.py [--benchmark <config_file>] [--model <model_name>] [--run_all]

If the configuration file includes both singularity_image and slurm_config, TuRTLe will automatically generate and execute a Slurm script to run the benchmark using the specified Singularity image.

Core Parameters

  • --benchmark: Name of the .yml file in turtle/configs/ with the configurations of the benchmark to run (e.g., rtlrepo, rtllm, verilog_eval_cc, verilog_eval_rtl, verigen).
  • --model: Specify a particular model to run. If not provided, all models in the configuration file will be executed.
  • --run_all: Use this flag to run all benchmarks against all models.

Additional Parameters

Because of the dual-image setup, with one image for inference and another that includes the EDA tools (e.g., Icarus Verilog, Verilator, Yosys, OpenLane), you can control each phase of the pipeline separately, as shown in the example after this list:

  • --generation_only: Use this flag to perform inference only.
  • --evaluation_only: Use this flag to perform evaluation only. Generations are loaded automatically from the metric_output_path variable in the YAML configuration file.
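
For instance, reusing the benchmark and model from the examples below, you can run inference first and evaluate in a separate job once the EDA-tool image is available (a sketch of a typical split):

    python turtle/run.py --benchmark verilog_eval_cc --model Qwen2.5-32B --generation_only
    python turtle/run.py --benchmark verilog_eval_cc --model Qwen2.5-32B --evaluation_only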

Examples

  1. Run all models specified in the configuration file for the RTL-Repo benchmark:

    python turtle/run.py --benchmark rtlrepo 
  2. Test Qwen2.5-32B against the benchmark VerilogEval Code Completion:

    python turtle/run.py --benchmark verilog_eval_cc --model Qwen2.5-32B
  3. Run all benchmarks against all models:

    python turtle/run.py --run_all

Add your benchmark

The process of implementing a benchmark is very similar to the one described in the bigcode-evaluation-harness guide. Follow these steps (a minimal sketch of the resulting task file follows the list):

  1. Copy the turtle/tasks/template/new_task.py into turtle/tasks/ and rename it to the name of your benchmark <benchmark_name>.py.
  2. Complete all the TODO comments in the template file.
  3. Define a configuration file named turtle/configs/<benchmark_name>.yml and list the models you want to evaluate along with their required parameters.
  4. Update the _load_new_modules() and _create_extended_registry() methods within turtle/src/utils/task_updater.py.
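
For orientation, here is a minimal sketch of what turtle/tasks/<benchmark_name>.py could look like. It assumes the Task interface of bigcode-evaluation-harness (which TuRTLe extends); the class name, dataset id, and field names are hypothetical, and the template file remains the authoritative starting point:

    # Minimal sketch, assuming the bigcode-evaluation-harness Task interface.
    # Class name, dataset id, and field names are hypothetical.
    from bigcode_eval.base import Task

    class MyRTLBenchmark(Task):
        DATASET_PATH = "org/my-rtl-dataset"  # hypothetical HF dataset id

        def __init__(self):
            super().__init__(stop_words=["endmodule"], requires_execution=True)

        def get_dataset(self):
            # Split used for evaluation.
            return self.dataset["test"]

        def get_prompt(self, doc):
            # Prompt fed to the model for one problem.
            return doc["prompt"]

        def get_reference(self, doc):
            # Ground truth used for scoring (e.g., a reference testbench).
            return doc["reference"]

        def postprocess_generation(self, generation, idx):
            # Strip the prompt and truncate at the first stop word.
            prompt = self.get_prompt(self.get_dataset()[idx])
            return self._stop_at_stop_token(generation[len(prompt):], self.stop_words)

        def process_results(self, generations, references):
            # Compute the benchmark metric, e.g., pass@1 via an EDA-tool check.
            return {"pass@1": 0.0}  # placeholder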

Citation

@inproceedings{garciagasulla2025turtleunifiedevaluationllms,
      title     = {TuRTLe: A Unified Evaluation of LLMs for RTL Generation},
      author    = {Dario Garcia-Gasulla and Gokcen Kestor and Emanuele Parisi and Miquel Albert\'i-Binimelis and Cristian Gutierrez and Razine Moundir Ghorab and Orlando Montenegro and Bernat Homs and Miquel Moreto},
      booktitle = {Proceedings of the 2025 ACM/IEEE International Symposium on Machine Learning for CAD},
      series    = {MLCAD '25},
      year      = {2025},
      publisher = {Association for Computing Machinery},
      address   = {New York, NY, USA},
      location  = {Santa Cruz, CA, USA},
      url       = {https://arxiv.org/abs/2504.01986},
}

How to contribute 🤝

Any contribution is more than welcome! If you've found a bug or have an idea for an improvement, don't hesitate to open a new issue using our issue forms. We also encourage pull requests that add new benchmarks for any task relevant to chip design.

Contact

If you have any questions or feedback, feel free to email us at [email protected]. You can also support the project by following or starring the repository.


Made with ❤️ by HPAI at the Barcelona Supercomputing Center (BSC)
