    Repositories list

    • Kratos-benchmark

      Public
      Kratos: An FPGA Benchmark for Unrolled Deep Neural Networks with Fine-Grained Sparsity and Mixed Precision
      Python
      21210 Updated Oct 30, 2025
    • LMCache

      Public
      Supercharge Your LLM with the Fastest KV Cache Layer
      Python
      684000 Updated Oct 28, 2025
    • abdelfattah-lab.github.io

      Public
      SCSS
      19110 Updated Oct 21, 2025
    • fiddler

      Public
      [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration
      Python
      24000 Updated Oct 20, 2025
    • vtr-updated

      Public
      Update VTR to latest
      C++
      430000 Updated Oct 6, 2025
    • xKV

      Public
      xKV: Cross-Layer SVD for KV-Cache Compression
      Python
      44222 Updated Sep 21, 2025
    • razer-llm

      Public
      Python
      0100 Updated Aug 13, 2025
    • TokenButler

      Public
      Python
      32610 Updated Jul 29, 2025
    • BitMoD-HPCA-25

      Public
      Python
      105100 Updated Jul 19, 2025
    • SplitReason

      Public
      Python
      21910 Updated Jul 1, 2025
    • RaZeR

      Public
      Python
      1400 Updated May 7, 2025
    • Modified Kratos benchmark for architectural exploration
      Python
      2100 Updated Mar 27, 2025
    • COFFE

      Public
      Forked to make local changes
      Verilog
      28000 Updated Mar 24, 2025
    • nitro

      Public
      Lightweight Python Wrapper for OpenVINO, enabling LLM inference on NPUs
      Python
      02320 Updated Dec 17, 2024
    • vllm

      Public
      A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      11k000 Updated Dec 3, 2024
    • attamba

      Public
      Python
      11400 Updated Nov 29, 2024
    • M4BRAM

      Public
      Python
      1800 Updated Nov 27, 2024
    • BRAMAC

      Public
      Python
      1910 Updated Nov 27, 2024
    • Python
      11000 Updated Sep 20, 2024
    • Python
      21100 Updated Jun 28, 2024
    • flan_nas

      Public
      Python
      2900 Updated May 6, 2024
    • PQA

      Public
      Python
      0500 Updated Mar 26, 2024
    • Python
      0100 Updated Mar 22, 2024
    • Verilog to Routing -- Open Source CAD Flow for FPGA Research
      C++
      430200 Updated Oct 13, 2023
    • diviml

      Public
      0200 Updated Jul 31, 2023
    • The Triton Inference Server provides an optimized cloud and edge inferencing solution.
      Python
      1.7k000 Updated Feb 21, 2023