    Repositories list

    • Python
      230152 · Updated Sep 5, 2025
    • mSTEB (Public)
      Jupyter Notebook
      0000 · Updated Aug 25, 2025
    • ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models.
      Python
      4114500 · Updated Aug 18, 2025
    • TACL 2025: Investigating Adversarial Trigger Transfer in Large Language Models
      Python
      21800 · Updated Aug 17, 2025
    • AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories
      Python
      13700 · Updated Aug 7, 2025
    • R
      0100 · Updated Jul 30, 2025
    • AfroBench (Public)
      Large Scale Benchmark of Large Language Models on African Languages
      Python
      2900 · Updated Jul 28, 2025
    • Python
      0520 · Updated Jul 15, 2025
    • Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs"
      Jupyter Notebook
      4752341 · Updated Jul 7, 2025
    • AURORA (Public)
      Code and data for the paper: Learning Action and Reasoning-Centric Image Editing from Videos and Simulation
      Python
      23000 · Updated Jun 30, 2025
    • Python
      0000 · Updated Jun 22, 2025
    • VinePPO (Public)
      Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"
      Python
      2016940 · Updated May 25, 2025
    • Evaluation dataset for our NAACL 2025 paper on "Does Generative AI speak Nigerian-Pidgin?: Issues about Representativeness and Bias for Multilingualism in LLMs"
      0000 · Updated May 14, 2025
    • 21000 · Updated May 12, 2025
    • Repo for "Language Models Largely Exhibit Human-like Constituent Ordering Preferences"
      Python
      1200 · Updated Apr 26, 2025
    • Python
      3101 · Updated Apr 25, 2025
    • safearena (Public)
      SafeArena is a benchmark for assessing the harmful capabilities of web agents
      Python
      31720 · Updated Apr 23, 2025
    • Code for "Exploiting Instruction-Following Retrievers for Malicious Information Retrieval"
      Python
      1600 · Updated Apr 1, 2025
    • project-page-template (Public template)
      Template for creating project webpages based on jekyll/minimal-mistakes
      1100 · Updated Mar 13, 2025
    • Python
      47000 · Updated Mar 11, 2025
    • CHASE (Public)
      Synthetic Data Generation for Evaluation
      Python
      41400 · Updated Feb 21, 2025
    • Injongo (Public)
      A multicultural, open-source benchmark dataset for 16 African languages with utterances generated by native speakers across diverse domains.
      Jupyter Notebook
      0000 · Updated Feb 12, 2025
    • weblinx (Public)
      WebLINX is a benchmark for building web navigation agents with conversational capabilities
      Python
      1715700 · Updated Feb 11, 2025
    • llm2vec (Public)
      Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders'
      Python
      1281.6k364 · Updated Jan 24, 2025
    • webllama (Public)
      Llama-3 agents that can browse the web by following instructions and talking to you
      Python
      1081.4k20 · Updated Dec 10, 2024
    • The StatCan Dialogue Dataset: Retrieving Data Tables through Conversations with Genuine Intents
      Python
      2900 · Updated Nov 6, 2024
    • NAACL 2024: Evaluating In-Context Learning of Libraries for Code Generation
      Python
      2900 · Updated Oct 23, 2024
    • Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering"
      Python
      58651 · Updated Aug 12, 2024
    • Code and data for the paper 'Scope Ambiguities in Large Language Models'.
      Python
      2500 · Updated Jun 25, 2024
    • Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023
      Python
      713730 · Updated Apr 30, 2024