🤗 Preference Dataset | 📚 Documentation | 📄 Paper
This repository contains the source code for the ACL 2025 paper, Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback. In it, we introduce a routing framework that creates hybrid preference datasets, combining LLM and human preference annotations to maximize performance on a given evaluation metric (e.g., RewardBench). We release this codebase to improve the reproducibility of our work and to help researchers construct their own preference datasets.
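The paper frames routing as an optimization problem. A performance prediction model (PPM) estimates a reward model's score for a given mix of human and LM labels, and a routing strategy picks the mix with the highest predicted score. The following is a toy illustration of that idea under strong simplifying assumptions, not the method implemented in this repository. Instances are described by hypothetical tags, the PPM is assumed to be linear and additive per instance, and routing is a budgeted top-k selection. The names `predicted_gain`, `predict_performance`, and `route`, as well as the tags and weights, are made up for illustration. See the docs directory for the actual pipeline.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical per-instance features: a set of tags such as "safety" or "complex".
Instance = Set[str]


def predicted_gain(instance: Instance, weights: Dict[str, float]) -> float:
    """Toy linear PPM term (assumption): predicted change in RM score if this
    instance uses a human label instead of an LM label."""
    return sum(weights.get(tag, 0.0) for tag in instance)


def predict_performance(human_mask: Sequence[bool], instances: Sequence[Instance],
                        weights: Dict[str, float], base_score: float) -> float:
    """Predicted RM score for a hybrid mix: the LM-only baseline plus the
    gains of every instance routed to humans, averaged over the dataset."""
    gains = sum(predicted_gain(inst, weights)
                for routed, inst in zip(human_mask, instances) if routed)
    return base_score + gains / len(instances)


def route(instances: Sequence[Instance], weights: Dict[str, float],
          base_score: float, budget: int) -> Tuple[List[bool], float]:
    """Send at most `budget` instances to humans, choosing those with the
    largest positive predicted gain; all others keep their LM label."""
    ranked = sorted(range(len(instances)),
                    key=lambda i: predicted_gain(instances[i], weights),
                    reverse=True)
    mask = [False] * len(instances)
    for i in ranked[:budget]:
        if predicted_gain(instances[i], weights) > 0:
            mask[i] = True
    return mask, predict_performance(mask, instances, weights, base_score)


if __name__ == "__main__":
    # Made-up toy data and weights, for illustration only.
    data = [{"safety"}, {"complex"}, set(), {"safety", "complex"}]
    w = {"safety": 0.08, "complex": 0.04}
    mask, score = route(data, w, base_score=0.70, budget=2)
    print(mask, round(score, 3))
```

Because this toy PPM is additive, picking the top-k positive gains is optimal for it. A PPM with interaction terms between features would likely need a search over candidate mixes instead.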
 
Install the dependencies within your Python environment:
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Running the full pipeline involves several steps, some of which may need to run on a TPU machine. To simplify this, we wrote scripts that automate different parts of the pipeline. Please head over to the docs directory for more information.

@inproceedings{miranda-etal-2025-hybrid,
    title = "Hybrid Preferences: Learning to Route Instances for Human vs. {AI} Feedback",
    author = "Miranda, Lester James Validad  and
      Wang, Yizhong  and
      Elazar, Yanai  and
      Kumar, Sachin  and
      Pyatkin, Valentina  and
      Brahman, Faeze  and
      Smith, Noah A.  and
      Hajishirzi, Hannaneh  and
      Dasigi, Pradeep",
    editor = "Che, Wanxiang  and
      Nabende, Joyce  and
      Shutova, Ekaterina  and
      Pilehvar, Mohammad Taher",
    booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.acl-long.355/",
    pages = "7162--7200",
    ISBN = "979-8-89176-251-0",
    abstract = "Learning from human feedback has enabled the alignment of language models (LMs) with human preferences. However, collecting human preferences is expensive and time-consuming, with highly variable annotation quality. An appealing alternative is to distill preferences from LMs as a source of synthetic annotations, offering a cost-effective and scalable alternative, albeit susceptible to other biases and errors. In this work, we introduce HyPER, a Hybrid Preference routER that defers an annotation to either humans or LMs, achieving better annotation quality while reducing the cost of human-only annotation. We formulate this as an optimization problem: given a preference dataset and an evaluation metric, we (1) train a performance prediction model (PPM) to predict a reward model{'}s (RM) performance on an arbitrary combination of human and LM annotations and (2) employ a routing strategy that selects a combination that maximizes predicted performance. We train the PPM on MultiPref, a new preference dataset with 10K instances paired with human and LM labels. We show that the selected hybrid mixture of synthetic and direct human preferences using HyPER achieves better RM performance compared to using either one exclusively by 7-13{\%} on RewardBench and generalizes across unseen preference datasets and other base models. We also observe the same trend in other benchmarks using Best-of-N reranking, where the hybrid mix has 2-3{\%} better performance. Finally, we analyze features from HyPER and find that prompts with moderate safety concerns or complexity benefit the most from human feedback."
}