- Using language models in scholarly peer review comes with significant risks around safety, research integrity, and the validity of the review.
- Inevitably, people will use LLMs as pre-review agents, if not as fully autonomous peer-review agents.
- The lack of a systematic evaluation of LLM-generated reviews across scientific disciplines leaves the alignment/misalignment question unassessed.
- Given a paper P, field F, and peer review R, a traditional learning framework would capture the decision function θ(R̂ | P, F) through a training objective minimizing the Mean Absolute Error between R̂ and R (see the first sketch after this list).
- Assumption 1: Representations from different pre-trained models, capturing crucial information about P and F, act as features to train the model θ.
- Assumption 2: The peer review R includes both a sequence of tokens Rtext = [r1, r2, r3, …, rn] and a discrete value Rscore, the score gauging the evaluation of the idea/manuscript on a scale of 1-10.
- Utilizing large language models (LLMs) provides a training-free framework to estimate the peer-review Rscore and to assess the alignment/misalignment of LLMs against real-world outcomes such as hit-paper status in field F (see the second sketch after this list).
- Systematically assessing the alignment of LLM Rscore predictions would help gauge the safety risks of deploying large language models as pre-review agents that assist reviewers with peer review.
- RQ-1: Understanding the joint distribution of idea review scores and paper review scores for a collection of language models.
- RQ-2: Beyond accuracy, measure the alignment and misalignment of each model to determine which agrees/disagrees most with the human label.
- RQ-3: Assessing whether human/LLM review scores can identify top-1%, 5%, and 10% hit-paper outcomes.
- Ablation-1: Observing the effect of stochasticity when LLMs generate reviews.
- Ablation-2: Observing the effect of prompt instructions on idea/paper review scores.
- Ablation-3: Capturing memorization/generalization to probe the models' pretrained knowledge of the dataset.
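To make the contrast concrete, below is a minimal sketch of the traditional supervised baseline described above: frozen pre-trained embeddings of the paper P and field F act as features (Assumption 1), and a small regression head θ is trained to predict Rscore with a Mean Absolute Error objective. The encoder choice, head architecture, and data format are illustrative assumptions, not this project's exact setup.

```python
# Sketch of the traditional supervised baseline (illustrative assumptions only):
# frozen pre-trained embeddings of paper P and field F are the features, and a
# small regression head theta is trained to predict Rscore with MAE (L1) loss.
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer  # assumed encoder choice

encoder = SentenceTransformer("all-MiniLM-L6-v2", device="cpu")  # kept on CPU for simplicity
dim = encoder.get_sentence_embedding_dimension()

def featurize(paper_text: str, field: str) -> torch.Tensor:
    """Concatenate frozen embeddings of the paper P and its field F (Assumption 1)."""
    emb = encoder.encode([paper_text, field], convert_to_tensor=True)
    return torch.cat([emb[0], emb[1]])  # shape: (2 * dim,)

head = nn.Sequential(nn.Linear(2 * dim, 128), nn.ReLU(), nn.Linear(128, 1))
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
mae = nn.L1Loss()  # Mean Absolute Error training objective

def train_step(batch):
    """batch: list of (paper_text, field, human_score) triples with scores in 1-10."""
    feats = torch.stack([featurize(p, f) for p, f, _ in batch])
    target = torch.tensor([[s] for _, _, s in batch], dtype=torch.float32)
    pred = head(feats)
    loss = mae(pred, target)  # minimize |R_hat - Rscore|
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```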
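And a minimal sketch of the training-free alternative: prompt an LLM for idea and paper scores on the 1-10 scale and parse the structured output. The client, model name, prompt wording, and output fields here are assumptions for illustration; the temperature argument is the knob Ablation-1 would vary to probe stochasticity.

```python
# Sketch of the training-free framework: ask an LLM for Rscore directly (no training).
# Model name, prompt wording, and output fields are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "You are acting as a scientific pre-reviewer.\n"
    "Field: {field}\n"
    "Paper title and abstract:\n{paper}\n\n"
    "Return JSON with integer fields 'idea_score' and 'paper_score', each on a 1-10 scale."
)

def llm_review_score(paper: str, field: str, temperature: float = 0.0) -> dict:
    """Query the LLM for idea/paper scores; temperature is the knob for Ablation-1."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any instruction-following LLM
        temperature=temperature,
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content": PROMPT.format(field=field, paper=paper)}],
    )
    return json.loads(response.choices[0].message.content)
```

Sampling the same paper several times at temperature > 0 and inspecting the spread of returned scores is one way to run the stochasticity ablation; comparing the returned scores against the human Rscore is what RQ-2 asks for.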
More about the data can be found here.
NOTE: The datasets are available as Parquet files on Google Drive, and they can be found here.
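As a starting point for RQ-1, here is a short sketch of loading the Parquet data and tabulating the joint distribution of idea and paper review scores (cf. data/media/review_joint_distribution.png). The file name and column names below are assumptions; adjust them to the actual schema.

```python
# Sketch: load the review Parquet file and tabulate the joint distribution of
# idea vs. paper review scores. File path and column names are assumptions.
import pandas as pd

df = pd.read_parquet("reviews.parquet")  # hypothetical file name

joint = pd.crosstab(df["idea_score"], df["paper_score"], normalize=True)
print(joint.round(3))  # empirical P(idea_score, paper_score)
```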
```
├── LICENSE
├── README.md
├── data
│   ├── README.md
│   ├── __init__.py
│   └── media
│       ├── review_idea_distribution.png
│       ├── review_joint_distribution.png
│       └── review_paper_distribution.png
└── src
    ├── __init__.py
    ├── icl.py
    ├── prompts.py
    └── schema.py
```
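For Assumption 2, a hypothetical sketch of the kind of structured review object a module like src/schema.py could define (Rtext plus Rscore). The pydantic model and field names below are illustrative assumptions, not the repository's actual schema.

```python
# Hypothetical structured-review schema (illustrative; not necessarily src/schema.py).
from pydantic import BaseModel, Field

class Review(BaseModel):
    """Peer review R = (Rtext, Rscore) from Assumption 2."""
    review_text: str                      # Rtext: the written review
    idea_score: int = Field(ge=1, le=10)  # Rscore for the idea, on a 1-10 scale
    paper_score: int = Field(ge=1, le=10) # Rscore for the manuscript, on a 1-10 scale
```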
TBA
Thanks to @sumuks and the Hugging Face repo sumuks/openreview-reviews-filtered, which were crucial for the dataset, experiments, and methodology of the paper.