Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space

Introduction

LatentSeek is a novel framework that enhances LLM reasoning through Test-Time Instance-level Adaptation (TTIA) within the model's latent space. Specifically, LatentSeek leverages policy gradient to iteratively update latent representations, guided by self-generated reward signals.
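To make the idea concrete, here is a minimal toy sketch of a REINFORCE-style update applied to per-position latent vectors. It is not the repository's implementation. The linear `head`, the greedy decoding, the stand-in `toy_reward`, and the reading of `rho` as "fraction of leading positions that get updated" are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_step(latents, head, reward_fn, lr=0.1, rho=1.0):
    """One policy-gradient update of latent vectors (toy, assumption-laden).

    latents: list of T vectors of length d (one latent per position).
    head:    list of V rows of length d (hypothetical output projection).
    rho:     assumed to be the fraction of leading positions to update.
    """
    # Token distribution at each position: softmax(head @ z_t).
    probs = [
        softmax([sum(w * z for w, z in zip(row, vec)) for row in head])
        for vec in latents
    ]
    # Greedy decoding of one token per position.
    tokens = [max(range(len(p)), key=p.__getitem__) for p in probs]
    # Scalar reward for the whole decoded sequence.
    reward = reward_fn(tokens)

    n_update = int(rho * len(latents))
    updated = [list(vec) for vec in latents]
    for t in range(n_update):
        for j in range(len(updated[t])):
            # d/dz_t log softmax(head @ z_t)[x_t] = head^T (onehot(x_t) - p_t)
            grad = sum(
                ((1.0 if v == tokens[t] else 0.0) - probs[t][v]) * head[v][j]
                for v in range(len(head))
            )
            updated[t][j] += lr * reward * grad
    return updated, tokens, reward


def toy_reward(tokens):
    """Stand-in for a self-generated reward: penalize tokens other than 0."""
    return -sum(1 for tok in tokens if tok != 0) / len(tokens)


latents = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2], [0.0, 0.1]]
head = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
new_latents, tokens, reward = policy_gradient_step(
    latents, head, toy_reward, lr=0.1, rho=0.5
)
```

In this sketch only the first `int(rho * T)` latents move, and the model weights (`head`) are never touched, which mirrors the test-time, instance-level nature of the adaptation described above.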

Installation

conda create -n latentseek python=3.10
conda activate latentseek
pip3 install torch torchvision torchaudio
pip install transformers datasets tqdm accelerate
pip install termcolor

# for evaluation
cd src/extract_judge_answer/latex2sympy
pip install -e .
pip install math-verify
pip install word2number

Usage

cd src
cd scripts
vim example.sh # edit the variables to fit your setup
cd ..
sh scripts/example.sh

The example.sh file

PATH_TO_DATA= # path to the dataset (the path string must contain one of "AIME_2024", "gsm8k", "MATH-500")
PATH_TO_MODEL= # path to the model
rho= # the value of rho, the hyperparameter for the fractional update
lr= # the learning rate
solver_prompt_idx= # the index of the solver prompt to use (0 for "boxed", 1 for "json")

python main.py \
    --dataset "$PATH_TO_DATA" \
    --model_name_or_path "$PATH_TO_MODEL" \
    --output_dir ./output \
    --k "$rho" \
    --lr "$lr" \
    --solver_prompt_idx "$solver_prompt_idx" \
    --device "cuda"
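Because the dataset path must contain one of the three dataset names, it can help to check the path before launching a long run. The helper below is a hypothetical pre-flight check, not part of the repository; the function name `infer_dataset_name` and the exact matching rule (plain substring match, exactly one hit) are assumptions.

```python
KNOWN_DATASETS = ("AIME_2024", "gsm8k", "MATH-500")


def infer_dataset_name(path):
    """Return the single known dataset name found in `path`.

    Hypothetical helper: assumes a case-sensitive substring match.
    """
    matches = [name for name in KNOWN_DATASETS if name in path]
    if len(matches) != 1:
        raise ValueError(
            f"Expected exactly one of {KNOWN_DATASETS} in {path!r}, "
            f"found {matches}"
        )
    return matches[0]
```

For example, `infer_dataset_name("/data/gsm8k/test.jsonl")` returns `"gsm8k"`, while a path containing none of the names raises a `ValueError`.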

Files for Modification

Main logic file: main
Opt generation file (LatentSeek core): opt
CoT generation file (original generation): ori
Data: data
Reward Model: reward
Self-Reward Prompts: self-reward prompts
CoT Prompts: CoT prompts

Citation

@misc{li2025seekdarkreasoningtesttime,
      title={Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space}, 
      author={Hengli Li and Chenxi Li and Tong Wu and Xuekai Zhu and Yuxuan Wang and Zhaoxin Yu and Eric Hanchen Jiang and Song-Chun Zhu and Zixia Jia and Ying Nian Wu and Zilong Zheng},
      year={2025},
      eprint={2505.13308},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2505.13308}, 
}

Contact

If you have any questions, please send me an email at: [email protected]

bigai-nlco/LatentSeek

Folders and files

Latest commit

History

Repository files navigation

Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space

Introduction

Installation

Usage

Files for Modification

Citation

Contact

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 3

Uh oh!

Languages

Packages