
prompt opt without training #284

@genji970

Description


Hi! If I understand ART correctly, an LLM evaluates multi-agent trajectories and the reward values come from those evaluations.
So I'd like to suggest a paper I saw recently that is related to this.

"Black-Box Prompt Optimization: Aligning Large Language Models without Model Training"

The method in this paper does not require training the target model and still achieves further optimization.

"Black-Box Prompt Optimization" Summary

This method improves prompts without touching the target LLM's weights, using only interactions with the LLM:

Given a user prompt, the LLM generates two responses, and the user selects the better one.

Ask the LLM to explain why the worse answer is bad, then rewrite the prompt to fix the issue.

Collect (original_prompt, optimized_prompt) pairs to train a prompt preference optimizer.

🔧 Loss: Maximize the log-probability of generating the optimized prompt tokens given the original prompt.

→ Enables alignment without fine-tuning the target model, similar in spirit to ART's RULER approach (rough sketch of the loop below).
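
To make the loop concrete, here is a rough sketch in Python of how I imagine the data-collection step. All helper names (`call_llm`, `pick_better`) are hypothetical placeholders, not ART's or the paper's actual API:

```python
# Rough sketch of the BPO-style feedback loop (all helpers are hypothetical placeholders).

def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion call (OpenAI API, local model, etc.)."""
    raise NotImplementedError

def pick_better(response_a: str, response_b: str) -> tuple[str, str]:
    """Placeholder for the preference step (human choice or an LLM judge).
    Returns (better, worse)."""
    raise NotImplementedError

def collect_pair(user_prompt: str) -> tuple[str, str]:
    # 1. The LLM generates two responses to the same prompt.
    response_a = call_llm(user_prompt)
    response_b = call_llm(user_prompt)

    # 2. One response is preferred over the other.
    better, worse = pick_better(response_a, response_b)

    # 3. Ask the LLM to explain why the worse answer is bad.
    critique = call_llm(
        f"Prompt: {user_prompt}\n"
        f"Preferred answer: {better}\n"
        f"Worse answer: {worse}\n"
        "Explain briefly what is wrong with the worse answer."
    )

    # 4. Rewrite the prompt so that it avoids the identified failure.
    optimized_prompt = call_llm(
        f"Original prompt: {user_prompt}\n"
        f"Critique of a bad answer to it: {critique}\n"
        "Rewrite the original prompt so the model avoids this failure. "
        "Return only the rewritten prompt."
    )

    # Each (original_prompt, optimized_prompt) pair becomes training data
    # for the prompt preference optimizer.
    return user_prompt, optimized_prompt
```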

In my view, this method could produce more precise reward scores.
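
For the 🔧 Loss step above, here is a minimal sketch of how the prompt preference optimizer could be trained on such pairs with Hugging Face `transformers`. The `gpt2` checkpoint and the prompt formatting are just stand-ins, not what the paper used:

```python
# Sketch of the training objective: maximize log p(optimized_prompt | original_prompt),
# i.e. standard next-token NLL computed only over the optimized-prompt tokens.
# Model and formatting are illustrative placeholders, not the paper's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def prompt_optimizer_loss(original_prompt: str, optimized_prompt: str) -> torch.Tensor:
    context = f"Original prompt: {original_prompt}\nRewritten prompt:"
    context_ids = tokenizer(context, return_tensors="pt").input_ids
    target_ids = tokenizer(" " + optimized_prompt, return_tensors="pt").input_ids

    input_ids = torch.cat([context_ids, target_ids], dim=1)
    labels = input_ids.clone()
    labels[:, : context_ids.shape[1]] = -100  # ignore the context; score only the rewrite

    # Cross-entropy over the rewrite tokens = negative log-likelihood to minimize.
    return model(input_ids=input_ids, labels=labels).loss
```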
