Hi! If I understand ART correctly, an LLM evaluates multi-agent trajectories and the reward values are derived from those evaluations.
So I'd like to suggest a paper I saw recently that seems related to this.
"Black-Box Prompt Optimization: Aligning Large Language Models without Model Training"
The paper's method requires no training of the target model and achieves its improvements purely by optimizing the prompts it receives.
"Black-Box Prompt Optimization" Summary
This method improves prompts without training the target model, using only interactions with an LLM:

1. Given a user prompt, the LLM generates two responses, and the user selects the better one.
2. The LLM is asked to explain why the worse answer is bad and to rewrite the prompt to fix that issue.
3. The resulting (original_prompt, optimized_prompt) pairs are collected to train a prompt preference optimizer.

🔧 Loss: maximize the log-probability of the optimized prompt tokens given the original prompt.
→ This enables alignment without fine-tuning the target model, similar in spirit to ART's RULER approach (rough sketch below).
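For concreteness, here is a minimal sketch of that pipeline. It is not the paper's actual code: `llm()` and `better_of()` are hypothetical placeholders for the generation call and the preference signal, and the loss function assumes a Hugging Face-style causal LM whose `labels` ignore positions set to `-100`.

```python
import torch


def llm(prompt: str, temperature: float = 1.0) -> str:
    """Placeholder for whatever chat/completion API is used (hypothetical)."""
    raise NotImplementedError


def better_of(prompt: str, a: str, b: str) -> str:
    """Placeholder for the preference signal, e.g. a user picking the better response."""
    raise NotImplementedError


def collect_pair(user_prompt: str) -> tuple[str, str]:
    # 1. Generate two responses to the same prompt.
    resp_a = llm(user_prompt)
    resp_b = llm(user_prompt)

    # 2. Keep the preferred response and the rejected one.
    chosen = better_of(user_prompt, resp_a, resp_b)
    rejected = resp_b if chosen == resp_a else resp_a

    # 3. Ask the LLM why the rejected answer is worse, then rewrite the prompt
    #    so it steers the model toward the chosen answer.
    critique = llm(
        f"Prompt: {user_prompt}\nBetter answer: {chosen}\nWorse answer: {rejected}\n"
        "Briefly explain what the worse answer is missing."
    )
    optimized_prompt = llm(
        f"Original prompt: {user_prompt}\nCritique of a bad answer: {critique}\n"
        "Rewrite the prompt so the model avoids this issue. Return only the new prompt."
    )

    # One (original_prompt, optimized_prompt) pair for training the prompt optimizer.
    return user_prompt, optimized_prompt


def prompt_optimizer_loss(model, tokenizer, original_prompt: str, optimized_prompt: str):
    """Maximize log p(optimized_prompt | original_prompt), i.e. minimize the NLL
    of the optimized prompt tokens while masking out the conditioning tokens."""
    prompt_ids = tokenizer(original_prompt, return_tensors="pt").input_ids
    target_ids = tokenizer(
        optimized_prompt, add_special_tokens=False, return_tensors="pt"
    ).input_ids
    input_ids = torch.cat([prompt_ids, target_ids], dim=1)

    labels = input_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # only optimized-prompt tokens count in the loss
    return model(input_ids=input_ids, labels=labels).loss
```

The `-100` masking is just the standard way to restrict the cross-entropy loss to the target span, which matches the "maximize log-probability of the optimized prompt tokens" objective above.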
In my view, this approach could help produce more precise reward scores.