Commit 75c761a

docs: Show how to start RL from an existing SFT LoRA adapter (#325)
1 parent 30481e2 commit 75c761a

File tree

2 files changed: +39 −0


docs/fundamentals/art-client.mdx

Lines changed: 23 additions & 0 deletions
@@ -66,6 +66,29 @@ backend = SkyPilotBackend.initialize_cluster(
await model.register(backend)
```

### Initializing from an existing SFT LoRA

If you've already fine-tuned a model with SFT using a LoRA adapter (e.g., Unsloth/PEFT) and have a standard Hugging Face–style adapter directory, you can start RL training from those weights by passing the adapter directory path as `base_model` when creating your `TrainableModel`.

Why start from an SFT adapter?

- Warm-start from task-aligned weights to reduce training steps and GPU cost.
- Stabilize early training, especially for small models (1B–8B) that may receive near-zero rewards at the start of RL.

```python
import art

model = art.TrainableModel(
    name="agent-001",
    project="my-agentic-task",
    # Point to the local SFT LoRA adapter directory
    # (e.g., contains adapter_config.json and adapter_model.bin/safetensors)
    base_model="/path/to/my_sft_lora_adapter",
)
```

ART will load the adapter as the initial checkpoint and proceed with RL updates from there.
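
Before pointing `base_model` at a local directory, it can help to sanity-check that the directory actually looks like a PEFT-style adapter. The sketch below is plain Python with no ART dependency; `looks_like_peft_adapter` and the path are illustrative helpers, not part of the ART API:

```python
from pathlib import Path

def looks_like_peft_adapter(adapter_dir: str) -> bool:
    """Heuristic check that a directory is a HF/PEFT-style LoRA adapter."""
    d = Path(adapter_dir)
    has_config = (d / "adapter_config.json").is_file()
    has_weights = any(
        (d / name).is_file()
        for name in ("adapter_model.safetensors", "adapter_model.bin")
    )
    return has_config and has_weights
```

A quick check like this catches the common mistake of passing the training output root instead of the adapter subdirectory.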
You're now ready to start training your agent.

## Running inference

docs/getting-started/faq.mdx

Lines changed: 16 additions & 0 deletions
@@ -14,6 +14,22 @@ By allowing an LLM to make multiple attempts at accomplishing a task and scoring
</Accordion>

<Accordion title="Can I start RL from an existing SFT LoRA adapter?">
Yes. If you have a standard Hugging Face–style LoRA adapter directory (e.g., produced by Unsloth/PEFT), pass the adapter folder path as the `base_model` when creating your `TrainableModel`.
```python
import art

model = art.TrainableModel(
    name="agent-001",
    project="my-agentic-task",
    base_model="/path/to/my_sft_lora_adapter",  # HF-style adapter dir
)
```

ART will load the adapter as the initial checkpoint and proceed with RL updates from there.
</Accordion>
<Accordion title="How does ART work under the hood?">
This flow chart shows a highly simplified view of how ART optimizes your agent. Your code is responsible for running the agent in the environment it will operate in and for scoring each trajectory (deciding whether the agent did a good job or not). ART then takes those trajectories and scores and uses them to iteratively train your agent and improve performance.
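
The cycle described above can be sketched with toy stubs. Everything below is a hypothetical illustration of the rollout → score → train loop, not the ART API; `run_agent`, `score`, and `train_step` stand in for your rollout code, your reward function, and ART's training step respectively:

```python
import random

def run_agent(task, seed):
    """Hypothetical rollout: your code runs the agent and records its actions."""
    rng = random.Random(seed)
    return [f"action-{rng.randint(0, 9)}" for _ in range(3)]

def score(trajectory):
    """Hypothetical reward function: here, count actions ending in a high digit."""
    return sum(1.0 for a in trajectory if a[-1] in "789")

def train_step(model_state, scored):
    """Hypothetical stand-in for an RL update: nudge state by mean batch reward."""
    rewards = [r for _, r in scored]
    return model_state + sum(rewards) / len(rewards)

model_state = 0.0
for step in range(3):
    scored = []
    for attempt in range(4):  # multiple attempts at the same task
        traj = run_agent("my-task", seed=step * 4 + attempt)
        scored.append((traj, score(traj)))
    model_state = train_step(model_state, scored)
```

In real usage, ART supplies the training step; your code supplies only the rollout and the scoring.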
