Commit 75c761a

docs: Show how to start RL from an existing SFT LoRA adapter (#325)
1 parent 30481e2 commit 75c761a

File tree

2 files changed: +39 −0


docs/fundamentals/art-client.mdx

Lines changed: 23 additions & 0 deletions
@@ -66,6 +66,29 @@ backend = SkyPilotBackend.initialize_cluster(
await model.register(backend)
```

### Initializing from an existing SFT LoRA

If you've already fine-tuned a model with SFT using a LoRA adapter (e.g., Unsloth/PEFT) and have a standard Hugging Face–style adapter directory, you can start RL training from those weights by passing the adapter directory path as `base_model` when creating your `TrainableModel`.

Why start from an SFT adapter?

- Warm-start from task-aligned weights to reduce training steps and GPU cost.
- Stabilize early training, especially for small models (1B–8B) that may receive near-zero rewards at the start of RL.

```python
import art

model = art.TrainableModel(
    name="agent-001",
    project="my-agentic-task",
    # Point to the local SFT LoRA adapter directory
    # (e.g., contains adapter_config.json and adapter_model.bin/safetensors)
    base_model="/path/to/my_sft_lora_adapter",
)
```

ART will load the adapter as the initial checkpoint and proceed with RL updates from there.
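
Before pointing `base_model` at a local directory, it can help to sanity-check that the directory actually looks like a PEFT-style adapter. The sketch below is plain Python with no ART dependency; `looks_like_peft_adapter` and the path are illustrative helpers, not part of the ART API:

```python
from pathlib import Path

def looks_like_peft_adapter(adapter_dir: str) -> bool:
    """Heuristic check that a directory is a HF/PEFT-style LoRA adapter."""
    d = Path(adapter_dir)
    has_config = (d / "adapter_config.json").is_file()
    has_weights = any(
        (d / name).is_file()
        for name in ("adapter_model.safetensors", "adapter_model.bin")
    )
    return has_config and has_weights
```

A quick check like this catches the common mistake of passing the training output root instead of the adapter subdirectory.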
You're now ready to start training your agent.

## Running inference

docs/getting-started/faq.mdx

Lines changed: 16 additions & 0 deletions
@@ -14,6 +14,22 @@ By allowing an LLM to make multiple attempts at accomplishing a task and scoring
</Accordion>

<Accordion title="Can I start RL from an existing SFT LoRA adapter?">
Yes. If you have a standard Hugging Face–style LoRA adapter directory (e.g., produced by Unsloth/PEFT), pass the adapter folder path as the `base_model` when creating your `TrainableModel`.
```python
import art

model = art.TrainableModel(
    name="agent-001",
    project="my-agentic-task",
    base_model="/path/to/my_sft_lora_adapter",  # HF-style adapter dir
)
```

ART will load the adapter as the initial checkpoint and proceed with RL updates from there.
</Accordion>
<Accordion title="How does ART work under the hood?">
This flow chart shows a highly simplified view of how ART optimizes your agent. Your code is responsible for running the agent in the environment it will operate in and for scoring each trajectory (deciding whether the agent did a good job or not). ART then takes those trajectories and scores and uses them to iteratively train your agent and improve performance.
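
The cycle described above can be sketched with toy stubs. Everything below is a hypothetical illustration of the rollout → score → train loop, not the ART API; `run_agent`, `score`, and `train_step` stand in for your rollout code, your reward function, and ART's training step respectively:

```python
import random

def run_agent(task, seed):
    """Hypothetical rollout: your code runs the agent and records its actions."""
    rng = random.Random(seed)
    return [f"action-{rng.randint(0, 9)}" for _ in range(3)]

def score(trajectory):
    """Hypothetical reward function: here, count actions ending in a high digit."""
    return sum(1.0 for a in trajectory if a[-1] in "789")

def train_step(model_state, scored):
    """Hypothetical stand-in for an RL update: nudge state by mean batch reward."""
    rewards = [r for _, r in scored]
    return model_state + sum(rewards) / len(rewards)

model_state = 0.0
for step in range(3):
    scored = []
    for attempt in range(4):  # multiple attempts at the same task
        traj = run_agent("my-task", seed=step * 4 + attempt)
        scored.append((traj, score(traj)))
    model_state = train_step(model_state, scored)
```

In real usage, ART supplies the training step; your code supplies only the rollout and the scoring.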
