Description
Used model name: Qwen3-32B
Fine-tuning data size: 8k+
Qwen3-32B did not adopt the desired style after being fine-tuned on private data. The think content and responses remain very similar to those of the original model.
Reproduction
We fine-tuned the open-source Qwen3-32B model using approximately 8,000 QA samples. The sample format is as follows (ChatML format):
<|im_start|>system
You are a helpful assistant. ......<|im_end|>
<|im_start|>user
user_question<|im_end|>
<|im_start|>assistant
<think>
think_content
</think>
response_content
<|im_end|>
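For clarity, here is a minimal sketch of how one QA sample is rendered into the ChatML training text above (the dict keys such as `system`, `question`, `think`, and `answer` are illustrative placeholders, not our actual dataset schema):

```python
# Minimal sketch: render one QA sample into the ChatML training text shown above.
# The dict keys are illustrative placeholders, not the real dataset schema.
def render_chatml(sample: dict) -> str:
    return (
        f"<|im_start|>system\n{sample['system']}<|im_end|>\n"
        f"<|im_start|>user\n{sample['question']}<|im_end|>\n"
        f"<|im_start|>assistant\n"
        f"<think>\n{sample['think']}\n</think>\n"
        f"{sample['answer']}<|im_end|>"
    )
```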
Our fine-tuning data contains very strong stylistic keywords in the think content. Measured by keyword detection, the style adherence rate of the model trained with the ChatML format is only 1% on the test set.
However, without modifying the data content and only changing the training template to the DeepSeek format shown below:
<|begin▁of▁sentence|><|System|>You are a helpful assistant. ......<|User|>
user_question<|Assistant|><think>
think_content
</think>
response_content<|end▁of▁sentence|>
the style adherence rate on the test set increased to 100%.
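For reference, the style adherence rate was measured with simple keyword matching over the generated text; a minimal sketch is below (the keyword list and the generations passed in are placeholders, not our actual evaluation data):

```python
# Minimal sketch of the keyword-based style check.
# STYLE_KEYWORDS and the generations passed in are placeholders.
STYLE_KEYWORDS = ["keyword_1", "keyword_2"]

def adherence_rate(generations: list[str]) -> float:
    """Fraction of generations containing at least one style keyword."""
    if not generations:
        return 0.0
    hits = sum(any(kw in text for kw in STYLE_KEYWORDS) for text in generations)
    return hits / len(generations)
```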
The test set consists of 200 samples, which should be fairly representative. Is the Qwen3-32B model in thinking mode unsuitable for stylized fine-tuning?
Logs
Environment Information
not needed.
Known Issue
- The issue hasn't already been addressed in the Documentation, Issues, or Discussions.