
The fine-tuned Qwen3-32B model does not follow the intended style. #1718

@alkaideemo

Description


Used model name: Qwen3-32B
Fine-tuning data size: 8k+
Qwen3-32B did not adopt the desired style after being fine-tuned on our private data: both the think content and the final responses remain very close to the original model's output.

Reproduction

We fine-tuned the open-source Qwen3-32B model on approximately 8,000 QA samples. Each sample follows the ChatML format shown below:

<|im_start|>system
You are a helpful assistant. ......<|im_end|>
<|im_start|>user
user_question<|im_end|>
<|im_start|>assistant
<think>
think_content
</think>
response_content
<|im_end|>
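
For clarity, here is a minimal Python sketch of how one sample is serialized into this format (the build_chatml_sample helper and the system/question/think/response field names are illustrative, not our actual pipeline):

def build_chatml_sample(sample):
    # Serialize one QA sample into the ChatML thinking format above.
    # Field names are illustrative: system / question / think / response.
    return (
        f"<|im_start|>system\n{sample['system']}<|im_end|>\n"
        f"<|im_start|>user\n{sample['question']}<|im_end|>\n"
        f"<|im_start|>assistant\n"
        f"<think>\n{sample['think']}\n</think>\n"
        f"{sample['response']}\n<|im_end|>"
    )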

Our fine-tuning data contains very strong stylistic keywords in the think content. Measured by keyword detection, the model trained with the ChatML format achieves a style adherence rate of only 1% on the test set.
However, keeping the data unchanged and only switching the training template to the DeepSeek format below:

<|begin▁of▁sentence|><|System|>You are a helpful assistant. ......<|User|>
user_question<|Assistant|><think>
think_content
</think>
response_content<|end▁of▁sentence|>

the style adherence rate on the test set jumps to 100%.
The test set consists of 200 samples, which should be reasonably representative. We would like to ask: is the Qwen3-32B model in thinking mode unsuitable for stylized fine-tuning?
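
For reference, the adherence check is nothing more than a keyword match over the generated think content, along the lines of this sketch (STYLE_KEYWORDS and the think field are placeholders, since the real keywords come from our private data):

STYLE_KEYWORDS = ["keyword_a", "keyword_b"]  # placeholders for our private stylistic markers

def style_adherence_rate(generations):
    # A generation counts as adherent if its think content
    # contains at least one of the stylistic keywords.
    hits = sum(
        any(kw in gen["think"] for kw in STYLE_KEYWORDS)
        for gen in generations
    )
    return hits / len(generations)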

Logs

Environment Information

Not needed.

Known Issue

  • The issue hasn't already been addressed in the Documentation, Issues, or Discussions.
