Prompt template inconsistency

The prompt template given in the official README does not match the input that is actually built when calling the OpenAI API or the web UI:

Prompt template given in the README: https://github.com/THUDM/ChatGLM3/blob/main/PROMPT.md#%E6%95%B4%E4%BD%93%E7%BB%93%E6%9E%84

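When calling the OpenAI API or the web UI, the conversation history and the current user input are encoded by the 'build_chat_input' method.
For the following conversation: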
User -> 你好
Chatglm3 -> 你好👋！我是ChatGLM3，很高兴见到你，欢迎问我任何问题。
User -> 你是谁

the resulting input_ids are:
[64790, 64792, 64794, 30910,    13,   809,   383, 22011, 10461, 30944,
         30966, 30932,   260,  1796,  3239,  2092,  7594,   422,  1192,   899,
         30923, 30930, 23833, 30930,  5741,   267,  2795, 30953, 30917,  8417,
          7724, 30930, 21911,  1227,  3478,  3536, 30930, 64795, 30910,    13,
         36474, 54591, 64796, 30910,    13, 36474, 54591,   243,   162,   148,
           142, 31404, 33030, 30942,  1960, 10461, 30944, 30966, 31123, 48895,
         35214, 54622, 31123, 32616, 39905, 31901, 31639, 31155, 64795, 30910,
            13, 30910, 34607, 55622, 64796]

Decoding these ids with the tokenizer gives:
[gMASK]sop<|system|> \n You are ChatGLM3, a large language model trained by Zhipu.AI. Follow the user's instructions carefully. Respond using markdown.<|user|> \n 你好<|assistant|> \n 你好👋！我是ChatGLM3，很高兴见到你，欢迎问我任何问题。<|user|> \n 你是谁<|assistant|>
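
To make the layout easier to inspect, here is a minimal sketch that splits the ids above into turns and checks what surrounds each role token. It does not load the model or the tokenizer. The id-to-role mapping (64794, 64795, 64796) and the reading of [30910, 13] as the " \n" separator are assumptions, read off by aligning the id dump with the decoded string. `split_turns` and `describe` are hypothetical helper names.

```python
# Token ids copied verbatim from the dump above.
INPUT_IDS = [
    64790, 64792, 64794, 30910, 13, 809, 383, 22011, 10461, 30944,
    30966, 30932, 260, 1796, 3239, 2092, 7594, 422, 1192, 899,
    30923, 30930, 23833, 30930, 5741, 267, 2795, 30953, 30917, 8417,
    7724, 30930, 21911, 1227, 3478, 3536, 30930, 64795, 30910, 13,
    36474, 54591, 64796, 30910, 13, 36474, 54591, 243, 162, 148,
    142, 31404, 33030, 30942, 1960, 10461, 30944, 30966, 31123, 48895,
    35214, 54622, 31123, 32616, 39905, 31901, 31639, 31155, 64795, 30910,
    13, 30910, 34607, 55622, 64796,
]

# Assumption: mapping inferred by aligning the ids with the decoded string.
ROLE_IDS = {64794: "<|system|>", 64795: "<|user|>", 64796: "<|assistant|>"}
# Assumption: this pair decodes to the " \n" seen after every role token.
SEPARATOR = [30910, 13]
NEWLINE_ID = 13


def split_turns(ids):
    """Group ids into (role, body_ids) pairs; ids before the first role token are skipped."""
    turns = []
    for tok in ids:
        if tok in ROLE_IDS:
            turns.append((ROLE_IDS[tok], []))
        elif turns:
            turns[-1][1].append(tok)
    return turns


def describe(ids):
    """Report, per turn, whether it starts with the separator and ends with a newline id."""
    rows = []
    for role, body in split_turns(ids):
        rows.append({
            "role": role,
            "length": len(body),
            "starts_with_separator": body[:2] == SEPARATOR,
            "ends_with_newline": bool(body) and body[-1] == NEWLINE_ID,
        })
    return rows


if __name__ == "__main__":
    for row in describe(INPUT_IDS):
        print(row)
```

On this dump, every turn that has content starts with [30910, 13], and no turn ends with id 13. The final <|assistant|> turn is empty, since it is only the generation prompt.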

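In the README's prompt template, both the special tokens and the text are followed by a '\n', but the code does not do this: in the decoded output above, the message text runs straight into the next special token with no '\n' in between. In addition, the separator between a special token and the actual dialogue content is encoded in its own separate encode step.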
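
The difference can be sketched at the string level. This is only an illustration of the two layouts as described in this issue, not the tokenizer's actual code: `render_readme_style` and `render_observed_style` are hypothetical names, and the extra spaces that appear in the decoded string are ignored.

```python
def render_readme_style(messages):
    # README layout as described here: role token + newline, then text + newline.
    return "".join(f"<|{role}|>\n{text}\n" for role, text in messages)


def render_observed_style(messages):
    # Layout seen in the decoded ids: role token + newline, text with nothing after it.
    return "".join(f"<|{role}|>\n{text}" for role, text in messages)


messages = [
    ("user", "你好"),
    ("assistant", "你好👋！我是ChatGLM3，很高兴见到你，欢迎问我任何问题。"),
    ("user", "你是谁"),
]

readme_prompt = render_readme_style(messages) + "<|assistant|>"
observed_prompt = render_observed_style(messages) + "<|assistant|>"

if __name__ == "__main__":
    print(repr(readme_prompt))
    print(repr(observed_prompt))
```

The two strings differ only in the newline after each message text, so the README layout contains one extra '\n' per message.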

Which of the two ways of building the input is the correct one?

Could you please advise, @duzx16?

Prompt template inconsistency #127
