Choppy text response streaming #5124
When streaming responses back from claude-3.5, the text arrives a little choppy/slow. Is there a way to configure this for smoother text streaming? Currently I use a LiteLLM proxy to facilitate access to Anthropic and OpenAI models; I do not integrate them directly through LibreChat itself. Advice is appreciated.
Replies: 1 comment 1 reply
This is totally dependent on the API provider. Compare the "smoothness" of the stream generation with OpenAI. It's mostly down to the rate of tokens per generation response: I believe some providers "stuff" each response with more tokens to reduce the number of responses being transmitted. OpenAI's rate is 1 token per response, which provides a good streaming experience.

The only way around this is to manipulate the stream rate and to control the length of "chunks" (total tokens per response). Despite being fast, Google is the worst offender for choppy text, so I explicitly coded in something to manipulate the stream so that it's smoother when running on LibreChat.

LiteLLM is an additional layer for you. I couldn't tell you whether they manipulate the rate of tokens per generation response, though they technically could; they probably leave it alone when proxying. Through agents, I've been experimenting with making the stream smoother for Anthropic, and you can see it's possible here:
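For illustration, here is a minimal sketch of the re-chunking idea in TypeScript (LibreChat's stack): buffer each large provider chunk, split it into smaller pieces, and pace them out at a steady interval. The names `smoothStream`, `pieceSize`, and `delayMs` and their values are hypothetical, not LibreChat's actual implementation.

```typescript
// Sketch only: smooth a bursty token stream by re-chunking and pacing.
// Assumed/hypothetical: `smoothStream`, `pieceSize`, `delayMs`, `fakeUpstream`.

async function* smoothStream(
  upstream: AsyncIterable<string>,
  pieceSize = 4, // characters emitted per piece (illustrative value)
  delayMs = 15, // pause between pieces (illustrative value)
): AsyncGenerator<string> {
  for await (const chunk of upstream) {
    // Split each large provider chunk into smaller pieces...
    for (let i = 0; i < chunk.length; i += pieceSize) {
      yield chunk.slice(i, i + pieceSize);
      // ...and pace them out so the client sees a steady flow.
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Usage example: wrap a fake upstream that delivers big, bursty chunks.
async function* fakeUpstream(): AsyncGenerator<string> {
  yield "This is a large chunk that would otherwise render all at once. ";
  yield "Here is another one.";
}

(async () => {
  for await (const piece of smoothStream(fakeUpstream())) {
    process.stdout.write(piece);
  }
})();
```

The trade-off is latency: pacing the pieces delays the final token slightly, so the delay and piece size have to be tuned against the chunk sizes the provider actually sends.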