
This is totally dependent on the API provider. Compare the "smoothness" of the stream generation with OpenAI's. It mostly comes down to the number of tokens per streamed response: I believe some providers "stuff" each response with extra tokens to reduce the number of responses being transmitted. OpenAI's rate is 1 token per response, which provides a good streaming experience.
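
A quick way to see the difference for yourself is to log the size of each streamed delta. A minimal sketch, assuming the official `openai` Node SDK and an arbitrary model/prompt of your choosing:

```ts
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

const stream = await client.chat.completions.create({
  model: "gpt-4o-mini", // assumption: any chat-capable model works here
  messages: [{ role: "user", content: "Describe streaming in one paragraph." }],
  stream: true,
});

// With OpenAI you'll typically see many tiny deltas (~1 token each);
// a "stuffing" provider sends fewer, larger chunks instead.
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content ?? "";
  if (delta) console.log(`delta (${delta.length} chars):`, JSON.stringify(delta));
}
```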

The only way around this is to manipulate the stream rate and control the length of each "chunk" (total tokens per response). Despite being fast, Google is the worst offender for choppy text, so I explicitly coded in stream manipulation so that it renders more smoothly when running on LibreChat.
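
For illustration only (this is not LibreChat's actual code), the core idea is to re-chunk whatever the provider sends into small pieces and pace them out with a short delay:

```ts
// Smoothing sketch: split each large provider chunk into small,
// evenly paced pieces. chunkSize and delayMs are made-up defaults;
// tune them for the desired typing effect.
async function* smoothStream(
  source: AsyncIterable<string>,
  chunkSize = 4,
  delayMs = 15,
): AsyncGenerator<string> {
  for await (const providerChunk of source) {
    for (let i = 0; i < providerChunk.length; i += chunkSize) {
      yield providerChunk.slice(i, i + chunkSize);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

This trades a little end-to-end latency for a steady render rate: a single 200-character Google chunk becomes ~50 small pieces spread over ~750 ms instead of appearing all at once.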

LiteLLM is an ad…

Answer selected by rcdailey