-
Notifications
You must be signed in to change notification settings - Fork 3.1k
Fix eagle radix cache #10846
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix eagle radix cache #10846
Conversation
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!
The page_size>1 still have some issue, will fix it later. |
@xiezhq-hermann @merrymercy This PR is ready for review. |
if value is None: | ||
value = torch.tensor(key.token_ids, dtype=torch.int64) | ||
|
||
if self.is_eagle: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hiradix (and other trees like swa) override the insert
function, would that be a problem since eagle worker shared the same tree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in current design, we need to adapt this change to other trees like swa and hiradix if they override these functions. This PR just makes the main radix tree ready. HiCache and swa need extra work and test to make them ready.
|
||
return self._insert_helper(self.root_node, key, value) | ||
|
||
def cache_finished_req(self, req: Req): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while hiradix
does not, swa tree override this implementation as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The swa cache inherits from BaseRadixCache, so it seems all the changes should be implemented again on it. HiCache is from RadixCache, we just need to do some adaptation on it with less override. But for HiCache, the main thing I'm concerning is that the chunked prefill size is a little changed. If the chunked prefill size is 64, actually only 63 bigram keys are inserted to the tree. Maybe it's not efficient for cache offloading with block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is what we are doing is primarily to resolve conflict with eagle workers since it shares the same radix tree but has its own pool, but not to have hicache support for eagle workers, i.e., eagle workers to fetch kv caches from host memory, which seems unnecessary and potentially complicated. Is it correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the kv cache for eagle worker is unnecessary to store into host memory since it's only one layer. If we use HiCache only for target model, can we still share the kv indices between target and draft pool?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes I think it should be fine just wanted to confirm that we are aligned on this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc: @merrymercy
Motivation
main:
this PR:
compatibility test: