@doquangg doquangg commented Apr 1, 2025

I tried to run main.py against my local Llama 2 models from Hugging Face, but it failed: the existing implementation treats the KV cache as a tuple, while the model returns a DynamicCache. Specifically:


line 62, in _forward_with_kvcache
self._past_key_values = self._past_key_values + (outputs.past_key_values,)
~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
TypeError: unsupported operand type(s) for +: 'DynamicCache' and 'tuple'
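
For illustration only (this is not the code from this PR), here is a minimal sketch of one way to handle both cache formats when trimming the cache after rejected draft tokens. The helper name `rollback_cache` is hypothetical. It assumes the cache object exposes a `crop(max_length)` method, as `DynamicCache` does in recent `transformers` releases, and that a legacy cache is a tuple of per-layer `(key, value)` tensors shaped `(batch, heads, seq_len, head_dim)`. Older BLOOM caches may use a different layout, so the slicing would need adjusting there.

```python
def rollback_cache(past_key_values, end_pos):
    """Trim a KV cache so that it covers only the first `end_pos` positions.

    Hypothetical helper; works by duck typing so it needs no imports.
    """
    if past_key_values is None:
        return None

    # Cache-object path: DynamicCache-style objects trim themselves in place.
    if hasattr(past_key_values, "crop"):
        past_key_values.crop(end_pos)
        return past_key_values

    # Legacy path (assumed layout): tuple of (key, value) per layer,
    # each tensor shaped (batch, heads, seq_len, head_dim).
    return tuple(
        (key[:, :, :end_pos, :], value[:, :, :end_pos, :])
        for key, value in past_key_values
    )
```

Checking for a `crop` method instead of concatenating with a tuple avoids the `TypeError` above, because the cache object is never combined with a plain tuple.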

I've added DynamicCache support so that this works with Llama 2 while keeping the BLOOM functionality intact. I've also added corrected timings to the main.py script, so that users can measure quantitatively how much speculative decoding reduces generation time.
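
As a rough sketch of how such a timing comparison can be done (not the code in main.py), the snippet below times a generation callable and computes a speedup ratio. The names `time_generation` and `speedup` are hypothetical, and `generate_fn` is assumed to be a zero-argument callable that returns the number of new tokens it produced. On a GPU you would also need to synchronize the device before reading the clock.

```python
import time


def time_generation(generate_fn, num_runs=3):
    """Return (mean seconds per run, tokens per second) for `generate_fn`.

    `generate_fn` is a hypothetical zero-argument callable that runs one
    generation and returns the number of newly generated tokens.
    """
    generate_fn()  # warm-up run, excluded from the measurement

    total_seconds = 0.0
    total_tokens = 0
    for _ in range(num_runs):
        start = time.perf_counter()
        total_tokens += generate_fn()
        # On GPU, synchronize here (e.g. torch.cuda.synchronize()) before
        # reading the clock; otherwise pending kernels are not counted.
        total_seconds += time.perf_counter() - start

    tokens_per_second = (
        total_tokens / total_seconds if total_seconds > 0 else float("inf")
    )
    return total_seconds / num_runs, tokens_per_second


def speedup(baseline_seconds, speculative_seconds):
    """Ratio above 1.0 means speculative decoding was faster."""
    return baseline_seconds / speculative_seconds
```

Running both the plain autoregressive path and the speculative path through `time_generation` with the same prompt and token budget, then passing the two mean times to `speedup`, gives a single number to report.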

Please let me know if you have any questions.
