@kaiyux kaiyux commented Apr 30, 2024

  • Model Support
    • [Experimental] Support RecurrentGemma
  • Features
    • Support paged KV cache for encoder-decoder (enc-dec) models; note that this support is limited to beam width 1
  • Bug fixes
  • Benchmark
    • [BREAKING CHANGE] Move the request rate generation arguments and logic from the prepare dataset script to gptManagerBenchmark
  • Performance
    • Improve the performance of pipeline parallelism when in-flight batching is enabled

@kaiyux kaiyux merged commit 06c0e9b into main Apr 30, 2024
@kaiyux kaiyux deleted the kaiyu/update branch April 30, 2024 09:19
wu1du2 pushed a commit to wu1du2/TensorRT-LLM that referenced this pull request May 11, 2025