Update TensorRT-LLM #846

kaiyux · 2024-01-09T12:09:37Z

Model Support
- Add example for multimodal models (BLIP with OPT or T5, LLaVA)
Features
- Smooth Quantization support for ChatGLM2-6B / ChatGLM3-6B / ChatGLM2-6B-32K
- Out-of-the-box support for the QWEN model
- Support for returning context and/or generation logits in the Triton backend
API
- Add a set of high-level APIs for end-to-end generation tasks, with the following features:
  - ModelConfig() as a clean configuration interface for LLM tasks
  - LLM() for LLM pipelines; it triggers any necessary engine building or model quantization silently in the background
  - generate() API for batched offline inference, with both single-GPU and multi-GPU runs supported
  - generate_async() API for asynchronous offline inference on a single GPU, with streaming mode supported
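
The call pattern described above can be sketched roughly as follows. This is not the TensorRT-LLM implementation: the classes below are pure-Python stand-ins that only borrow the names from the list (ModelConfig, LLM, generate, generate_async). The `model_dir` field, the method signatures, the `engine_ready` attribute, and the placeholder outputs are all assumptions made for illustration; see the high-level API example in the repository for the real interface.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Iterator, List


@dataclass
class ModelConfig:
    """Stand-in configuration object; `model_dir` is an assumed field."""
    model_dir: str


class LLM:
    """Stand-in pipeline that mimics the call pattern only."""

    def __init__(self, config: ModelConfig) -> None:
        self.config = config
        # The real pipeline would build or quantize an engine at this point.
        # This stand-in only records that the step happened.
        self.engine_ready = True

    def generate(self, prompts: List[str]) -> Iterator[str]:
        """Batched offline inference: one result per prompt, in input order."""
        for prompt in prompts:
            yield f"{prompt} [generated]"

    async def generate_async(self, prompt: str) -> AsyncIterator[str]:
        """Streaming inference: yield placeholder tokens one at a time."""
        for token in ("[tok0]", "[tok1]", "[tok2]"):
            await asyncio.sleep(0)  # hand control back to the event loop
            yield token


async def stream(llm: LLM, prompt: str) -> List[str]:
    """Collect a streamed response into a list."""
    return [token async for token in llm.generate_async(prompt)]


# Hypothetical model path; nothing is read from disk in this sketch.
llm = LLM(ModelConfig(model_dir="/path/to/model"))
batched = list(llm.generate(["Hello", "To tell a story"]))
streamed = asyncio.run(stream(llm, "Hello"))
```

The point of the sketch is the split between a blocking, batched `generate()` and an awaitable, token-by-token `generate_async()`, with engine preparation hidden inside the constructor.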
Bug fixes
- Add pickle support for InferenceRequest (issue: GptManager pybind 2/4TP run demo, #701)
- Fix Mixtral-8x7b build failure with custom_all_reduce (#825)
Performance
- Performance optimization of beam search kernel
- Increase default freeGpuMemoryFraction parameter from 0.85 to 0.9 for higher throughput
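
As a rough illustration of what the new default means, assuming freeGpuMemoryFraction is the share of GPU memory left free after the model is loaded that may be reserved for the KV cache, the arithmetic looks like the sketch below. The helper name and the 40 GiB figure are hypothetical and chosen only to make the numbers easy to follow.

```python
def kv_cache_budget_gib(free_gib: float, fraction: float) -> float:
    """Return the KV cache budget for a given amount of free GPU memory.

    Hypothetical helper: assumes the fraction applies to memory that is
    still free after the engine has been loaded.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return free_gib * fraction


free_gib = 40.0  # hypothetical free memory after loading the engine
old_budget = kv_cache_budget_gib(free_gib, 0.85)  # previous default
new_budget = kv_cache_budget_gib(free_gib, 0.90)  # new default
extra_gib = new_budget - old_budget

print(f"KV cache budget: {old_budget:.1f} GiB -> {new_budget:.1f} GiB")
```

A larger KV cache budget lets more sequences or tokens stay resident at once, which is where the throughput gain would come from; the cost is less headroom for other allocations on the same GPU.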
Documentation
- Add documentation for best practices for tuning the performance of TensorRT-LLM (See docs/source/perf_best_practices.md)
- Add documentation for Falcon AWQ support (See examples/falcon/README.md)

* Update TensorRT-LLM --------- Co-authored-by: Shixiaowei02 <[email protected]>

Update TensorRT-LLM

77d24b5

kaiyux marked this pull request as draft January 9, 2024 12:09

update

d043415

Shixiaowei02 marked this pull request as ready for review January 9, 2024 13:00

Shixiaowei02 approved these changes Jan 9, 2024

kaiyux merged commit d879430 into main Jan 9, 2024

kaiyux deleted the kaiyu/update branch January 9, 2024 13:03

xesdiny mentioned this pull request Jan 19, 2024

ModelRunnerCpp.generate throw tensorrt_llm::common::TllmException for the second time #912

Closed

4 tasks

wu1du2 pushed a commit to wu1du2/TensorRT-LLM that referenced this pull request May 11, 2025

Update TensorRT-LLM (NVIDIA#846)

86d22a1

* Update TensorRT-LLM --------- Co-authored-by: Shixiaowei02 <[email protected]>
