NVIDIA / TensorRT-Model-Optimizer Public

Pull requests: NVIDIA/TensorRT-Model-Optimizer

48 Open 248 Closed

Pull requests list

Fix hf_quant_config with kv cache type

#557 opened Nov 14, 2025 by jenchen13

Support Wan2.2 t2v diffusers quantization

#556 opened Nov 13, 2025 by shengliangxu

GPTQ Lite implementation

#555 opened Nov 13, 2025 by sugunav14 • Draft

[5455919] Fix Q/DQ/Cast placement in 'FP32 required' custom ops

#554 opened Nov 13, 2025 by gcunhase

Fix QLoRA example test

#553 opened Nov 13, 2025 by sugunav14

llama converter is self-contained now (no dependency on internal nvidia code)

#552 opened Nov 13, 2025 by danielkorzekwa

AutoQuantize minor improvement: limit grad enabled parameters, limit cpu-gpu sync during scoring

#551 opened Nov 13, 2025 by realAsma

Make all tensors on same device for svdquant with cpu-offloading

#550 opened Nov 13, 2025 by vishalpandya1990

Feat: Eagle3 HF Online - support nemotron models

#548 opened Nov 13, 2025 by h-guo18

[5336870] AutoCast: Unblock LSTM from conversion

#544 opened Nov 12, 2025 by galagam

[5643020] AutoCast use onnxscript for opset conversion

#542 opened Nov 12, 2025 by galagam

Specdec Bench: vLLM reqid, SGL path, conc > 1 metric fix

#541 opened Nov 12, 2025 by IzzyPutterman

[OMNIML-3015]Add per tensor/per channel MSE calibrator

#540 opened Nov 12, 2025 by Fridah-nv

2 tasks

[OMNIML-2850] [3/n] Adds sparse attention calibration

#538 opened Nov 11, 2025 by kaix-nv

Use ONNX DQ node instead of DQ custom-op for activation dequantization in nvfp4

#536 opened Nov 11, 2025 by vishalpandya1990

Optimize NVFP4 Triton kernel

#533 opened Nov 11, 2025 by mxinO • Draft

[OMNIML-2852] [2/n] Add Core Sparse Attention Infrastructure

#527 opened Nov 7, 2025 by kaix-nv

parallel eagle draft

#523 opened Nov 6, 2025 by yeyu-nvidia • Draft

[Bug #193] fix fp8 blockwise real quantization

#522 opened Nov 6, 2025 by meenchen

Support AWQ fake quant for vLLM MoE models

#521 opened Nov 6, 2025 by meenchen • Draft

Fix BMM style MoE export in fp8_pc_pt recipe

#515 opened Nov 5, 2025 by Edwardf0t1

[Draft] [5526696] Add kv cache quantization support for onnx quantization

#486 opened Oct 31, 2025 by zhanghaoc

Yeyu/set block

#480 opened Oct 28, 2025 by yeyu-nvidia • Draft

feat: add onnxslim support

#478 opened Oct 28, 2025 by inisis

Feat: Eagle3 HF Online - support nemotron models

#463 opened Oct 25, 2025 by h-guo18
