Model Offloading Support Pt 2 #34

Satrat · 2024-07-23T03:23:43Z

SUMMARY:
Adds support for partially offloading models to CPU. This also requires the companion compressed-tensors PR: neuralmagic/compressed-tensors#113. The main changes:

- As in the compressed-tensors PR, takes care to update the offloaded state dict whenever any parameter is changed.
- Adds helper functions for determining memory allocations, either taking quantization/GPTQ overhead into account or accepting custom requirements.
- Removes the requirement that calibration data be passed to oneshot. Weight scales/zero points are now set when calibration is initialized.
- Adds examples of oneshot runs with offloading.
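To illustrate why the offloaded state dict has to be updated explicitly (the first point above), here is a toy model of the pitfall. It assumes offloading behaves like accelerate's offload hooks, which reload parameters from a separate weights map before each forward pass, so an edit made only to the temporary on-device copy is silently discarded. `ToyOffloadedLayer` and `update_offload_parameter` are hypothetical names for this sketch, not the PR's actual code.

```python
from __future__ import annotations


class ToyOffloadedLayer:
    """Hypothetical stand-in for an offloaded layer.

    `offload_store` plays the role of the offloaded state dict (CPU/disk) and is
    the authoritative copy. `onloaded` is a temporary copy that exists only
    while the layer is executing on the accelerator.
    """

    def __init__(self, params: dict[str, list[float]]):
        self.offload_store = {name: list(value) for name, value in params.items()}
        self.onloaded: dict[str, list[float]] = {}

    def onload(self) -> None:
        # Copy every parameter out of the offload store before execution.
        self.onloaded = {name: list(v) for name, v in self.offload_store.items()}

    def offload(self) -> None:
        # Drop the temporary copy; nothing is written back to the store.
        self.onloaded = {}


def update_offload_parameter(layer: ToyOffloadedLayer, name: str, value: list[float]) -> None:
    """Hypothetical helper: write the new value to the offload store and, if the
    layer is currently onloaded, to the temporary copy as well."""
    layer.offload_store[name] = list(value)
    if name in layer.onloaded:
        layer.onloaded[name] = list(value)


# Pitfall: editing only the onloaded copy is lost on the next offload/onload.
layer = ToyOffloadedLayer({"weight_scale": [1.0]})
layer.onload()
layer.onloaded["weight_scale"] = [0.5]
layer.offload()
layer.onload()
print(layer.onloaded["weight_scale"])  # [1.0]: the update was discarded

# Writing through the helper keeps both copies in sync.
update_offload_parameter(layer, "weight_scale", [0.5])
layer.offload()
layer.onload()
print(layer.onloaded["weight_scale"])  # [0.5]
```

The design choice is to treat the offloaded copy as the source of truth and mirror writes into the onloaded copy only when it exists, rather than trying to write back on every offload.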
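The memory-allocation helpers mentioned above have to decide how much of a model fits on the GPUs and how much spills to the CPU, while leaving headroom for GPTQ. A minimal sketch under stated assumptions: weights are loaded at 2 bytes/param (bf16/fp16), GPTQ holds one float32 Hessian of shape (in_features x in_features) per linear layer being quantized, and GPUs are filled greedily in index order. All function names are hypothetical; per-GPU budgets like these are the kind of numbers one could pass to accelerate's `infer_auto_device_map` via its `max_memory` argument, but this is not the PR's implementation.

```python
from __future__ import annotations

GIB = 1024**3


def model_bytes(num_params: int, bytes_per_param: int = 2) -> int:
    """Memory for the weights alone, e.g. 2 bytes/param for bf16/fp16."""
    return num_params * bytes_per_param


def gptq_hessian_bytes(in_features: int, num_layers: int = 1) -> int:
    """GPTQ accumulates an (in_features x in_features) float32 Hessian per
    linear layer; `num_layers` is how many are held in memory at once."""
    return in_features * in_features * 4 * num_layers


def plan_placement(
    total_bytes: int, gpu_capacities: list[int], reserve_per_gpu: int = 0
) -> dict[int | str, int]:
    """Greedily fill GPUs in index order, keeping `reserve_per_gpu` bytes free
    on each, and assign whatever is left over to the CPU (offloaded)."""
    placement: dict[int | str, int] = {}
    remaining = total_bytes
    for index, capacity in enumerate(gpu_capacities):
        usable = max(capacity - reserve_per_gpu, 0)
        placed = min(usable, remaining)
        placement[index] = placed
        remaining -= placed
    placement["cpu"] = remaining
    return placement


# Example: a 70B-parameter bf16 model on two 40 GiB GPUs, reserving room
# for the Hessian of a linear layer with 28672 input features (3.0625 GiB).
reserve = gptq_hessian_bytes(in_features=28672)
plan = plan_placement(model_bytes(70_000_000_000), [40 * GIB, 40 * GIB], reserve)
print({device: round(n / GIB, 2) for device, n in plan.items()})
```

In this example both GPUs are filled up to their capacity minus the reserve, and the remainder of the weights is assigned to the CPU.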
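Dropping the calibration-data requirement works because, for weights, a min/max observer needs nothing but the weights themselves. The following is a minimal pure-Python sketch of per-channel int8 scale/zero-point computation from weight rows; `minmax_qparams` and `fake_quantize` are hypothetical names, and compressed-tensors' own observers operate on torch tensors and may differ in details such as rounding and the symmetric denominator.

```python
from __future__ import annotations


def minmax_qparams(
    row: list[float], num_bits: int = 8, symmetric: bool = True
) -> tuple[float, int]:
    """Scale and zero point for one output channel, computed from its weights alone."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    # Widen the range to include 0.0 so that zero is exactly representable.
    lo, hi = min(min(row), 0.0), max(max(row), 0.0)
    if symmetric:
        scale = max(-lo, hi) / ((qmax - qmin) / 2)
    else:
        scale = (hi - lo) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # all-zero channel: any positive scale works
    zero_point = 0 if symmetric else int(qmin - round(lo / scale))
    return scale, zero_point


def fake_quantize(row: list[float], scale: float, zero_point: int, num_bits: int = 8) -> list[float]:
    """Quantize to integers, clamp to the int range, then dequantize back to floats."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    return [
        (min(max(round(x / scale) + zero_point, qmin), qmax) - zero_point) * scale
        for x in row
    ]


# Each row of a weight matrix is one output channel; no calibration data is needed.
weight = [[-1.0, 0.5, 2.0], [0.25, -0.75, 0.1]]
qparams = [minmax_qparams(row, symmetric=False) for row in weight]
for row, (scale, zero_point) in zip(weight, qparams):
    print(fake_quantize(row, scale, zero_point))
```

Activations are different: their ranges depend on the inputs, which is why calibration data is still needed when activation quantization is enabled.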

TEST PLAN:
Only manual testing so far; unit and integration tests still need to be added after this lands.

Sara Adkins added 17 commits July 14, 2024 18:38

- fp8 working (23fbab0)
- fix weight refs, gptq support (bc05a9d)
- sparsity percent fix (56ddc68)
- bug fixes (f5cacf8)
- fix gptq weight application (ff58d39)
- fix tests (526ee2b)
- don't place device (932ca94)
- cleanup (74a5de0)
- helper fns (8a209c5)
- fix ratio (927a730)
- accept no calibration data (7c58f9d)
- clean up helpers and cache (dbaf304)
- Merge branch 'sa/no_calib_data_ux' into sa/big_model_support (0de76dd)
- adding examples (355c6d5)
- update example (03cc287)
- minor fixes (17f826a)
- docstrings, update example (f7509bd)

Satrat mentioned this pull request Jul 23, 2024

- Model Offloading Support neuralmagic/compressed-tensors#113 (Merged)

Satrat requested review from mgoin, bfineran and robertgshaw2-redhat July 23, 2024 03:29

Sara Adkins added 2 commits July 23, 2024 14:25

- less logging (9059545)
- memory requirements (2c0714d)

Satrat changed the title from "[Draft] Model Offloading Support" to "Model Offloading Support Pt 2" on Jul 23, 2024

Satrat requested review from dsikka, rahul-tuli and horheynm July 23, 2024 16:03

This was referenced Jul 24, 2024

- [UX] Allow quantization of weights without calibration data #28 (Closed)
- Mixtral 8*22B Quantization Failed with 2 issues #35 (Closed)

Merge branch 'main' into sa/big_model_support (8a5008b)

Sara Adkins added 2 commits July 25, 2024 16:51

- stype (ed4ad0f)
- check calibration (5eddaa0)

bfineran approved these changes Jul 30, 2024


Satrat merged commit 431b652 into main Jul 30, 2024
8 of 12 checks passed

Satrat mentioned this pull request Aug 6, 2024

- [ UX ] Skip Running First Cycle Through Dataset for Weight-Only Quantization #29 (Closed)
