Merged
Changes from all commits (44 commits)
cd4e5e0
checkpoint compression init
wtmlon Sep 23, 2024
7684576
add ckpt quant argument
wtmlon Sep 24, 2024
afcecad
add ckpt quant ci
wtmlon Oct 11, 2024
d8f3351
fix ci
wtmlon Oct 11, 2024
434bd4c
fix lint
wtmlon Oct 11, 2024
a98fb8b
remove stage O2, change O3 --> O2
wtmlon Oct 11, 2024
2e5c73b
support async save
wtmlon Oct 11, 2024
6b1f3bf
file adjustment
wtmlon Oct 14, 2024
c4a80e7
magic string remove
wtmlon Oct 14, 2024
ae305a9
ci fix
wtmlon Oct 14, 2024
fd6ad57
ci fix, code refinement
wtmlon Oct 14, 2024
f766d15
function extraction
wtmlon Oct 15, 2024
e74b68b
fix ci
wtmlon Oct 15, 2024
a7b053d
code refinement
wtmlon Oct 15, 2024
10b1064
fix ci
wtmlon Oct 15, 2024
ad1dc75
fix ci
wtmlon Oct 15, 2024
fb2c2e9
Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP i…
wtmlon Oct 16, 2024
a1c35af
support non merge tp ckpt quantization
wtmlon Oct 18, 2024
f8530c0
fix ci
wtmlon Oct 18, 2024
4e21fb9
update
wtmlon Oct 18, 2024
a602fe5
fix bug
wtmlon Oct 21, 2024
55b8639
code refactor
wtmlon Oct 25, 2024
3a87734
Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP i…
wtmlon Oct 25, 2024
a3073aa
fix lint
wtmlon Oct 25, 2024
8a8aca7
fix ci
wtmlon Oct 25, 2024
bab5235
Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP i…
wtmlon Oct 28, 2024
c3c500d
del old uc.py
wtmlon Oct 28, 2024
a45c7f6
fix lint
wtmlon Oct 28, 2024
a4a3e23
add mgpu ci
wtmlon Oct 28, 2024
2330839
fix ci
wtmlon Oct 28, 2024
3fcd471
multi thread loading
wtmlon Oct 28, 2024
f57aab5
fix lint
wtmlon Oct 28, 2024
50ee148
Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP i…
wtmlon Oct 29, 2024
75a1011
fix bug
wtmlon Nov 5, 2024
ffd0823
Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP i…
wtmlon Nov 5, 2024
4947a8c
refactor code
wtmlon Nov 7, 2024
3eaebbb
Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP i…
wtmlon Nov 19, 2024
a6b2236
add comment
wtmlon Nov 19, 2024
a5d0afa
fix lint
wtmlon Nov 19, 2024
fdd92a8
add comment
wtmlon Nov 19, 2024
b2b20be
add comment
wtmlon Nov 19, 2024
432e97c
fix bug
wtmlon Nov 20, 2024
5eb201c
fix bugs when ckpt no quant and no master weight
wtmlon Nov 21, 2024
b2bcf16
remove uni-test
wtmlon Nov 22, 2024
4 changes: 3 additions & 1 deletion paddlenlp/peft/lora/lora_model.py
@@ -262,7 +262,9 @@
pre_tensor_parallel_split = True
tp_actions = lora_model._get_tensor_parallel_convert_actions(loaded_keys, is_split=True)
state_dict = load_state_dict(
shard_file, tp_actions if pre_tensor_parallel_split else None, expected_keys
shard_file,
tp_actions if pre_tensor_parallel_split else None,
expected_keys,

)
error_msgs += _load_state_dict_into_model(lora_model.model, state_dict, "")
del state_dict
4 changes: 3 additions & 1 deletion paddlenlp/peft/prefix/prefix_model.py
@@ -333,7 +333,9 @@ def from_pretrained(
pre_tensor_parallel_split = True
tp_actions = prefix_model._get_tensor_parallel_convert_actions(is_split=True)
state_dict = load_state_dict(
shard_file, tp_actions if pre_tensor_parallel_split else None, expected_keys
shard_file,
tp_actions if pre_tensor_parallel_split else None,
expected_keys,
)
error_msgs += _load_state_dict_into_model(prefix_model.prefix_encoder, state_dict, "")
del state_dict
364 changes: 364 additions & 0 deletions paddlenlp/quantization/checkpoint_quantization_utils.py
@@ -0,0 +1,364 @@
# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


import numpy as np
import paddle


Contributor:
Every important function needs a docstring, and the args need to be documented as well. For the quantization algorithms referenced, add the arXiv links.

Collaborator (Author):
done

def cal_ratio(m, v, eps=1e-8):
"""
Calculate part of the Adam update ratio.
Args:
m (`paddle.Tensor`):
Momentum (first moment) in the Adam optimizer.
v (`paddle.Tensor`):
Variance (second moment) in the Adam optimizer.
eps (`float`):
Epsilon in the Adam optimizer.
"""
return 1 / (np.sqrt(v) + eps)
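As a minimal sketch (illustrative, not part of this diff) of what cal_ratio produces for numpy inputs; note that m is accepted but unused by the current expression:

import numpy as np
v = np.array([0.0, 0.25, 1.0], dtype="float32")
ratio = cal_ratio(m=None, v=v)  # returns 1 / (sqrt(v) + eps)
# ratio ≈ [1e8, 2.0, 1.0]; eps keeps the zero-variance entry finite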



def group_wise_quant_dequant(
inputs,
mins=None,
maxs=None,
quant_bits=4,
group_size=32,
quant=True,
tp_rank=-1,
tp_degree=1,
use_pd=False,
symmetry=False,
):
"""
group-wise quantization (support symmetry, asymmetry).
Args:
inputs (`paddle.Tensor`):
The tensor to quantize.
mins (`paddle.Tensor`):
Min scales tensor in asymmetry quantization.
maxs (`paddle.Tensor`):
Max scales tensor in asymmetry quantization, or Abs max tensor in symmetry quantization.
quant_bits (`int`):
Quantization bits.
group_size (`int`):
Group size of group-wise quantization.
quant (`bool`):
True when quantization, False in dequantization.
tp_rank (`int`):
Tensor parallel rank.
tp_degree (`int`):
Tensor parallel world size.
use_pd (`bool`):
Whether to use paddle caculation. If False will use numpy.
symmetry (`bool`):
Whether to use symmetry quantization.
"""

qmax = (1 << (quant_bits)) - 1
qmin = 0
shape = inputs.shape


if quant:
inputs_processed = inputs.reshape([shape[0] // group_size, group_size, shape[1]])
if symmetry:
bnt = (1 << (quant_bits - 1)) - 1
scales = np.max(np.abs(inputs_processed), axis=1)
new_scales = np.repeat(scales, repeats=group_size, axis=0)
quant_tensor = np.clip(np.round(inputs / new_scales * bnt), -bnt - 1, bnt)
return quant_tensor.astype("int8"), scales


# scales: [shape[0] // group_size, shape[1]]
maxs = np.max(inputs_processed, axis=1)
mins = np.min(inputs_processed, axis=1)
scales = maxs - mins

# new_scales: [shape[0], shape[1]]
new_scales = np.repeat(scales, repeats=group_size, axis=0)
new_mins = np.repeat(mins, repeats=group_size, axis=0)

# a group whose max equals its min yields a zero scale; np.nan_to_num below cleans up the resulting divide-by-zero
quant_tensor = np.clip(np.round((inputs - new_mins) / (new_scales) * qmax), qmin, qmax)
quant_tensor = np.nan_to_num(quant_tensor)
return quant_tensor.astype("uint8"), mins, maxs

else:
if symmetry:
scales = mins
bnt = (1 << (quant_bits - 1)) - 1
if use_pd:
new_scales = paddle.repeat_interleave(scales, group_size, 0)

else:
new_scales = np.repeat(scales, repeats=group_size, axis=0)


if tp_rank == -1:
dequant_tensor = inputs.astype("float32") * new_scales / bnt
elif len(new_scales.shape) == 0 or inputs.shape[-1] == new_scales.shape[-1]:

# input tensor was row parallel in tp.
dequant_tensor = (

inputs.astype("float32")
* new_scales[
tp_rank * new_scales.shape[0] // tp_degree : (tp_rank + 1) * new_scales.shape[0] // tp_degree
]
/ bnt
)
else:
# input tensor was column parallel in tp.
dequant_tensor = (

inputs.astype("float32")
* new_scales[
:,
tp_rank
* new_scales.shape[-1]
// tp_degree : (tp_rank + 1)
* new_scales.shape[-1]
// tp_degree,
]
/ bnt
)
return dequant_tensor


scales = maxs - mins
if use_pd:
new_scales = paddle.repeat_interleave(scales, group_size, 0)
new_mins = paddle.repeat_interleave(mins, group_size, 0)

else:
new_scales = np.repeat(scales, repeats=group_size, axis=0)
new_mins = np.repeat(mins, repeats=group_size, axis=0)


if tp_rank == -1:
dequant_tensor = (inputs.astype("float32") / qmax * new_scales) + new_mins
elif len(new_scales.shape) == 0 or inputs.shape[-1] == new_scales.shape[-1]:

# input tensor was row parallel in tp.
dequant_tensor = (

inputs.astype("float32")
/ qmax
* new_scales[
tp_rank * new_scales.shape[0] // tp_degree : (tp_rank + 1) * new_scales.shape[0] // tp_degree
]
) + new_mins[tp_rank * new_mins.shape[0] // tp_degree : (tp_rank + 1) * new_mins.shape[0] // tp_degree]
else:
# input tensor was column parallel in tp.
dequant_tensor = (

inputs.astype("float32")
/ qmax
* new_scales[
:, tp_rank * new_scales.shape[-1] // tp_degree : (tp_rank + 1) * new_scales.shape[-1] // tp_degree
]
) + new_mins[
:, tp_rank * new_mins.shape[-1] // tp_degree : (tp_rank + 1) * new_mins.shape[-1] // tp_degree
]
return dequant_tensor
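A minimal symmetric round trip with numpy inputs (an illustrative sketch, not part of this diff); note that in the symmetric dequantization path the abs-max scales are passed back through the mins argument:

import numpy as np
w = np.random.randn(64, 8).astype("float32")
q, scales = group_wise_quant_dequant(w, quant_bits=8, group_size=32, quant=True, symmetry=True)
w_rec = group_wise_quant_dequant(q, mins=scales, quant_bits=8, group_size=32, quant=False, symmetry=True)
# w_rec approximately recovers w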



def merge_int4(x, y):
"""
Merge two signed int4 values into one int8 (x in the high nibble, y in the low nibble).
Args:
x (`numpy.array`):
4-bit signed integers for the high nibble.
y (`numpy.array`):
4-bit signed integers for the low nibble.
"""
int4_high = x << 4
int4_low = y & 0x0F
final = int4_high | int4_low
return final.astype("int8")



def split_int8(final):
"""
Split one int8 into two int4 values.
Args:
final (`numpy.array`):
8-bit signed integers to split.
"""
int4_high = final >> 4
int4_low = final & 0x0F


int4_high = np.where(int4_high > 8, int4_high - 16, int4_high)


high_tensor = paddle.Tensor(int4_high)
low_tensor = paddle.Tensor(int4_low)


return high_tensor, low_tensor
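A minimal round-trip sketch (illustrative, not part of this diff); split_int8 requires a Paddle runtime since it wraps its outputs in paddle.Tensor:

import numpy as np
x = np.array([-3, 7], dtype="int8")  # high nibble, signed int4 range [-8, 7]
y = np.array([5, 2], dtype="int8")  # low nibble; kept non-negative here because split_int8 does not sign-extend the low nibble
packed = merge_int4(x, y)  # one int8 per (x, y) pair
high, low = split_int8(packed)  # paddle Tensors recovering x and y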



def cal_abs_min_max_channel(inputs, quant_axis=1):
"""
Channel-wise min and max scales calculation.
Args:
inputs (`numpy.array`):
Input tensor to quantize.
quant_axis (`int`):
Axis along which the per-channel min and max scales are kept; all other axes are reduced.
"""
eps = 1e-8
reduce_axis = tuple([i for i in range(len(inputs.shape)) if i != quant_axis])
abs_max_values = np.max(inputs, axis=reduce_axis)
abs_min_values = np.min(inputs, axis=reduce_axis)
abs_max_values = np.where(

abs_max_values == np.array(0, dtype=inputs.dtype), np.array(eps, dtype=inputs.dtype), abs_max_values
)
abs_min_values = np.where(

abs_min_values == np.array(0, dtype=inputs.dtype), np.array(eps, dtype=inputs.dtype), abs_min_values
)
return abs_max_values, abs_min_values
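A small numpy sketch (illustrative only) of the per-channel reduction and the epsilon guard:

import numpy as np
x = np.array([[1.0, -2.0], [3.0, 0.0]], dtype="float32")
maxs, mins = cal_abs_min_max_channel(x)  # quant_axis=1: reduce over axis 0
# maxs == [3.0, 1e-8] (the zero max is replaced by eps), mins == [1.0, -2.0]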



def asymmetry_qdq_weight(
x, quant_bit=8, quant_axis=-1, mins=None, maxs=None, dequant=False, tp_rank=-1, tp_degree=1, use_pd=False
):
"""
channel-wise asymmetry quantization
Args:
x (`paddle.Tensor`):
The tensor to quantize.
quant_bits (`int`):
Quantization bits.
quant_axis (`int`):
Scales caculation axis.
mins (`paddle.Tensor`):
Min scales tensor in asymmetry quantization.
maxs (`paddle.Tensor`):
Max scales tensor in asymmetry quantization.
dequant (`bool`):
True when dequantization, False in quantization.
tp_rank (`int`):
Model parallel rank.
tp_degree (`int`):
Model parallel world size.
use_pd (`bool`):
Whether to use paddle caculation. If False will use numpy.
"""

if mins is None:
maxs, mins = cal_abs_min_max_channel(x)
bnt = (1 << (quant_bit)) - 1
scales = maxs - mins
if not dequant:

# quant
quant_x = np.clip(np.round((x - mins) / scales * bnt), 0, bnt)
return quant_x.astype(np.uint8), mins, maxs

else:
quant_x = x

# dequant
if not use_pd:
if len(scales.shape) == 0 or quant_x.shape[-1] == scales.shape[-1]:

# input tensor was row parallel in tp.
qdq_x = (quant_x / bnt * scales) + mins

else:
# input tensor was column parallel in tp.
qdq_x = (

Contributor:
Same issues here as in qdq_weight.

quant_x
/ bnt
* scales[tp_rank * scales.shape[0] // tp_degree : (tp_rank + 1) * scales.shape[0] // tp_degree]
) + mins[tp_rank * mins.shape[0] // tp_degree : (tp_rank + 1) * mins.shape[0] // tp_degree]
return qdq_x.astype(np.float32), scales

else:
if len(scales.shape) == 0 or quant_x.shape[-1] == scales.shape[-1]:

# input tensor was row parallel in tp.
qdq_x = (quant_x / bnt * scales.unsqueeze(0).expand(quant_x.shape)) + mins

else:
# input tensor was column parallel in tp.
qdq_x = (

quant_x
/ bnt
* scales[tp_rank * scales.shape[0] // tp_degree : (tp_rank + 1) * scales.shape[0] // tp_degree]
.unsqueeze(0)
.expand(quant_x.shape)
) + mins[tp_rank * mins.shape[0] // tp_degree : (tp_rank + 1) * mins.shape[0] // tp_degree]
return qdq_x.astype(paddle.float32), scales
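A minimal asymmetric round trip with numpy inputs (an illustrative sketch, not part of this diff):

import numpy as np
w = np.random.randn(16, 4).astype("float32")
qw, mins, maxs = asymmetry_qdq_weight(w, quant_bit=8)  # per-channel uint8 plus min/max scales
w_rec, _ = asymmetry_qdq_weight(qw, quant_bit=8, mins=mins, maxs=maxs, dequant=True)
# w_rec approximately recovers w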



def cal_abs_max_channel(inputs, quant_axis=1):
Contributor:
Why does quant_axis default to 1 here?

Contributor:
Add a comment explaining the magic number.

"""
channel-wise abs max calculation
Args:
inputs (`numpy.array`):
input tensor for quantization.
quant_axis (`int`):
dimension where calulating inputs' abs max scales on.
"""
epsilon = 1e-8
reduce_axis = tuple([i for i in range(len(inputs.shape)) if i != quant_axis])
abs_max_values = np.max(np.abs(inputs), axis=reduce_axis)

# all elements along a channel may be zero, which would make its scale 0;
# replace zero scales with a small epsilon to avoid dividing by zero.
abs_max_values = np.where(

abs_max_values == np.array(0, dtype=inputs.dtype), np.array(epsilon, dtype=inputs.dtype), abs_max_values
)
return abs_max_values

Contributor:
Does hardcoding 1e-8 here ignore the training dtype? bf16, float16, and float32 have rather different representable ranges.

Collaborator (Author):
In group-wise quantization a group can be all zeros, which would cause a divide-by-zero during quantization; the 1e-8 is just a small bias guarding against that.



def qdq_weight(x, quant_bit=8, quant_axis=-1, scales=None, dequant=False, tp_rank=-1, tp_degree=1, use_pd=False):
"""
channel-wise symmetry quantization
Args:
x (`paddle.Tensor`):
The tensor to quantize.
quant_bits (`int`):
Quantization bits.
quant_axis (`int`):
Scales caculation axis.
scales (`paddle.Tensor`):
Abs max scales tensor in symmetry quantization.
dequant (`bool`):
True when dequantization, False in quantization.
tp_rank (`int`):
Model parallel rank.
tp_degree (`int`):
Model parallel world size.
use_pd (`bool`):
Whether to use paddle caculation. If False will use numpy.
"""

if scales is None:
scales = cal_abs_max_channel(x)
bnt = (1 << (quant_bit - 1)) - 1
if not dequant:

# quant
quant_x = np.clip(np.round(x / scales * bnt), -bnt - 1, bnt)
return quant_x.astype(np.int8), scales

else:
quant_x = x

# dequant
if not use_pd:
if len(scales.shape) == 0 or quant_x.shape[-1] == scales.shape[-1]:

# input tensor was row parallel in tp.
qdq_x = quant_x / bnt * scales

else:
# input tensor was column parallel in tp.
qdq_x = (

quant_x
/ bnt
* scales[tp_rank * scales.shape[0] // tp_degree : (tp_rank + 1) * scales.shape[0] // tp_degree]
)
# quant_x (int8) / bnt (int) * scales promotes to fp32 or fp64; cast back to fp32
return qdq_x.astype(np.float32), scales

else:
if len(scales.shape) == 0 or quant_x.shape[-1] == scales.shape[-1]:

# input tensor was row parallel in tp.
qdq_x = quant_x / bnt * scales.unsqueeze(0).expand(quant_x.shape)

else:
# input tensor was column parallel in tp.
qdq_x = (

quant_x
/ bnt
* scales[tp_rank * scales.shape[0] // tp_degree : (tp_rank + 1) * scales.shape[0] // tp_degree]
.unsqueeze(0)
.expand(quant_x.shape)
)
# quant_x (int8) / bnt (int) * scales promotes to fp32 or fp64; cast back to fp32
return qdq_x.astype(paddle.float32), scales
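And the symmetric counterpart, again as a minimal numpy sketch (illustrative, not part of this diff):

import numpy as np
w = np.random.randn(16, 4).astype("float32")
qw, scales = qdq_weight(w, quant_bit=8)  # per-channel int8 plus abs-max scales
w_rec, _ = qdq_weight(qw, quant_bit=8, scales=scales, dequant=True)
# w_rec approximately recovers w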
