@ymcui ymcui commented Aug 23, 2023

Description

This PR updates PPL results for the GGUF k-quant models.

Details

llama.cpp has introduced the new GGUF format, and a recent upstream PR brings improvements to the k-quant model series. This PR mainly refreshes the k-quant models' stats to check whether our models also benefit from these improvements.
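For reference, the stats below come from llama.cpp's standard convert → quantize → perplexity pipeline. A minimal sketch, assuming a llama.cpp checkout from around the time of this PR; model paths and the evaluation text file are placeholders, and exact flags may differ between revisions:

```bash
# Convert the original HF checkpoint to an F16 GGUF file (placeholder paths)
python convert.py /path/to/chinese-llama-2-7b --outtype f16 \
    --outfile chinese-llama-2-7b-f16.gguf

# Requantize to a k-quant type, e.g. Q4_K
./quantize chinese-llama-2-7b-f16.gguf chinese-llama-2-7b-q4_k.gguf Q4_K

# Measure perplexity against an evaluation text file (placeholder)
./perplexity -m chinese-llama-2-7b-q4_k.gguf -f eval.txt
```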

Chinese-LLaMA-2-7B

old:

|  | F16 | Q2_K | Q3_K | Q4_0 | Q4_1 | Q4_K | Q5_0 | Q5_1 | Q5_K | Q6_K | Q8_0 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PPL | 9.128 | 13.640 | 9.910 | 9.476 | 9.576 | 9.257 | 9.156 | 9.213 | 9.141 | 9.143 | 9.129 |
| Size | 12.91G | 2.77G | 3.17G | 3.69G | 4.08G | 3.92G | 4.47G | 4.86G | 4.59G | 5.30G | 6.81G |
| CPU Speed | 117 | 42 | 51 | 39 | 44 | 43 | 48 | 51 | 50 | 54 | 65 |
| GPU Speed | 53 | 19 | 21 | 17 | 18 | 20 | x | x | 25 | 26 | x |

new (changed results are in boldface):

|  | F16 | 🆕Q2_K | 🆕Q3_K | Q4_0 | Q4_1 | 🆕Q4_K | Q5_0 | Q5_1 | 🆕Q5_K | 🆕Q6_K | Q8_0 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PPL | 9.128 | **11.1073** | **9.5760** | 9.476 | 9.576 | **9.2397** | 9.156 | 9.213 | **9.1676** | **9.1329** | 9.129 |
| Size | 12.91G | **2.41G** | **3.18G** | 3.69G | 4.08G | 3.92G | 4.47G | 4.86G | 4.59G | 5.30G | 6.81G |
| CPU Speed | 117 | 42 | 51 | 39 | 44 | 43 | 48 | 51 | 50 | 54 | 65 |
| GPU Speed | 53 | 19 | 21 | 17 | 18 | 20 | x | x | 25 | 26 | x |

Chinese-LLaMA-2-13B

old:

|  | F16 | Q2_K | Q3_K | Q4_0 | Q4_1 | Q4_K | Q5_0 | Q5_1 | Q5_K | Q6_K | Q8_0 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PPL | 8.810 | 14.84 | 9.834 | 9.371 | 9.549 | 8.958 | 8.988 | 8.924 | 8.850 | 8.817 | 8.811 |
| Size | 24.69G | 5.26G | 6.02G | 7.01G | 7.77G | 7.48G | 8.52G | 9.28G | 8.76G | 10.13G | 13.05G |
| CPU Speed | - | 75 | 90 | 76 | 80 | 80 | 91 | 99 | 92 | 104 | 125 |
| GPU Speed | - | 31 | 37 | 30 | 32 | 36 | x | x | 47 | 51 | x |

new (changed results are in boldface):

|  | F16 | 🆕Q2_K | 🆕Q3_K | Q4_0 | Q4_1 | 🆕Q4_K | Q5_0 | Q5_1 | 🆕Q5_K | 🆕Q6_K | Q8_0 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PPL | 8.810 | **12.8040** | **9.7383** | 9.371 | 9.549 | **8.9522** | 8.988 | 8.924 | **8.8581** | **8.820** | 8.811 |
| Size | 24.69G | **5.18G** | **6.04G** | 7.01G | 7.77G | 7.48G | 8.52G | 9.28G | 8.76G | 10.13G | 13.05G |
| CPU Speed | - | 75 | 90 | 76 | 80 | 80 | 91 | 99 | 92 | 104 | 125 |
| GPU Speed | - | 31 | 37 | 30 | 32 | 36 | x | x | 47 | 51 | x |

Observations:

  1. All k-quant models' PPLs have improved (decreased), except for Q5_K (and, marginally, the 13B Q6_K). The improvement is most significant for Q2_K and Q3_K.
  2. The Q2_K models are also more compact than before; for example, the 7B Q2_K model shrank from 2.77G to 2.41G.

⚠️ WARNING: You should re-convert these models to avoid unexpected results and behavior.
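If you already have an F16 GGUF file, requantizing is a single step per type. A hypothetical example (placeholder file names); always quantize from the F16 file rather than from an older quantized model:

```bash
# Rebuild the Q2_K model from the F16 GGUF (placeholder file names)
./quantize chinese-llama-2-7b-f16.gguf chinese-llama-2-7b-q2_k.gguf Q2_K
```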

Related Issue

None.

Explanation of Changes


- mainly updates the k-quant series stats
@ymcui ymcui merged commit 8f682ad into main Aug 23, 2023
@ymcui ymcui deleted the llama-cpp-gguf branch August 23, 2023 02:57
@ymcui ymcui restored the llama-cpp-gguf branch August 23, 2023 02:57
@ymcui ymcui deleted the llama-cpp-gguf branch August 30, 2023 09:06