[LLM] Add DeepseekV3 #9738

DrownFish19 · 2025-01-03T06:37:33Z

PR types

New features

PR changes

Models

Description

Add DeepseekV3.

Add the DeepseekV3 modeling.
Update the order of the auto tokenizer and update the related tokenizers.

codecov · 2025-01-03T07:10:21Z

Codecov Report

Attention: Patch coverage is 74.75410% with 77 lines in your changes missing coverage. Please review.

Project coverage is 52.38%. Comparing base (1d74d62) to head (4d61571).
Report is 262 commits behind head on develop.

Files with missing lines	Patch %	Lines
paddlenlp/transformers/deepseek_v2/modeling.py	7.54%	49 Missing ⚠️
paddlenlp/transformers/deepseek_v3/modeling.py	50.94%	26 Missing ⚠️
...addlenlp/transformers/deepseek_v2/configuration.py	0.00%	1 Missing ⚠️
...addlenlp/transformers/deepseek_v3/configuration.py	85.71%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #9738      +/-   ##
===========================================
+ Coverage    52.35%   52.38%   +0.02%     
===========================================
  Files          729      730       +1     
  Lines       117835   115230    -2605     
===========================================
- Hits         61694    60359    -1335     
+ Misses       56141    54871    -1270

DrownFish19 · 2025-01-07T03:34:50Z

paddlenlp/transformers/__init__.py

-from .bit.modeling import *
-from .bit.configuration import *
-from .bit.image_processing import *
+from .artist.configuration import *


Re-sorted the imports by name and added the deepseekv2/v3 imports.

DrownFish19 · 2025-01-08T11:50:14Z

paddlenlp/transformers/conversion_utils.py

                if x.endswith(key):
                    state_keys_map[key] = x
-                    break
+                    # break # remove break for math A.key B.key ...


This avoids the case where model parameters share the same suffix and the TPAction cannot be obtained.
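
To make the effect of dropping the `break` concrete, here is a minimal sketch of suffix matching. The function name `resolve_suffix_keys`, the outer loop, the `stop_at_first_match` flag, and the parameter names are all assumptions for illustration; only the inner `if x.endswith(key)` block comes from the diff above, and this is not the actual PaddleNLP implementation.

```python
def resolve_suffix_keys(base_keys, real_keys, stop_at_first_match):
    """Map each base key to a real parameter name that ends with it.

    Hypothetical stand-in for the loop around the diff; only the inner
    `if x.endswith(key)` block is taken from the PR.
    """
    state_keys_map = {}
    for key in base_keys:
        for x in real_keys:
            if x.endswith(key):
                state_keys_map[key] = x
                if stop_at_first_match:
                    break  # old behavior: stop scanning at the first hit
    return state_keys_map


# Two made-up parameter names that share the suffix "mlp.weight".
real = ["A.mlp.weight", "B.mlp.weight"]

with_break = resolve_suffix_keys(["mlp.weight"], real, stop_at_first_match=True)
without_break = resolve_suffix_keys(["mlp.weight"], real, stop_at_first_match=False)
```

In this sketch the version with `break` records the first matching name and never looks at the rest, while the version without it keeps scanning, so the last matching name is the one kept. Lists are used here so the order is deterministic.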

ZHUI

LGTM

ZHUI · 2025-01-08T11:56:20Z

paddlenlp/transformers/deepseek_v3/modeling.py

+
+class DeepseekV3PretrainedModel(DeepseekV2PretrainedModel):
+    config_class = DeepseekV2Config
+    base_model_prefix = "deepseek_v3"


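Since we are inheriting anyway, how about changing base_model_prefix to match HF? If the parameters are hard to handle, never mind.

The parameters are fairly easy to handle; it just needs a rewrite.

base_model_prefix = "model" saves a lot of code: later models can inherit from CausalLM directly instead of starting their changes from DeepseekV3PretrainedModel.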
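
As a rough illustration of the parameter handling that switching prefixes would involve, the sketch below renames state-dict keys from one base prefix to another. The helper `rename_base_prefix` and the parameter names are hypothetical and not part of PaddleNLP; a plain dict stands in for a real state dict.

```python
def rename_base_prefix(state_dict, old_prefix, new_prefix):
    """Return a copy of state_dict with a leading `old_prefix.` replaced
    by `new_prefix.`; keys without that prefix are left unchanged."""
    renamed = {}
    old_head = old_prefix + "."
    for name, value in state_dict.items():
        if name.startswith(old_head):
            name = new_prefix + "." + name[len(old_head):]
        renamed[name] = value
    return renamed


# Made-up parameter names, with integers standing in for tensors.
weights = {
    "deepseek_v3.embed_tokens.weight": 1,
    "deepseek_v3.layers.0.mlp.weight": 2,
    "lm_head.weight": 3,
}
converted = rename_base_prefix(weights, "deepseek_v3", "model")
```

Only keys that start with the old prefix followed by a dot are rewritten, so heads that live outside the base model (such as the assumed `lm_head.weight` here) keep their names.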

ZHUI · 2025-01-10T06:11:42Z

paddlenlp/transformers/deepseek_v2/modeling.py

-            y = paddle.to_tensor(paddle.finfo(dtype).min, dtype=dtype)
-            expanded_attn_mask = expanded_attn_mask.astype(dtype)
-            expanded_attn_mask = paddle.where(expanded_attn_mask, x, y).astype(dtype)
+            y = paddle.to_tensor(-1.7005809656952787e38, dtype="float32")


What is this?
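
For context on the lines in this diff, here is a minimal sketch of how an additive attention mask with a very large negative fill value behaves under softmax. It uses plain Python floats (float64) rather than the float32 Paddle tensors in the PR, and the helper names and example scores are assumptions; only the constant is copied from the diff. It does not explain why that particular constant was chosen.

```python
import math

# Fill value copied from the diff; everything else here is illustrative.
MASK_VALUE = -1.7005809656952787e38


def additive_mask(keep):
    """0.0 where attention is allowed, MASK_VALUE where it is blocked."""
    return [0.0 if k else MASK_VALUE for k in keep]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [1.0, 2.0, 3.0]   # made-up attention scores
keep = [True, True, False]  # the third position is masked out

masked_scores = [s + m for s, m in zip(scores, additive_mask(keep))]
probs = softmax(masked_scores)
```

Adding the mask pushes the blocked score so far below the others that its exponential underflows to zero, so the masked position receives no attention weight and the remaining weights still sum to one.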

zhaoyang-star · 2025-02-11T08:42:26Z

@DrownFish19 @ZHUI Does Paddle currently support training MTP-type models? If so, is there a unit test or demo? Thanks!

DrownFish19 added 2 commits December 31, 2024 09:22

update deepseek-v2

bcdcab2

add deepseek_v3

4a2be33

Merge branch 'PaddlePaddle:develop' into dev_20241231_add_deepseekv3

99c4def

DrownFish19 commented Jan 7, 2025

DrownFish19 added 4 commits January 7, 2025 03:47

update for deepseekv3

8d11a23

update prepare_inputs_for_generation

d89d6d4

update for predict

20bd1ea

update TP and model load

f9abe9c

DrownFish19 closed this Jan 8, 2025

DrownFish19 force-pushed the dev_20241231_add_deepseekv3 branch from 31a383a to 1d74d62 Compare January 8, 2025 11:40

Merge branch 'dev_20241231_add_deepseekv3' of github.com:DrownFish19/…

cac02e4

…PaddleNLP into dev_20241231_add_deepseekv3

DrownFish19 reopened this Jan 8, 2025

DrownFish19 commented Jan 8, 2025

ZHUI previously approved these changes Jan 8, 2025

update deepseekv3 model_ids

4d61571

DrownFish19 dismissed ZHUI’s stale review via 4d61571 January 10, 2025 02:29

ZHUI reviewed Jan 10, 2025

ZHUI merged commit 2c556e7 into PaddlePaddle:develop Jan 10, 2025
8 of 12 checks passed

DrownFish19 deleted the dev_20241231_add_deepseekv3 branch February 14, 2025 11:10
