@DrownFish19
Collaborator

PR types

New features

PR changes

Models

Description

Add DeepseekV3.

  1. Add the DeepseekV3 modeling code.
  2. Update the order of the auto tokenizer and update the related tokenizers.

@codecov

codecov bot commented Jan 3, 2025

Codecov Report

Attention: Patch coverage is 74.75410% with 77 lines in your changes missing coverage. Please review.

Project coverage is 52.38%. Comparing base (1d74d62) to head (4d61571).
Report is 262 commits behind head on develop.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| paddlenlp/transformers/deepseek_v2/modeling.py | 7.54% | 49 Missing ⚠️ |
| paddlenlp/transformers/deepseek_v3/modeling.py | 50.94% | 26 Missing ⚠️ |
| ...addlenlp/transformers/deepseek_v2/configuration.py | 0.00% | 1 Missing ⚠️ |
| ...addlenlp/transformers/deepseek_v3/configuration.py | 85.71% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9738      +/-   ##
===========================================
+ Coverage    52.35%   52.38%   +0.02%     
===========================================
  Files          729      730       +1     
  Lines       117835   115230    -2605     
===========================================
- Hits         61694    60359    -1335     
+ Misses       56141    54871    -1270     
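The Codecov figures can be cross-checked with a few lines of Python. This sketch only reuses the numbers reported above. Two things in it are inferences, not documented Codecov behavior: truncating to two decimals reproduces 52.35%, 52.38% and +0.02% here (rounding would give 52.36% and +0.03%), and the patch size of 305 lines is derived from "77 missing at 74.75410%", not reported directly.

```python
import math

# Figures copied from the coverage diff (base = develop, head = this PR).
base = {"lines": 117835, "hits": 61694, "misses": 56141}
head = {"lines": 115230, "hits": 60359, "misses": 54871}


def pct(hits: int, lines: int) -> float:
    """Coverage as a percentage."""
    return 100.0 * hits / lines


def trunc2(x: float) -> float:
    """Truncate (not round) to two decimals; an assumption that matches this report."""
    return math.floor(x * 100) / 100


# Misses are simply the uncovered lines.
for r in (base, head):
    assert r["lines"] - r["hits"] == r["misses"]

# The +/- column of the diff.
assert head["lines"] - base["lines"] == -2605
assert head["hits"] - base["hits"] == -1335
assert head["misses"] - base["misses"] == -1270

base_pct = trunc2(pct(base["hits"], base["lines"]))
head_pct = trunc2(pct(head["hits"], head["lines"]))
delta_pct = trunc2(pct(head["hits"], head["lines"]) - pct(base["hits"], base["lines"]))

# Patch coverage: the per-file missing lines sum to 77; solving
# (N - 77) / N = 0.7475410 for N gives the (inferred) number of patch lines.
missing = 49 + 26 + 1 + 1
patch_lines = round(missing / (1 - 0.7475410))
patch_pct = f"{pct(patch_lines - missing, patch_lines):.5f}"

print(base_pct, head_pct, delta_pct, patch_lines, patch_pct)
```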

from .bit.modeling import *
from .bit.configuration import *
from .bit.image_processing import *
from .artist.configuration import *
Collaborator Author

Reordered by name, and added the deepseekv2/v3 imports.

@DrownFish19 DrownFish19 closed this Jan 8, 2025
@DrownFish19 DrownFish19 force-pushed the dev_20241231_add_deepseekv3 branch from 31a383a to 1d74d62 Compare January 8, 2025 11:40
@DrownFish19 DrownFish19 reopened this Jan 8, 2025
 if x.endswith(key):
     state_keys_map[key] = x
-    break
+    # break # remove break to match A.key B.key ...
Collaborator Author

This avoids the case where model parameters share the same suffix and their TPAction cannot be retrieved.
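A minimal sketch of what the `break` changes when several parameter names share a suffix. The helper `map_by_suffix` and its `stop_at_first` flag are hypothetical names for illustration, not PaddleNLP's API, and the surrounding loop structure is an assumption since the snippet above only shows the inner `if`. With `break`, the first matching name is kept; without it, every candidate is visited and the last match overwrites earlier ones.

```python
from typing import Dict, Iterable


def map_by_suffix(mapping_keys: Iterable[str], state_keys: Iterable[str],
                  stop_at_first: bool) -> Dict[str, str]:
    """Map each key to a state key that ends with it (hypothetical helper)."""
    state_keys = list(state_keys)
    result: Dict[str, str] = {}
    for key in mapping_keys:
        for x in state_keys:
            if x.endswith(key):
                result[key] = x  # later matches overwrite earlier ones
                if stop_at_first:
                    break  # keep only the first match
    return result


names = ["model.layers.0.mlp.weight", "model.layers.1.mlp.weight"]
print(map_by_suffix(["mlp.weight"], names, stop_at_first=True))
print(map_by_suffix(["mlp.weight"], names, stop_at_first=False))
```

The sketch only demonstrates the loop semantics; which behavior is correct depends on how the mapping is consumed afterwards.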

ZHUI previously approved these changes Jan 8, 2025
Contributor

@ZHUI ZHUI left a comment

LGTM


class DeepseekV3PretrainedModel(DeepseekV2PretrainedModel):
    config_class = DeepseekV2Config
    base_model_prefix = "deepseek_v3"
Contributor

Since we inherit anyway, how about changing base_model_prefix to match HF? If the parameters are hard to handle, never mind.

Collaborator Author

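  1. The parameters are easy to handle; we just need to override the mapping.
  2. base_model_prefix = "model" saves a lot of code: later models can inherit directly from CausalLM instead of starting their changes from DeepseekV3PretrainedModel.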
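Point 1 amounts to remapping parameter names from one prefix to another. A minimal sketch in plain Python, assuming a checkpoint is a flat dict keyed by dotted names and that HF-style keys start with `model.` (as the choice of `base_model_prefix = "model"` in this thread suggests); `rename_prefix` is a hypothetical helper, not a PaddleNLP function.

```python
from typing import Dict, TypeVar

V = TypeVar("V")


def rename_prefix(state_dict: Dict[str, V], old: str, new: str) -> Dict[str, V]:
    """Rewrite keys starting with f"{old}." to start with f"{new}." (hypothetical helper)."""
    old_dot, new_dot = old + ".", new + "."
    renamed: Dict[str, V] = {}
    for name, value in state_dict.items():
        if name.startswith(old_dot):  # the trailing dot avoids matching e.g. "deepseek_v3x."
            name = new_dot + name[len(old_dot):]
        renamed[name] = value  # keys without the prefix (e.g. lm_head) pass through
    return renamed


sd = {"deepseek_v3.embed_tokens.weight": 1, "lm_head.weight": 2}
print(rename_prefix(sd, "deepseek_v3", "model"))
```

Setting the prefix to `"model"` in the class itself makes such a rename unnecessary, which is the code saving point 2 refers to.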

y = paddle.to_tensor(paddle.finfo(dtype).min, dtype=dtype)
expanded_attn_mask = expanded_attn_mask.astype(dtype)
expanded_attn_mask = paddle.where(expanded_attn_mask, x, y).astype(dtype)
y = paddle.to_tensor(-1.7005809656952787e38, dtype="float32")
Contributor

What is this?
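For context on the question: the lines above turn a boolean attention mask into an additive one with `paddle.where`, filling masked positions with `y`, the dtype's most negative finite value from `paddle.finfo(dtype).min`. A minimal NumPy sketch of the same idea, with NumPy standing in for Paddle and assuming `x` (not shown in the snippet) is zero; `to_additive_mask` and `masked_softmax` are hypothetical names.

```python
import numpy as np


def to_additive_mask(allowed: np.ndarray, dtype=np.float32) -> np.ndarray:
    """True (may attend) -> 0, False (masked) -> most negative finite value of dtype."""
    zero = np.array(0.0, dtype=dtype)
    neg = np.array(np.finfo(dtype).min, dtype=dtype)
    return np.where(allowed, zero, neg).astype(dtype)


def masked_softmax(scores: np.ndarray, allowed: np.ndarray) -> np.ndarray:
    """Add the additive mask to raw scores, then softmax over the last axis."""
    logits = scores.astype(np.float32) + to_additive_mask(allowed, np.float32)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)  # masked entries underflow to 0
    return weights / weights.sum(axis=-1, keepdims=True)


causal = np.tril(np.ones((3, 3), dtype=bool))  # lower-triangular causal mask
probs = masked_softmax(np.zeros((3, 3)), causal)
print(probs)
```

Note that the hard-coded `-1.7005809656952787e38` in the last line differs from `np.finfo(np.float32).min` (about `-3.4028235e38`); the thread does not say where that constant comes from.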

@ZHUI ZHUI merged commit 2c556e7 into PaddlePaddle:develop Jan 10, 2025
8 of 12 checks passed
@zhaoyang-star

@DrownFish19 @ZHUI Does Paddle currently support training MTP-style models? If so, is there a unit test or demo? Thanks!

@DrownFish19 DrownFish19 deleted the dev_20241231_add_deepseekv3 branch February 14, 2025 11:10
3 participants