DrownFish19 (Collaborator) commented Jan 6, 2025

PR types

Bug fixes

PR changes

APIs

Description

Fix AutoTokenizer. #9726


When AutoTokenizer auto-initializes, it looks up the relevant tokenizer class in the wrong module.


original:

>>> from paddlenlp.transformers import AutoTokenizer
/usr/local/lib/python3.10/dist-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils. Support for replacing an already imported distutils is deprecated. In the future, this condition will fail. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
  warnings.warn(
>>> tokenizer = AutoTokenizer.from_pretrained('DeepFloyd/t5-v1_1-xxl')
[2025-01-02 16:33:31,091] [    INFO] - Loading configuration file /root/.paddlenlp/models/DeepFloyd/t5-v1_1-xxl/config.json
Traceback (most recent call last):
  File "/root/paddlejob/workspace/env_run/lvwenyu01/PaddleNLP/paddlenlp/transformers/auto/factory.py", line 35, in getattribute_from_module
    return getattribute_from_module(paddlenlp_module, attr)
  File "/root/paddlejob/workspace/env_run/lvwenyu01/PaddleNLP/paddlenlp/transformers/auto/factory.py", line 39, in getattribute_from_module
    raise ValueError(f"Could not find {attr} in {paddlenlp_module}!")
ValueError: Could not find T5Tokenizer in <module 'paddlenlp' from '/root/paddlejob/workspace/env_run/lvwenyu01/PaddleNLP/paddlenlp/__init__.py'>!

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/paddlejob/workspace/env_run/lvwenyu01/PaddleNLP/paddlenlp/transformers/auto/tokenizer.py", line 454, in from_pretrained
    tokenizer_class_py = TOKENIZER_MAPPING[type(config)]
  File "/root/paddlejob/workspace/env_run/lvwenyu01/PaddleNLP/paddlenlp/transformers/auto/factory.py", line 69, in __getitem__
    return self._load_attr_from_module(model_type, model_name)
  File "/root/paddlejob/workspace/env_run/lvwenyu01/PaddleNLP/paddlenlp/transformers/auto/factory.py", line 100, in _load_attr_from_module
    return getattribute_from_module(self._modules[module_name], attr)
  File "/root/paddlejob/workspace/env_run/lvwenyu01/PaddleNLP/paddlenlp/transformers/auto/factory.py", line 37, in getattribute_from_module
    raise ValueError(f"Could not find {attr} neither in {module} nor in {paddlenlp_module}!")
ValueError: Could not find T5Tokenizer neither in <module 'paddlenlp.transformers.t5' from '/root/paddlejob/workspace/env_run/lvwenyu01/PaddleNLP/paddlenlp/transformers/t5/__init__.py'> nor in <module 'paddlenlp' from '/root/paddlejob/workspace/env_run/lvwenyu01/PaddleNLP/paddlenlp/__init__.py'>!

fixed:

>>> from paddlenlp.transformers import AutoTokenizer
/usr/local/lib/python3.10/dist-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils. Support for replacing an already imported distutils is deprecated. In the future, this condition will fail. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
  warnings.warn(
>>> tokenizer = AutoTokenizer.from_pretrained('DeepFloyd/t5-v1_1-xxl')
[2025-01-02 16:29:06,221] [    INFO] - Loading configuration file /root/.paddlenlp/models/DeepFloyd/t5-v1_1-xxl/config.json
>>> print(type(tokenizer))
<class 'paddlenlp.transformers.t5.tokenizer.T5Tokenizer'>
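
The original traceback points to a two-step lookup: the class name is searched first on the model's module (here `paddlenlp.transformers.t5`) and then on the top-level `paddlenlp` module, and a `ValueError` is raised when both miss. The sketch below imitates that pattern with stand-in modules built from `types.ModuleType`. The module names (`fakenlp...`), the placeholder `T5Tokenizer` class, and the body of `getattribute_from_module` are assumptions reconstructed from the error messages, not PaddleNLP's source, and the sketch does not reproduce this PR's actual change. It only illustrates why searching a module that does not expose the class fails, while searching the submodule that defines it (`paddlenlp.transformers.t5.tokenizer`, per the fixed output) succeeds.

```python
import types

# Hypothetical stand-ins for the real packages: the class lives only in a
# "tokenizer" submodule and is NOT attached to the package or the top level.
root = types.ModuleType("fakenlp")
package = types.ModuleType("fakenlp.transformers.t5")
submodule = types.ModuleType("fakenlp.transformers.t5.tokenizer")


class T5Tokenizer:  # placeholder class, not the real PaddleNLP tokenizer
    pass


submodule.T5Tokenizer = T5Tokenizer


def getattribute_from_module(module, attr, root_module=root):
    """Look up `attr` on `module`, falling back to `root_module` (assumed logic)."""
    if hasattr(module, attr):
        return getattr(module, attr)
    if module is not root_module:
        try:
            return getattribute_from_module(root_module, attr, root_module)
        except ValueError:
            # Raised inside the except block, so Python chains the two errors,
            # matching the "During handling of the above exception" traceback.
            raise ValueError(
                f"Could not find {attr} neither in {module} nor in {root_module}!"
            )
    raise ValueError(f"Could not find {attr} in {root_module}!")


# Searching the package (the wrong module) fails, as in the original log.
try:
    getattribute_from_module(package, "T5Tokenizer")
except ValueError as err:
    print(err)

# Searching the submodule that defines the class succeeds.
print(getattribute_from_module(submodule, "T5Tokenizer") is T5Tokenizer)  # True
```

The fallback to the top-level module only helps when the class is re-exported there; when it is not, the lookup must start from the module that actually defines the class.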

codecov bot commented Jan 6, 2025

Codecov Report

Attention: Patch coverage is 57.14286% with 3 lines in your changes missing coverage. Please review.

Project coverage is 52.21%. Comparing base (7e06e02) to head (3032955).
Report is 15 commits behind head on develop.

Current head 3032955 differs from pull request most recent head 6f77eb5

Please upload reports for the commit 6f77eb5 to get more accurate results.

Files with missing lines                              Patch %   Lines
paddlenlp/trainer/unified_checkpoint/load_local.py    0.00%     2 Missing ⚠️
paddlenlp/transformers/auto/tokenizer.py              50.00%    1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9745      +/-   ##
===========================================
- Coverage    52.61%   52.21%   -0.41%     
===========================================
  Files          723      723              
  Lines       114678   114332     -346     
===========================================
- Hits         60341    59699     -642     
- Misses       54337    54633     +296     
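
The coverage figures above can be checked by hand. Assuming Codecov's default display settings (`precision: 2` and `round: down` in `codecov.yml`), project coverage is hits divided by tracked lines, rounded down to two decimals. The sketch below recomputes the reported numbers from the diff. Two points are assumptions rather than figures stated in the report: that there are no partial lines (the diff shows no Partials row), and that the patch has 7 changed lines with 4 covered, which is inferred from "57.14286% with 3 lines missing".

```python
import math
from fractions import Fraction


def coverage_pct(hits: int, lines: int, decimals: int = 2) -> float:
    """Covered-line percentage, rounded DOWN (assumed Codecov default)."""
    scale = 10 ** decimals
    return math.floor(Fraction(hits, lines) * 100 * scale) / scale


base_hits, base_misses, base_lines = 60341, 54337, 114678
head_hits, head_misses, head_lines = 59699, 54633, 114332

# With no partials, every tracked line is either a hit or a miss.
assert base_hits + base_misses == base_lines
assert head_hits + head_misses == head_lines

print(coverage_pct(base_hits, base_lines))  # 52.61
print(coverage_pct(head_hits, head_lines))  # 52.21

# Change in coverage from the exact ratios, also rounded down.
delta = Fraction(head_hits, head_lines) - Fraction(base_hits, base_lines)
print(math.floor(delta * 100 * 100) / 100)  # -0.41

# Patch coverage: 4 covered out of 7 changed lines (3 missing).
print(round(100 * 4 / 7, 5))  # 57.14286
```

Rounding down explains why the base shows 52.61% rather than 52.62%, and why the change is shown as -0.41% even though the exact difference is about -0.402 percentage points.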

paddle-bot bot commented Jan 6, 2025

Thanks for your contribution!

DrownFish19 force-pushed the dev_20250106_update_auto_tokenizer branch from 3032955 to 6f77eb5 on January 8, 2025 11:40
ZHUI merged commit cf5e3e7 into PaddlePaddle:develop on Jan 10, 2025
11 of 14 checks passed
DrownFish19 deleted the dev_20250106_update_auto_tokenizer branch on February 7, 2025 06:55