@DesmonDay
Contributor

PR types

Bug fixes

PR changes

Others

Description

  1. Optimize the implementation of the distributed dataloader's next function.
  2. Add paddlenlp.utils.nested

@paddle-bot

paddle-bot bot commented May 7, 2024

Thanks for your contribution!

@DesmonDay DesmonDay force-pushed the update_distloader branch from a1378a3 to ac3f1aa Compare May 7, 2024 12:53
ZHUI previously approved these changes May 7, 2024
data = None
if self._need_data:
    try:
        data = next(self._dataloader_iter)

The copy can be done here.

Suggested change
- data = next(self._dataloader_iter)
+ data = next(self._dataloader_iter)
+ data = nested_copy_place(data, place=paddle.framework._current_expected_place())
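
For readers following the suggestion above, here is a minimal sketch of the general pattern: apply an operation to every leaf of a nested batch immediately after fetching it. This is not PaddleNLP's nested_copy_place; the names nested_map, fetch_and_place, and place_fn are hypothetical, and plain Python values stand in for tensors.

```python
from typing import Any, Callable, Iterator


def nested_map(fn: Callable[[Any], Any], data: Any) -> Any:
    """Apply fn to every leaf of a nested dict/list/tuple, keeping the structure.

    Hypothetical helper for illustration only; namedtuples and other
    container types are not handled.
    """
    if isinstance(data, dict):
        return {key: nested_map(fn, value) for key, value in data.items()}
    if isinstance(data, (list, tuple)):
        return type(data)(nested_map(fn, item) for item in data)
    return fn(data)  # leaf value: a tensor in the real code, any object here


def fetch_and_place(iterator: Iterator[Any], place_fn: Callable[[Any], Any]) -> Any:
    """Fetch one batch and transform its leaves right after the fetch."""
    data = next(iterator)
    return nested_map(place_fn, data)
```

In the real code, place_fn would correspond to copying a tensor to the expected device; doing it right after next() keeps the fetch and the copy in one place.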

nested_reduce_tensor,
)

_MAX_DATA_DIM = 64

Delete this; it should be unused.

@DesmonDay DesmonDay force-pushed the update_distloader branch from dabda13 to 84b4bf7 Compare May 8, 2024 10:55
    num_workers=self.args.dataloader_num_workers,
)
if self.args.distributed_dataloader:
    return _DataLoader(

Since it is already branched like this, just use DistDataLoader directly here.

Remove _DataLoader = DistDataLoader if self.args.distributed_dataloader else DataLoader
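
A rough sketch of what the comment above asks for: once the code branches on the flag, each branch can name its class directly and the _DataLoader alias is no longer needed. The two loader classes below are hypothetical stand-ins (not the Paddle or PaddleNLP classes), and build_dataloader is an illustrative name, not a method from the PR.

```python
class _StubLoader:
    """Hypothetical base used only to keep this sketch self-contained."""

    def __init__(self, dataset, num_workers=0):
        self.dataset = dataset
        self.num_workers = num_workers


class DataLoader(_StubLoader):
    """Stand-in for the regular loader class."""


class DistDataLoader(_StubLoader):
    """Stand-in for the distributed loader class."""


def build_dataloader(dataset, distributed_dataloader, num_workers=0):
    # Each branch names its class explicitly, so no conditional alias
    # such as _DataLoader has to be defined beforehand.
    if distributed_dataloader:
        return DistDataLoader(dataset, num_workers=num_workers)
    return DataLoader(dataset, num_workers=num_workers)
```

The trade-off is a little repetition in the two return statements in exchange for not having to look up what the alias resolves to.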

ZHUI previously approved these changes May 8, 2024
@codecov

codecov bot commented May 8, 2024

Codecov Report

Attention: Patch coverage is 13.39286%, with 97 lines in your changes missing coverage. Please review.

Project coverage is 55.43%. Comparing base (d6ac1bd) to head (14148f2).

Files                                Patch %    Lines
paddlenlp/data/dist_dataloader.py    6.00%      47 Missing ⚠️
paddlenlp/utils/nested.py            18.00%     41 Missing ⚠️
paddlenlp/trainer/trainer.py         10.00%     9 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8380      +/-   ##
===========================================
+ Coverage    55.42%   55.43%   +0.01%     
===========================================
  Files          615      616       +1     
  Lines        96235    96209      -26     
===========================================
+ Hits         53335    53336       +1     
+ Misses       42900    42873      -27     


@wawltor wawltor merged commit a139758 into PaddlePaddle:develop May 9, 2024
DesmonDay added a commit to DesmonDay/PaddleNLP that referenced this pull request May 13, 2024
)

* Fix sharding overlap bug

* [DistDataloader] Update implementation, add nested.py

* Fix pipeline

* add first try

* update dataloader

---------

Co-authored-by: lugimzzz <[email protected]>
DesmonDay added a commit that referenced this pull request May 13, 2024
* [DistDataloader] Update implementation, add nested.py (#8380)
* fix distdataloader, fix eval with dp group (#8420)