[Done] Sync master client between passes and fix recordio split #2948
Merged: typhoonzero merged 22 commits into PaddlePaddle:develop from typhoonzero:fix_recordio_split on Jul 27, 2017
Commits
6cea7ba fix recordio split and task passes
c950b73 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
0391bf5 update for pre commit
5a402b5 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
30adaa8 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
56309b2 update
9215501 update, still need to sync client wait for pass end.
e3d7c22 able to sync passes for task dispatching
419d553 update to comment
31bf3fb Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
bd2a610 update
149ced5 fix yapf check
ec6b16e why local pre-commit fails? version is the same
270bdb3 fix race condition
8c0755b Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
9891627 update
c1e8c9b fix race condition
f07dc95 this still have duplicate problem in unit test
fb9f810 update
7d2d744 update
cc45124 update by comment
ebb007f update
How does the trainer know which pass the training is currently in? We need some way to get this information. Maybe paddle_set_dataset could return it, or creating a new function seems fine as well. Btw, this function is not called from anywhere in this PR.
The trainer doesn't need to know the exact master-side pass count; using a local pass count will do the work. The trainer just needs to know when to break and when to wait.
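To make the "when to break and when to wait" idea concrete, here is a minimal sketch in Python. It is not the code from this PR: `ToyMaster`, `Status`, and `train` are hypothetical names, and the toy master advances its pass as soon as the last task is handed out, which is a simplification. The trainer only keeps a local pass counter and reacts to the master's answer.

```python
from enum import Enum


class Status(Enum):
    OK = "ok"                    # a task was handed out
    PASS_BEFORE = "pass_before"  # requested pass is already over on the master
    PASS_AFTER = "pass_after"    # requested pass has not started on the master


class ToyMaster:
    """Hypothetical in-memory stand-in for the master; not the real API."""

    def __init__(self, tasks_per_pass):
        self.tasks_per_pass = tasks_per_pass
        self.cur_pass = 0
        self.todo = list(range(tasks_per_pass))

    def get_task(self, pass_id):
        if pass_id < self.cur_pass:
            return Status.PASS_BEFORE, None
        if pass_id > self.cur_pass:
            return Status.PASS_AFTER, None
        task = self.todo.pop(0)
        if not self.todo:
            # Simplification: start the next pass once the last task is dispatched.
            self.cur_pass += 1
            self.todo = list(range(self.tasks_per_pass))
        return Status.OK, task


def train(master, num_passes):
    """Consume num_passes passes using only a local pass counter."""
    seen = []
    for local_pass in range(num_passes):
        while True:
            status, task = master.get_task(local_pass)
            if status is Status.PASS_BEFORE:
                break     # the pass is over: move on to the next local pass
            if status is Status.PASS_AFTER:
                continue  # a real trainer would sleep here and retry
            seen.append((local_pass, task))
    return seen
```

With a single trainer, `train(ToyMaster(2), 2)` visits both tasks of pass 0 and then both tasks of pass 1, without ever asking the master which pass it is on.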
Say the job is training on pass 100 and one trainer gets killed and restarted: should it set its local pass to 0 and start from there (call get record, get an error, increment the pass number by 1, and call again until reaching pass 100)?
Yep, that's right. This just works; I'll refine it in another PR later.
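The restart scenario discussed here can be sketched as follows. This is an illustration under stated assumptions, not the client code in this PR: `PassFinished`, `make_get_record`, and `catch_up` are hypothetical names, and the fake `get_record` simply rejects any pass older than the master's current one.

```python
class PassFinished(Exception):
    """Hypothetical error: the requested pass has already ended on the master."""


def make_get_record(master_pass):
    """Build a fake get_record that rejects passes older than master_pass."""
    def get_record(pass_id):
        if pass_id < master_pass:
            raise PassFinished(pass_id)
        return f"record-from-pass-{pass_id}"
    return get_record


def catch_up(get_record, start_pass=0):
    """Bump the local pass after each rejection until a record comes back."""
    local_pass = start_pass
    failed_calls = 0
    while True:
        try:
            return local_pass, failed_calls, get_record(local_pass)
        except PassFinished:
            failed_calls += 1
            local_pass += 1
```

In this sketch, a trainer restarted while the master is on pass 100 makes one rejected call per elapsed pass (100 in total) before it receives its first record.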
I see, sure.