Add multiprocessing #552
|
Logging looks like |
Force-pushed from c25510f to f62f124.
|
I added tests and improved logging. |
LGTM
|
A bit strange that the benchmarks on map/filter are worse than |
|
The benchmark also got worse in other PRs (see here for example, where we have 16sec for |
|
Hi, when I use the multiprocessing in I get the following error: I think you should use pathos to pickle the lambda function and some others! and it works! |
|
That's very cool indeed ! |
|
We already use |
|
It gets stuck on Debian 9 when num_proc > 1. |
|
Are you using a tokenizer? Feel free to join the discussion in #620, where we're tracking this issue. |
|
I set |
Adding multiprocessing to .map

It works in 3 steps:
num_proc shards
map on them

Example of usage:

Here it writes 4 files depending on the process rank:
playground/tmp_00000_of_00004.arrow
playground/tmp_00001_of_00004.arrow
playground/tmp_00002_of_00004.arrow
playground/tmp_00003_of_00004.arrow

The suffix format can be specified by the user.

If the cache_file_name is not specified, it writes into separate files depending on the fingerprint, as usual.

I still need to: