[Breaking] Switch text loading to multi-threaded PyArrow loading #548
Awesome!
I just rebased from master to include the hashing changes from #573.
I think this is ready to merge, no?
Indeed it's ready to merge :)
OK, I added the breaking change info, so we can merge.
Test whether we can get better performance on large-scale text datasets by loading text files with Apache Arrow's multi-threaded CSV loader.
If it works, it would fix #546.
Breaking change:
Text lines no longer include their trailing line break.
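To illustrate what this means for downstream code, the sketch below contrasts the two behaviours using only the Python standard library. It does not call the library itself; the variable names are illustrative, and it assumes every line in the input ends with `\n`.

```python
import io

raw = "first line\nsecond line\n"

# Old behaviour (illustrated): iterating a file keeps the line break.
with_breaks = list(io.StringIO(raw))

# New behaviour (illustrated): lines come back without the line break.
without_breaks = raw.splitlines()

# Code that relied on the trailing "\n" can add it back explicitly.
restored = [line + "\n" for line in without_breaks]
```

Code that previously called `.rstrip("\n")` on each line keeps working unchanged, since stripping an absent line break is a no-op.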