add MS MARCO dataset #364
The dummy data for v2.1 seems to be missing. Running the dummy data command should generate it correctly.
Also, the structure of the dummy data might be wrong; looking at
I don't think the missing dummy data for v2.1 should make the tests fail. But as you mention, the dummy data structure of v1.1 is wrong. I tried renaming the files, but that did not solve the issue.
Has MS MARCO been added to the nlp library? I am not able to find it.
Hi @parthplc, the PR is not merged yet. The dummy data structure still makes the tests fail. Maybe @patrickvonplaten can help with it.
Branch updated from 9be90ad to 888852d.
The dataset is fixed and should be ready for use. @mariamabarham @lhoestq, feel free to merge whenever!
Thanks!
* force push to master
* fix ms_marco

Co-authored-by: Patrick von Platen
This PR adds the MS MARCO dataset, as requested in issue #336. MS MARCO covers multiple tasks, including:
* Passage and Document Retrieval
* Keyphrase Extraction
* QA and NLG
This PR only adds the two versions (v1.1 and v2.1) of the QA and NLG task dataset that were released with the original paper: https://arxiv.org/pdf/1611.09268.pdf
The tests are failing because of the dummy data. I tried to fix it without success. @patrickvonplaten, @lhoestq, could you please have a look?