@mariamabarham
Contributor

@mariamabarham mariamabarham commented Jul 9, 2020

This PR adds the MS MARCO dataset, as requested in issue #336. MS MARCO has multiple tasks, including:

  • Passage and Document Retrieval

  • Keyphrase Extraction

  • QA and NLG

This PR only adds the two versions of the QA and NLG task dataset, which was released with the original paper: https://arxiv.org/pdf/1611.09268.pdf
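For readers who want to try the two versions once the PR is merged, here is a minimal sketch. It assumes the versions discussed in this thread (v1.1 and v2.1) are exposed as configuration names of an `ms_marco` dataset and that the `nlp` package is installed; `check_version` and `load_ms_marco` are hypothetical helpers, not part of the library.

```python
# Assumption: the two versions are selectable as config names "v1.1" and "v2.1".
SUPPORTED_VERSIONS = ("v1.1", "v2.1")


def check_version(version):
    """Return the version unchanged if it is one of the two added by this PR."""
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(
            f"Unknown MS MARCO version {version!r}; expected one of {SUPPORTED_VERSIONS}"
        )
    return version


def load_ms_marco(version="v1.1", split="train"):
    """Load one version of the QA and NLG dataset (downloads data on first use)."""
    # Imported lazily so the helpers above can be used without the library installed.
    import nlp

    return nlp.load_dataset("ms_marco", check_version(version), split=split)
```

Calling `load_ms_marco("v2.1")` would then fetch the second version, provided the configuration names match the assumption above.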

Tests are failing because of the dummy data. I tried to fix it without success. Can you please have a look at it? @patrickvonplaten, @lhoestq

@patrickvonplaten patrickvonplaten self-assigned this Jul 10, 2020
@patrickvonplaten
Contributor

The dummy data for v2.1 is missing as far as I can see. I think running the dummy data command should work correctly here.

@patrickvonplaten patrickvonplaten removed their assignment Jul 10, 2020
@patrickvonplaten
Contributor

Also, it might be that the structure of the dummy data is wrong - looking at generate_examples, the expected structure does not look straightforward.

@mariamabarham
Contributor Author

The fact that the dummy data for v2.1 is missing shouldn't make the tests fail, I think. But as you mention, the dummy data structure of v1.1 is wrong. I tried renaming the files, but that does not solve the issue.
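To make the "dummy data structure" point concrete, here is a rough sketch of building a dummy archive and listing its members, so the layout can be compared with what the loading script expects. It assumes the `dummy/<config>/<version>/dummy_data.zip` convention as I recall it from the nlp repository; the version directory `1.1.0`, the member name `train.json`, and the record fields are hypothetical placeholders, not the real MS MARCO file names.

```python
import json
import os
import tempfile
import zipfile


def build_dummy_zip(dataset_dir, config, version, files):
    """Write dummy/<config>/<version>/dummy_data.zip under dataset_dir.

    `files` maps a member name to a list of JSON-serializable records.
    """
    target_dir = os.path.join(dataset_dir, "dummy", config, version)
    os.makedirs(target_dir, exist_ok=True)
    zip_path = os.path.join(target_dir, "dummy_data.zip")
    with zipfile.ZipFile(zip_path, "w") as archive:
        for name, records in files.items():
            # One JSON record per line; zip member paths always use "/".
            payload = "\n".join(json.dumps(record) for record in records)
            archive.writestr(f"dummy_data/{name}", payload)
    return zip_path


def list_members(zip_path):
    """Return the sorted member names, to compare against the paths the script reads."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(archive.namelist())


# Hypothetical usage: the file name and fields below are placeholders.
example_dir = tempfile.mkdtemp()
example_zip = build_dummy_zip(
    example_dir, "v1.1", "1.1.0", {"train.json": [{"query": "what is ms marco"}]}
)
print(list_members(example_zip))
```

If the member names printed here do not match the paths the loading script resolves after its download step, renaming alone may not help; the nesting inside the archive may also need to change.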

@parthplc

Has MS MARCO been added to the nlp library? I am not able to view it.

@mariamabarham
Contributor Author

Has MS MARCO been added to the nlp library? I am not able to view it.

Hi @parthplc, the PR is not merged yet. The tests are still failing because of the dummy data structure. Maybe @patrickvonplaten can help with it.

@patrickvonplaten
Contributor

Dataset is fixed and should be ready for use. @mariamabarham @lhoestq feel free to merge whenever!

@mariamabarham
Contributor Author

Dataset is fixed and should be ready for use. @mariamabarham @lhoestq feel free to merge whenever!

thanks

@mariamabarham mariamabarham merged commit e630d77 into master Aug 6, 2020
@mariamabarham mariamabarham deleted the ms_marco branch August 6, 2020 06:15
vegarab pushed a commit to vegarab/nlp that referenced this pull request Aug 18, 2020
* force push to master

* fix ms_marco

Co-authored-by: Patrick von Platen <[email protected]>

4 participants