Skip to content

Conversation

@lhoestq
Copy link
Member

@lhoestq lhoestq commented Sep 7, 2020

Fix #581

@lhoestq lhoestq requested a review from thomwolf September 7, 2020 16:00
Copy link
Member

@thomwolf thomwolf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes great, just one note on text!

If str or List[str], then the dataset returns only the 'train' split.
If dict, then keys should be from the `nlp.Split` enum.
"""
assert bool(self.config.data_files), "At least one data file must be specified, but got data_files={}".format(self.config.data_files)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge the PR on multi-threaded text and add it in the new script before?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be careful by putting all these checks in asserts - that's not what they should be used for. ValueErrors or filenotfound errors might be better. the issue with using asserts everywhere is that users will miss out on these error messages when they run their scripts in optimized mode, which turns off asserts (which are basically debug statements). This is true for the whole code base - using asserts to check arguments is not always what you'd want from a usability perspective.

I'd change it to:

if not self.config.data_files:
    raise ValueError(f"At least one data file must be specified, but got data_files={self.config.data_files}")

Copy link
Member Author

@lhoestq lhoestq Sep 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes indeed I'll change that

@lhoestq lhoestq merged commit acfc3ac into master Sep 9, 2020
@lhoestq lhoestq deleted the better-message-when-data_files-is-empty branch September 9, 2020 09:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Better error message when input file does not exist

4 participants