-
Notifications
You must be signed in to change notification settings - Fork 2
CSV Validation #91
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CSV Validation #91
Conversation
jafeltra
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really good to me! I just had a couple questions:
- [JA1] When I remove a column, I get a logged message that says
...found extra column(s): "". Is that expected? This happens both if the column is marked as required or not. I think we had said if we had missing non-required columns we wanted to flag the CSV as invalid, but I'm not sure and wanted to check. Either way, if we are missing columns, it might be helpful to update the error message we get in this case since it reference extra columns, but we're missing columns. - [JA2] When I add an extra column but don't add a value in the row (or vice versa), I just get an error that the extractor cannot be initialized, but no message from the validation. Is that expected? It looks like the error that gets caught when we log that message says that the column length and row length is off, so it might be helpful to log that information.
- [JA3] This might not be possible with the approach you've taken, so I'm just asking to see your/others opinion on this. But if one CSV file fails validation, do you think we should continue with extraction for the other extractors that did have valid CSV files? Or does it make sense to stop all extraction? Maybe there's a case to make that we might be missing resources we try to reference, but we also don't enforce any sort of context right now with the CSV extractors.
- [JA4] I'm not sure exactly what you updated in the Excel file, but it looks like some of the required columns don't match up with what is in the schema for some of the resources. Is that something you were trying to align as part of this?
|
DP1 - Looks like this PR hasn't yet added tests for the CSV Validator. I think as you're going through and addressing the JA's above, creating tests that explicitly define expected behavior would be helpful. |
My 2 Cents: I would want to continue the validation and aggregate all the errors. Since this is pre-extraction, it will be most helpful to let the user know about all errors found in validation, so they can go back to their POC and inform them of all the issues in validation that need addressing.. |
Ah that's a good idea! Do you think that in addition to seeing all the errors that are there, would we want to continue with extraction for the ones that did validate? Or just stop and be able to see the errors that were there? |
I would think the client should shut down if the CSV's don't validate without errors. That's my thought anyway. |
|
Methinks all comments resolved |
jafeltra
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these changes look good to me! I tested out the various cases for missing and required/ missing and optional/extra and they all work as expected. And the schemas, extractors, and Excel file all line up!
|
Looks like this could use a rebase. |
* Include schemas and tools for validating CSVs * Add BaseCSVExtractor class that all CSV extractors extend * Update excel sheet to indicate all fields used * Update tests to use proper calls
75eaf06 to
bb59921
Compare
|
Looks great. |
Summary
Introduces the concept of validating CSV schemas for the specific extractor (e.g. column names, number of columns, required columns, etc.).
CSV Validation is done at initialization time of the Extractor (before any data is extracted for anything).
NOTE: Schemas in this PR only exist for the ICARE data elements per the task definition of done.
New behavior
If any validation errors occur, they will be logged to the console and the extraction process will stop. Otherwise, logs will indicate that the validation has succeeded, and extraction will continue. Actual extraction behavior is unchanged
Code changes
src/helpers/schemas/csv.js) for schemas. They are simple for now (just column names and optional vs. not). We can add more complex validation in the future but IMO that should be separate tasking.BaseCSVExtractorclass that all CSV extractors extend frominitializeExtractorsin BaseClientTesting guidance