Skip to content

Conversation

@Dtphelan1
Copy link
Contributor

Summary

Adds behavior to the CSVModule to support normalization of NULL/NIL values into empty-string values. Additionally, CSVExtractors can now specify unalterableColumns in order to leave certain types of data immune from this normalization.

New behavior

  • CSVModule now preprocesses CSV data in order to replace any values where value.toLowerCase() === 'null' || value.toLowerCase() === 'nil' with empty string ''.

Code changes

  • CSVPatientExtractor now defines two unalterableColumns, familyName and givenName.
  • BaseCSVExtractor now accepts unalterableColumns as a constructor argument and stores that value in local state.
  • Adds tests of this new normalization functionality

Testing guidance

  • Ensure current tests pass.
  • Try adding a NULL value to a CSV and see if it is appropriately replaced with an empty string (one option: replace gender with NULL/NIL in patient-information.csv and see if the dataAbsentReason appears in the output bundle)
  • Ensure that givenName and familyName aren't scrubbed, even if their values are NULL/NIL.

@jafeltra
Copy link
Contributor

@Dtphelan1 just noticed this is going into main. Probably want this to go to develop.

Copy link
Contributor

@julianxcarter julianxcarter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@Dtphelan1 Dtphelan1 changed the base branch from main to develop May 19, 2021 18:18
@jafeltra
Copy link
Contributor

Thanks for that change! Not to continue being a backseat reviewer (I just am noticing these things while making release notes), but did we want to make a note of what we are doing in the README? I know we talked about it, but maybe we decided against this since we evaluated which columns should be manipulated.

@dmendelowitz
Copy link
Contributor

I went in and changed a patients given name to "nil" to make sure it didn't get wiped but their name ended up being output as "null" which seems very weird. 1. the name isn't supposed to be changed and 2. how did it go from nil to null?

@Dtphelan1
Copy link
Contributor Author

  1. looks like it might have to do with caseSensitivity issues – when we create our headers, we lowercase them all. So familyName and familyname won't match. Handling this now 😢

@Dtphelan1
Copy link
Contributor Author

@dmendelowitz should have taken care of 2. Going to look into 1) now...

@Dtphelan1
Copy link
Contributor Author

I don't seem to be able to replicate 1). Could you send me some screen grabs showing me what you were seeing exactly?

@dmendelowitz
Copy link
Contributor

nil_doe
result

@dmendelowitz
Copy link
Contributor

When the name was "Jane" in the csv it properly output "Jane Doe"

@dmendelowitz
Copy link
Contributor

It looks like your most recent commit fixed the "nil" -> "null" thing

@jafeltra
Copy link
Contributor

Ohhhh the null Doe thing might be my fault. I think when I updated the patient CSV to not require family or given name, I don't think I updated the template where it combines the two into that one text field. So I think the same thing would probably happen if you didn't include any value for givenName as well.

@dmendelowitz
Copy link
Contributor

@jafeltra Yeah, when givenName is blank, I get "null Doe" also, so I guess that's an issue with the patient template not @Dtphelan1's new code

Copy link
Contributor

@dmendelowitz dmendelowitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I all my issues have been resolved

@dmendelowitz
Copy link
Contributor

I know I already approved but this thought just came to my mind. Since I just learned that the patient IDs csv is parsed without the CSVModule, is there any chance that that CSV will have a nil or null that needs to be interpreted as empty? (I hope not)

@Dtphelan1
Copy link
Contributor Author

  1. Great question – I don't think we need to worry about this right now;
  2. This does raise a bigger question about whether or not the client's should be aware of patient-mrns more generally. Not a problem for now, but something to think about more moving forward 😄

@jafeltra
Copy link
Contributor

The README update makes sense to me!

@Dtphelan1 Dtphelan1 merged commit 8ef319f into develop May 20, 2021
@Dtphelan1 Dtphelan1 deleted the null-is-empty-csv branch May 20, 2021 13:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants