---
layout: model
title: Clinical Deidentification Pipeline - Multi Mode Output (English)
author: John Snow Labs
name: clinical_deidentification_multi_mode_output
date: 2024-03-27
tags: [deidentification, deid, clinical, pipeline, en, licensed]
task: De-identification
language: en
edition: Healthcare NLP 5.3.1
spark_version: 3.4
supported: true
annotator: PipelineModel
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This pipeline de-identifies protected health information (PHI) in medical texts: detected PHI is masked and obfuscated in the resulting text. The pipeline can mask and obfuscate `AGE`, `CONTACT`, `DATE`, `LOCATION`, `NAME`, `PROFESSION`, `CITY`, `COUNTRY`, `DOCTOR`, `HOSPITAL`, `IDNUM`, `MEDICALRECORD`, `ORGANIZATION`, `PATIENT`, `PHONE`, `EMAIL`, `STREET`, `USERNAME`, `ZIP`, `ACCOUNT`, `LICENSE`, `VIN`, `SSN`, `DLN`, `PLATE`, and `IPADDR` entities.

This pipeline simultaneously produces four versions of the text: masked with entity labels, masked with same-length characters, masked with fixed-length characters, and obfuscated.
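
To make the four output formats concrete, here is a minimal pure-Python sketch that applies each mode to one string, given entity spans. It is not the library's implementation: the `apply_mode` and `mask_chunk` helpers, the character offsets and the fake replacement values are hypothetical, and the same-length rule (asterisks wrapped in brackets, plain asterisks for very short spans) is inferred from the sample output in the Results section.

```python
from typing import Dict, List, Optional, Tuple

# A span is (start, end, label) with `end` exclusive, as in Python slicing.
Span = Tuple[int, int, str]


def mask_chunk(chunk: str, label: str, mode: str, fakes: Dict[str, str]) -> str:
    """Return the replacement for one detected entity under the given mode."""
    if mode == "masked":
        # Entity label in angle brackets, e.g. <DOCTOR>
        return f"<{label}>"
    if mode == "masked_with_chars":
        # Same length as the original chunk (inferred rule, see lead-in)
        if len(chunk) > 2:
            return "[" + "*" * (len(chunk) - 2) + "]"
        return "*" * len(chunk)
    if mode == "masked_fixed_length_chars":
        # Always four asterisks, regardless of the chunk length
        return "****"
    if mode == "obfuscated":
        # Hypothetical: look up a fake surrogate value for this label
        return fakes.get(label, "<UNKNOWN>")
    raise ValueError(f"unknown mode: {mode}")


def apply_mode(text: str, spans: List[Span], mode: str,
               fakes: Optional[Dict[str, str]] = None) -> str:
    """Rebuild `text`, replacing every span according to `mode`."""
    fakes = fakes or {}
    out, cursor = [], 0
    for start, end, label in sorted(spans):
        out.append(text[cursor:start])
        out.append(mask_chunk(text[start:end], label, mode, fakes))
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


# Hypothetical input: spans are hand-written, not produced by the pipeline
text = "Dr. John Green, MR # 719435, age 60."
spans = [(4, 14, "DOCTOR"), (21, 27, "MEDICALRECORD"), (33, 35, "AGE")]
fakes = {"DOCTOR": "Vic Blackbird", "MEDICALRECORD": "824235", "AGE": "68"}

for mode in ("masked", "masked_with_chars", "masked_fixed_length_chars", "obfuscated"):
    print(f"{mode:26} {apply_mode(text, spans, mode, fakes)}")
```

Only the same-length mode preserves the original string length, which keeps character offsets stable; the fixed-length mode additionally hides how long each entity was.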

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_multi_mode_output_en_5.3.1_3.4_1711532922696.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_multi_mode_output_en_5.3.1_3.4_1711532922696.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}

```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes a Spark session with the licensed Healthcare NLP library is already running
deid_pipeline = PretrainedPipeline("clinical_deidentification_multi_mode_output", "en", "clinical/models")

text = """Name : Hendrickson, Ora, Record date: 2093-01-13, MR # 719435.
Dr. John Green, ID: 1231511863, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93.
Patient's VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no: A334455B.
Phone (302) 786-5227, 0295 Keats Street, San Francisco, E-MAIL: [email protected]."""

# annotate() returns a dict that maps each output column to a list of strings
result = deid_pipeline.annotate(text)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val deid_pipeline = PretrainedPipeline("clinical_deidentification_multi_mode_output", "en", "clinical/models")

val text = """Name : Hendrickson, Ora, Record date: 2093-01-13, MR # 719435.
Dr. John Green, ID: 1231511863, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93.
Patient's VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no: A334455B.
Phone (302) 786-5227, 0295 Keats Street, San Francisco, E-MAIL: [email protected]."""

val result = deid_pipeline.annotate(text)
```
</div>

## Results

```python
print("\nMasked with entity labels")
print("-"*30)
print("\n".join(result['masked']))
print("\nMasked with chars")
print("-"*30)
print("\n".join(result['masked_with_chars']))
print("\nMasked with fixed length chars")
print("-"*30)
print("\n".join(result['masked_fixed_length_chars']))
print("\nObfuscated")
print("-"*30)
print("\n".join(result['obfuscated']))
```

```bash
| 81 | + |
| 82 | +Masked with entity labels |
| 83 | +------------------------------ |
| 84 | +Name : <PATIENT>, Record date: <DATE>, MR # <MEDICALRECORD>. |
| 85 | +Dr. <DOCTOR>, ID: <DEVICE>, IP <IPADDR>. |
| 86 | +He is a <AGE>-year-old male was admitted to the <HOSPITAL> for cystectomy on <DATE>. |
| 87 | +Patient's VIN : <VIN>, SSN <SSN>, Driver's license no: <DLN>. |
| 88 | +Phone <PHONE>, <STREET>, <CITY>, E-MAIL: <EMAIL>. |
| 89 | + |
| 90 | +Masked with chars |
| 91 | +------------------------------ |
| 92 | +Name : [**************], Record date: [********], MR # [****]. |
| 93 | +Dr. [********], ID: [********], IP [************]. |
| 94 | +He is a **-year-old male was admitted to the [**********] for cystectomy on [******]. |
| 95 | +Patient's VIN : [***************], SSN [**********], Driver's license no: [******]. |
| 96 | +Phone [************], [***************], [***********], E-MAIL: [*************]. |
| 97 | + |
| 98 | +Masked with fixed length chars |
| 99 | +------------------------------ |
| 100 | +Name : ****, Record date: ****, MR # ****. |
| 101 | +Dr. ****, ID: ****, IP ****. |
| 102 | +He is a ****-year-old male was admitted to the **** for cystectomy on ****. |
| 103 | +Patient's VIN : ****, SSN ****, Driver's license no: ****. |
| 104 | +Phone ****, ****, ****, E-MAIL: ****. |
| 105 | + |
| 106 | +Obfuscated |
| 107 | +------------------------------ |
| 108 | +Name : Marlana Salvage, Record date: 2093-02-23, MR # 824235. |
| 109 | +Dr. Vic Blackbird, ID: X2814358, IP 001.001.001.001. |
| 110 | +He is a 68-year-old male was admitted to the PRAIRIE SAINT JOHN'S for cystectomy on 02/23/93. |
| 111 | +Patient's VIN : 3IRWE31VQMG867619, SSN #509-32-6712, Driver's license no: W580998P. |
| 112 | +Phone (382) 505-3976, 521 Adams St, Port Shannon, E-MAIL: [email protected]. |
| 113 | +``` |
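
Because `annotate` returns a plain dictionary of lists, and the sample output suggests one entry per sentence for each output key, the four versions can be lined up sentence by sentence. The sketch below assumes the keys used in the print statements above (`masked`, `masked_with_chars`, `masked_fixed_length_chars`, `obfuscated`); `compare_outputs` is a hypothetical helper, and the dictionary it is called on is a hand-copied excerpt of the sample output rather than a live pipeline result.

```python
from typing import Dict, List

# Output keys as used in the print statements above
MODES = ("masked", "masked_with_chars", "masked_fixed_length_chars", "obfuscated")


def compare_outputs(result: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Group the four de-identified versions of each sentence together."""
    lengths = {len(result[mode]) for mode in MODES}
    if len(lengths) != 1:
        # Zipping lists of different lengths would silently drop sentences
        raise ValueError(f"output lists differ in length: {lengths}")
    return [dict(zip(MODES, row)) for row in zip(*(result[m] for m in MODES))]


# Hand-copied excerpt of the sample output above (first two sentences only)
result = {
    "masked": [
        "Name : <PATIENT>, Record date: <DATE>, MR # <MEDICALRECORD>.",
        "Dr. <DOCTOR>, ID: <DEVICE>, IP <IPADDR>.",
    ],
    "masked_with_chars": [
        "Name : [**************], Record date: [********], MR # [****].",
        "Dr. [********], ID: [********], IP [************].",
    ],
    "masked_fixed_length_chars": [
        "Name : ****, Record date: ****, MR # ****.",
        "Dr. ****, ID: ****, IP ****.",
    ],
    "obfuscated": [
        "Name : Marlana Salvage, Record date: 2093-02-23, MR # 824235.",
        "Dr. Vic Blackbird, ID: X2814358, IP 001.001.001.001.",
    ],
}

for row in compare_outputs(result):
    for mode, sentence in row.items():
        print(f"{mode:26} {sentence}")
    print()
```

The explicit length check is a defensive choice: `zip` stops at the shortest list, so a mismatch would otherwise go unnoticed.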

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|clinical_deidentification_multi_mode_output|
|Type:|pipeline|
|Compatibility:|Healthcare NLP 5.3.1+|
|License:|Licensed|
|Edition:|Official|
|Language:|en|
|Size:|1.7 GB|

## Included Models

- DocumentAssembler
- SentenceDetectorDLModel
- TokenizerModel
- WordEmbeddingsModel
- MedicalNerModel
- NerConverter
- MedicalNerModel
- NerConverter
- ChunkMergeModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- TextMatcherModel
- ContextualParserModel
- RegexMatcherModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ChunkMergeModel
- ChunkMergeModel
- DeIdentificationModel
- DeIdentificationModel
- DeIdentificationModel
- DeIdentificationModel
- Finisher
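
The stage list above is easier to read when grouped by annotator type. The sketch below only counts the names as listed on this page. On a loaded pipeline, an equivalent list could presumably be built from the underlying `PipelineModel` stages (for example `[type(s).__name__ for s in deid_pipeline.model.stages]`), but that is an assumption not verified against this release. The four `DeIdentificationModel` stages plausibly correspond to the four output modes, although this card does not state it explicitly.

```python
from collections import Counter

# Stage names exactly as listed under "Included Models", in pipeline order
STAGES = [
    "DocumentAssembler", "SentenceDetectorDLModel", "TokenizerModel",
    "WordEmbeddingsModel",
    "MedicalNerModel", "NerConverter", "MedicalNerModel", "NerConverter",
    "ChunkMergeModel",
    *["ContextualParserModel"] * 6,
    "TextMatcherModel", "ContextualParserModel", "RegexMatcherModel",
    *["ContextualParserModel"] * 4,
    "ChunkMergeModel", "ChunkMergeModel",
    *["DeIdentificationModel"] * 4,
    "Finisher",
]

# Count how many stages of each annotator type the pipeline contains
counts = Counter(STAGES)
for name, n in counts.most_common():
    print(f"{n:3}  {name}")
print(f"{len(STAGES):3}  total stages")
```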