Commit b6d805d

2024-03-27-clinical_deidentification_multi_mode_output_en (#1089)
1 parent 61c19ea commit b6d805d

1 file changed: 158 additions, 0 deletions
@@ -0,0 +1,158 @@
---
layout: model
title: Clinical Deidentification Pipeline - Multi Mode Output (English)
author: John Snow Labs
name: clinical_deidentification_multi_mode_output
date: 2024-03-27
tags: [deidentification, deid, clinical, pipeline, en, licensed]
task: De-identification
language: en
edition: Healthcare NLP 5.3.1
spark_version: 3.4
supported: true
annotator: PipelineModel
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This pipeline de-identifies protected health information (PHI) in medical texts by masking and obfuscating it in the resulting text. It can mask and obfuscate the following entities: `AGE`, `CONTACT`, `DATE`, `LOCATION`, `NAME`, `PROFESSION`, `CITY`, `COUNTRY`, `DOCTOR`, `HOSPITAL`, `IDNUM`, `MEDICALRECORD`, `ORGANIZATION`, `PATIENT`, `PHONE`, `EMAIL`, `STREET`, `USERNAME`, `ZIP`, `ACCOUNT`, `LICENSE`, `VIN`, `SSN`, `DLN`, `PLATE`, `IPADDR`.

The pipeline produces four versions of the text simultaneously: masked with entity labels, masked with same-length characters, masked with fixed-length characters, and obfuscated.

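The three masking modes described above can be illustrated with a small pure-Python sketch. It does not call the pipeline and is not its implementation: the function names, the mode labels, and the hand-written entity spans are all hypothetical. The same-length rule (brackets counting toward the length, with very short chunks replaced by stars only) is an assumption inferred from the sample output on this page. Obfuscation is left out because it needs replacement data.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, entity label); hypothetical structure


def replacement(chunk: str, label: str, mode: str) -> str:
    """Return the masking string for one detected chunk."""
    if mode == "entity_labels":
        return f"<{label}>"
    if mode == "same_length_chars":
        # Assumption inferred from the sample output: brackets count toward
        # the original length, and chunks too short for brackets become stars.
        if len(chunk) > 2:
            return "[" + "*" * (len(chunk) - 2) + "]"
        return "*" * len(chunk)
    if mode == "fixed_length_chars":
        return "*" * 4
    raise ValueError(f"unknown mode: {mode}")


def mask(text: str, spans: List[Span], mode: str) -> str:
    """Replace every span in `text` according to `mode`."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])
        pieces.append(replacement(text[start:end], label, mode))
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


def span_of(text: str, chunk: str, label: str) -> Span:
    """Locate the first occurrence of `chunk`; stands in for a real NER step."""
    start = text.index(chunk)
    return (start, start + len(chunk), label)


sample = "Dr. John Green is 60 years old."
spans = [span_of(sample, "John Green", "DOCTOR"), span_of(sample, "60", "AGE")]

for mode in ("entity_labels", "same_length_chars", "fixed_length_chars"):
    print(mode, "->", mask(sample, spans, mode))
```

Only the same-length mode preserves the character offsets of the original text, which can matter when downstream annotations refer to positions in the document.
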
{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_multi_mode_output_en_5.3.1_3.4_1711532922696.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_multi_mode_output_en_5.3.1_3.4_1711532922696.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}

```python
from sparknlp.pretrained import PretrainedPipeline

deid_pipeline = PretrainedPipeline("clinical_deidentification_multi_mode_output", "en", "clinical/models")

text = """Name : Hendrickson, Ora, Record date: 2093-01-13, MR # 719435.
Dr. John Green, ID: 1231511863, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93.
Patient's VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no: A334455B.
Phone (302) 786-5227, 0295 Keats Street, San Francisco, E-MAIL: [email protected]."""

result = deid_pipeline.annotate(text)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val deid_pipeline = PretrainedPipeline("clinical_deidentification_multi_mode_output", "en", "clinical/models")

val text = """Name : Hendrickson, Ora, Record date: 2093-01-13, MR # 719435.
Dr. John Green, ID: 1231511863, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93.
Patient's VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no: A334455B.
Phone (302) 786-5227, 0295 Keats Street, San Francisco, E-MAIL: [email protected]."""

val result = deid_pipeline.annotate(text)
```
</div>

## Results

```bash
print("\nMasked with entity labels")
print("-"*30)
print("\n".join(result['masked']))
print("\nMasked with chars")
print("-"*30)
print("\n".join(result['masked_with_chars']))
print("\nMasked with fixed length chars")
print("-"*30)
print("\n".join(result['masked_fixed_length_chars']))
print("\nObfuscated")
print("-"*30)
print("\n".join(result['obfuscated']))

Masked with entity labels
------------------------------
Name : <PATIENT>, Record date: <DATE>, MR # <MEDICALRECORD>.
Dr. <DOCTOR>, ID: <DEVICE>, IP <IPADDR>.
He is a <AGE>-year-old male was admitted to the <HOSPITAL> for cystectomy on <DATE>.
Patient's VIN : <VIN>, SSN <SSN>, Driver's license no: <DLN>.
Phone <PHONE>, <STREET>, <CITY>, E-MAIL: <EMAIL>.

Masked with chars
------------------------------
Name : [**************], Record date: [********], MR # [****].
Dr. [********], ID: [********], IP [************].
He is a **-year-old male was admitted to the [**********] for cystectomy on [******].
Patient's VIN : [***************], SSN [**********], Driver's license no: [******].
Phone [************], [***************], [***********], E-MAIL: [*************].

Masked with fixed length chars
------------------------------
Name : ****, Record date: ****, MR # ****.
Dr. ****, ID: ****, IP ****.
He is a ****-year-old male was admitted to the **** for cystectomy on ****.
Patient's VIN : ****, SSN ****, Driver's license no: ****.
Phone ****, ****, ****, E-MAIL: ****.

Obfuscated
------------------------------
Name : Marlana Salvage, Record date: 2093-02-23, MR # 824235.
Dr. Vic Blackbird, ID: X2814358, IP 001.001.001.001.
He is a 68-year-old male was admitted to the PRAIRIE SAINT JOHN'S for cystectomy on 02/23/93.
Patient's VIN : 3IRWE31VQMG867619, SSN #509-32-6712, Driver's license no: W580998P.
Phone (382) 505-3976, 521 Adams St, Port Shannon, E-MAIL: [email protected].
```

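When only one of the four outputs is needed downstream, the result can be narrowed to a single key. The following is a minimal sketch, assuming `result` is a dictionary that maps the four keys printed above to lists of sentence strings. The helper `select_mode` and the stand-in dictionary are hypothetical and exist only so the example runs without a Spark session or a license; the two sample sentences are copied from the output above.

```python
from typing import Dict, List

# Output keys taken from the print statements in the Results section.
OUTPUT_KEYS = (
    "masked",
    "masked_with_chars",
    "masked_fixed_length_chars",
    "obfuscated",
)


def select_mode(result: Dict[str, List[str]], mode: str = "obfuscated") -> str:
    """Join the sentences of one de-identified version into a single string."""
    if mode not in OUTPUT_KEYS:
        raise ValueError(f"mode must be one of {OUTPUT_KEYS}, got {mode!r}")
    return "\n".join(result[mode])


# Stand-in for the dictionary returned by deid_pipeline.annotate(text);
# a real result would hold one entry per sentence under every key.
stand_in_result = {
    "masked": ["Dr. <DOCTOR>, ID: <DEVICE>, IP <IPADDR>."],
    "obfuscated": ["Dr. Vic Blackbird, ID: X2814358, IP 001.001.001.001."],
}

print(select_mode(stand_in_result, "masked"))
print(select_mode(stand_in_result))
```

Validating the key up front turns a typo in the mode name into a clear error instead of a bare `KeyError`.
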
{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|clinical_deidentification_multi_mode_output|
|Type:|pipeline|
|Compatibility:|Healthcare NLP 5.3.1+|
|License:|Licensed|
|Edition:|Official|
|Language:|en|
|Size:|1.7 GB|

## Included Models

- DocumentAssembler
- SentenceDetectorDLModel
- TokenizerModel
- WordEmbeddingsModel
- MedicalNerModel
- NerConverter
- MedicalNerModel
- NerConverter
- ChunkMergeModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- TextMatcherModel
- ContextualParserModel
- RegexMatcherModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ChunkMergeModel
- ChunkMergeModel
- DeIdentificationModel
- DeIdentificationModel
- DeIdentificationModel
- DeIdentificationModel
- Finisher
