Commit 5399ddd

2024-03-28-ner_profiling_deidentification_en (#1099)
1 parent c49fe58 commit 5399ddd

1 file changed: 160 additions, 0 deletions
@@ -0,0 +1,160 @@
---
layout: model
title: Named Entity Recognition Profiling (De-Identification)
author: John Snow Labs
name: ner_profiling_deidentification
date: 2024-03-28
tags: [licensed, en, clinical, profiling, ner_profiling, ner, deid, de_identification]
task: [Named Entity Recognition, Pipeline Healthcare]
language: en
edition: Healthcare NLP 5.3.1
spark_version: 3.2
supported: true
annotator: PipelineModel
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This pipeline explores the pretrained NER models available for deidentification of clinical text. It runs all of them over the same input and returns each model's entities, which makes it easier to choose the model that best detects sensitive information in a given dataset. The pipeline only extracts entities: as the Included Models list shows, it contains no masking or obfuscation stage, so it is a first step toward protecting patient privacy rather than a complete anonymization workflow.

The pipeline uses the `embeddings_clinical` word embeddings and includes the following NER models, along with the `_large` and `_langtest` variants that appear in the Results section:

`ner_deid_augmented`, `ner_deid_enriched`, `ner_deid_generic_augmented`, `ner_deid_name_multilingual_clinical`, `ner_deid_sd`, `ner_deid_subentity_augmented`, `ner_deid_subentity_augmented_i2b2`, `ner_deid_synthetic`, `ner_jsl`, `ner_jsl_enriched`

The models overlap in what they detect but differ in label granularity, for example generic `NAME` versus `DOCTOR` and `PATIENT`, so comparing their output on the same text shows which one fits a given use case.
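To compare models that use different label sets, it can help to project fine-grained labels onto generic ones first. The sketch below is an illustration only: `SUBENTITY_TO_GENERIC` and `to_generic` are hypothetical names, and the mapping is inferred from the sample output in the Results section of this card, not taken from an official label schema.

```python
# Hypothetical mapping, inferred only from the sample output in this card.
SUBENTITY_TO_GENERIC = {
    "DOCTOR": "NAME",
    "PATIENT": "NAME",
    "MEDICALRECORD": "ID",
    "HOSPITAL": "LOCATION",
    "STREET": "LOCATION",
    "PHONE": "CONTACT",
}


def to_generic(entities):
    """Map (chunk, label) pairs onto generic labels; unknown labels pass through unchanged."""
    return [(chunk, SUBENTITY_TO_GENERIC.get(label, label)) for chunk, label in entities]


# A few entities as reported by a sub-entity model and by a generic model.
subentity = [
    ("2093-01-13", "DATE"),
    ("David Hale", "DOCTOR"),
    ("Hendrickson Ora", "PATIENT"),
    ("7194334", "MEDICALRECORD"),
]
generic = [
    ("2093-01-13", "DATE"),
    ("David Hale", "NAME"),
    ("Hendrickson Ora", "NAME"),
    ("7194334", "ID"),
]

# After projection the two models agree on these four entities.
print(to_generic(subentity) == generic)  # prints True
```

Once labels are projected this way, any remaining differences between models come from the detected spans and not from the label vocabulary.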

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/ner_profiling_deidentification_en_5.3.1_3.2_1711630659423.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/ner_profiling_deidentification_en_5.3.1_3.2_1711630659423.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# Load the licensed pretrained profiling pipeline (downloaded on first use)
ner_profiling_pipeline = PretrainedPipeline("ner_profiling_deidentification", "en", "clinical/models")

# Run every NER model in the pipeline on one sample clinical note
result = ner_profiling_pipeline.annotate("""Record date : 2093-01-13 , David Hale , M.D . , Name : Hendrickson Ora , MR # 7194334 Date : 01/13/93 . PCP : Oliveira , 25 years-old , Record date : 2079-11-09 . Cocke County Baptist Hospital , 0295 Keats Street , Phone 55-555-5555 .""")

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// Load the licensed pretrained profiling pipeline (downloaded on first use)
val ner_profiling_pipeline = PretrainedPipeline("ner_profiling_deidentification", "en", "clinical/models")

// Run every NER model in the pipeline on one sample clinical note
val result = ner_profiling_pipeline.annotate("""Record date : 2093-01-13 , David Hale , M.D . , Name : Hendrickson Ora , MR # 7194334 Date : 01/13/93 . PCP : Oliveira , 25 years-old , Record date : 2079-11-09 . Cocke County Baptist Hospital , 0295 Keats Street , Phone 55-555-5555 .""")

```
</div>

## Results

```bash

******************** ner_deid_name_multilingual_clinical Model Results ********************

('David Hale', 'NAME') ('Hendrickson Ora', 'NAME') ('Oliveira', 'NAME')

******************** ner_deid_subentity_augmented_i2b2 Model Results ********************

('2093-01-13', 'DATE') ('David Hale', 'DOCTOR') ('Hendrickson Ora', 'PATIENT') ('7194334', 'MEDICALRECORD') ('01/13/93', 'DATE') ('Oliveira', 'PATIENT') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'HOSPITAL') ('0295 Keats Street', 'STREET') ('55-555-5555', 'PHONE')

******************** ner_deid_large Model Results ********************

('2093-01-13', 'DATE') ('David Hale', 'NAME') ('Hendrickson Ora', 'NAME') ('7194334', 'ID') ('01/13/93', 'DATE') ('Oliveira', 'NAME') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'LOCATION') ('0295 Keats Street', 'LOCATION') ('55-555-5555', 'CONTACT')

******************** ner_jsl_enriched Model Results ********************

('01/13/93', 'Date') ('25 years-old', 'Age') ('2079-11-09', 'Date')

******************** ner_deid_sd_large Model Results ********************

('2093-01-13', 'DATE') ('David Hale', 'NAME') ('Hendrickson Ora', 'NAME') ('7194334', 'ID') ('01/13/93', 'DATE') ('Oliveira', 'NAME') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'LOCATION') ('0295 Keats Street', 'LOCATION') ('55-555-5555', 'CONTACT')

******************** ner_deid_generic_augmented Model Results ********************

('2093-01-13', 'DATE') ('David Hale', 'NAME') ('Hendrickson Ora', 'NAME') ('7194334', 'ID') ('01/13/93', 'DATE') ('Oliveira', 'NAME') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'LOCATION') ('0295 Keats Street', 'LOCATION') ('55-555-5555', 'CONTACT')

******************** ner_deid_name_multilingual_clinical_langtest Model Results ********************

('David Hale', 'NAME') ('Hendrickson Ora', 'NAME') ('Oliveira', 'NAME')

******************** ner_deid_generic_augmented_langtest Model Results ********************

('2093-01-13', 'DATE') ('David Hale', 'NAME') ('Hendrickson Ora', 'NAME') ('7194334', 'ID') ('01/13/93', 'DATE') ('Oliveira', 'NAME') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'LOCATION') ('0295 Keats Street', 'LOCATION') ('55-555-5555', 'CONTACT')

******************** ner_deid_sd Model Results ********************

('2093-01-13', 'DATE') ('David Hale', 'NAME') ('Hendrickson Ora', 'NAME') ('7194334', 'ID') ('01/13/93', 'DATE') ('Oliveira', 'NAME') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'LOCATION') ('0295 Keats Street', 'LOCATION')

******************** ner_deid_subentity_augmented Model Results ********************

('2093-01-13', 'DATE') ('David Hale', 'DOCTOR') ('Hendrickson Ora', 'PATIENT') ('7194334', 'MEDICALRECORD') ('01/13/93', 'DATE') ('Oliveira', 'DOCTOR') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'HOSPITAL') ('0295 Keats Street', 'STREET') ('55-555-5555', 'PHONE')

******************** ner_deid_large_langtest Model Results ********************

('2093-01-13', 'DATE') ('David Hale', 'NAME') ('Hendrickson Ora', 'NAME') ('7194334', 'ID') ('01/13/93', 'DATE') ('Oliveira', 'NAME') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'LOCATION') ('0295 Keats Street', 'LOCATION') ('55-555-5555', 'CONTACT')

******************** ner_jsl Model Results ********************

('01/13/93', 'DATE') ('25 years-old', 'AGE')

******************** ner_deid_synthetic Model Results ********************

('2093-01-13', 'DATE') ('David Hale', 'NAME') ('Hendrickson Ora', 'NAME') ('7194334', 'ID') ('01/13/93', 'DATE') ('Oliveira', 'NAME') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'LOCATION') ('0295 Keats Street', 'LOCATION') ('55-555-5555', 'CONTACT')

******************** ner_deid_augmented Model Results ********************

('2093-01-13', 'DATE') ('David Hale', 'NAME') ('Hendrickson Ora', 'NAME') ('7194334', 'ID') ('01/13/93', 'DATE') ('Oliveira', 'NAME') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'LOCATION') ('Keats Street', 'LOCATION')

******************** ner_deid_generic_augmented_allUpperCased_langtest Model Results ********************

('2093-01-13', 'DATE') ('David Hale', 'NAME') ('Hendrickson Ora', 'NAME') ('7194334', 'ID') ('01/13/93', 'DATE') ('Oliveira', 'NAME') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'LOCATION') ('0295 Keats Street', 'LOCATION') ('55-555-5555', 'CONTACT')

******************** ner_deid_enriched_langtest Model Results ********************

('2093-01-13', 'DATE') ('David Hale', 'DOCTOR') ('Hendrickson Ora', 'DOCTOR') ('7194334', 'MEDICALRECORD') ('01/13/93', 'DATE') ('Oliveira', 'DOCTOR') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'HOSPITAL') ('0295 Keats Street', 'STREET') ('55-555-5555', 'PHONE')

******************** ner_deid_subentity_augmented_langtest Model Results ********************

('2093-01-13', 'DATE') ('David Hale', 'DOCTOR') ('Hendrickson Ora', 'DOCTOR') ('7194334', 'MEDICALRECORD') ('01/13/93', 'DATE') ('Oliveira', 'DOCTOR') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'HOSPITAL') ('0295 Keats Street', 'STREET') ('55-555-5555', 'PHONE')

******************** ner_deid_enriched Model Results ********************

('2093-01-13', 'DATE') ('David Hale', 'DOCTOR') ('Hendrickson Ora', 'DOCTOR') ('7194334', 'MEDICALRECORD') ('01/13/93', 'DATE') ('Oliveira', 'DOCTOR') ('25', 'AGE') ('2079-11-09', 'DATE') ('Cocke County Baptist Hospital', 'HOSPITAL') ('0295 Keats Street', 'STREET') ('55-555-5555', 'PHONE')

```
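A report in this printed form can be post-processed to see where the models disagree. The following is a minimal sketch and not part of the pipeline: `parse_results` and `labels_by_chunk` are hypothetical helper names, the parser assumes the exact text layout shown above (and chunks without apostrophes), and `SAMPLE` is a shortened excerpt of two sections from the report.

```python
import ast
import re
from collections import defaultdict

# Matches a section header such as "**** ner_jsl Model Results ****".
HEADER = re.compile(r"^\*+\s+(\S+) Model Results\s+\*+$")
# Matches one printed pair such as ('Oliveira', 'DOCTOR'); assumes no apostrophes inside.
PAIR = re.compile(r"\('[^']*', '[^']*'\)")


def parse_results(text):
    """Return a dict mapping each model name to its list of (chunk, label) tuples."""
    results = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            results[current] = []
        elif current and line.startswith("("):
            results[current].extend(ast.literal_eval(p) for p in PAIR.findall(line))
    return results


def labels_by_chunk(results):
    """Group models by the label they gave each chunk, so disagreements stand out."""
    table = defaultdict(lambda: defaultdict(list))
    for model, pairs in results.items():
        for chunk, label in pairs:
            table[chunk][label].append(model)
    return table


# Shortened excerpt of the report above (two models, two entities each).
SAMPLE = """
******************** ner_deid_subentity_augmented_i2b2 Model Results ********************

('David Hale', 'DOCTOR') ('Oliveira', 'PATIENT')

******************** ner_deid_subentity_augmented Model Results ********************

('David Hale', 'DOCTOR') ('Oliveira', 'DOCTOR')
"""

parsed = parse_results(SAMPLE)
table = labels_by_chunk(parsed)

# Print only the chunks that received more than one distinct label.
for chunk, labels in table.items():
    if len(labels) > 1:
        print(chunk, dict(labels))
# prints: Oliveira {'PATIENT': ['ner_deid_subentity_augmented_i2b2'], 'DOCTOR': ['ner_deid_subentity_augmented']}
```

On this excerpt both models agree that `David Hale` is a `DOCTOR` but disagree on `Oliveira`, which is the kind of difference the profiling pipeline is meant to surface.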

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|ner_profiling_deidentification|
|Type:|pipeline|
|Compatibility:|Healthcare NLP 5.3.1+|
|License:|Licensed|
|Edition:|Official|
|Language:|en|
|Size:|2.0 GB|

## Included Models

- DocumentAssembler
- SentenceDetectorDLModel
- TokenizerModel
- WordEmbeddingsModel
- MedicalNerModel x 14
- NerConverterInternalModel x 14