
Commit 4c6eb3b

2024-10-07-sbiobertresolve_loinc_augmented_en (#1536)
1 parent 8ed9bd2 commit 4c6eb3b

4 files changed, +747 -0 lines changed
Lines changed: 180 additions & 0 deletions
@@ -0,0 +1,180 @@
1+
---
2+
layout: model
3+
title: Sentence Entity Resolver for LOINC (sbiobert_base_cased_mli embeddings)
4+
author: John Snow Labs
5+
name: sbiobertresolve_loinc_augmented
6+
date: 2024-10-07
7+
tags: [licensed, en, entity_resolution, loinc, clinical]
8+
task: Entity Resolution
9+
language: en
10+
edition: Healthcare NLP 5.5.0
11+
spark_version: 3.0
12+
supported: true
13+
annotator: SentenceEntityResolverModel
14+
article_header:
15+
type: cover
16+
use_language_switcher: "Python-Scala-Java"
17+
---
18+
19+
## Description
20+
21+
This model maps extracted clinical NER entities to Logical Observation Identifiers Names and Codes (LOINC) codes using `sbiobert_base_cased_mli` Sentence BERT embeddings. It is trained on an augmented version of the dataset used in previous LOINC resolver models. It also provides the official resolution of each code in brackets.
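
As a rough intuition for how an embedding-based resolver can pick a code, the sketch below ranks candidate codes by Euclidean distance to a query vector. The three-dimensional vectors and the `rank_by_euclidean` helper are made up for illustration; the real model works on `sbiobert_base_cased_mli` embeddings and its own internal lookup, which this toy example does not reproduce.

```python
import math

# Hypothetical 3-d embeddings; real sentence embeddings have far more dimensions.
CANDIDATES = {
    "39156-5": [0.9, 0.1, 0.0],    # BMI
    "LP15426-7": [0.1, 0.8, 0.3],  # aspartate aminotransferase
    "LP15333-5": [0.2, 0.7, 0.5],  # alanine aminotransferase
}


def rank_by_euclidean(query, candidates):
    """Return (code, distance) pairs sorted from nearest to farthest."""
    scored = [(code, math.dist(query, vec)) for code, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])


# A made-up query vector that sits close to the BMI candidate.
ranked = rank_by_euclidean([0.85, 0.15, 0.05], CANDIDATES)
best_code = ranked[0][0]
print(best_code)
```

The sorted list plays the same role as the ranked candidates a resolver returns: the first entry is the chosen code and the rest are alternatives.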
22+
23+
## Predicted Entities
24+
25+
`loinc_code`
26+
27+
{:.btn-box}
28+
<button class="button button-orange" disabled>Live Demo</button>
29+
<button class="button button-orange" disabled>Open in Colab</button>
30+
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/sbiobertresolve_loinc_augmented_en_5.5.0_3.0_1728318394102.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
31+
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/sbiobertresolve_loinc_augmented_en_5.5.0_3.0_1728318394102.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
32+
33+
## How to use
34+
35+
36+
37+
<div class="tabs-box" markdown="1">
38+
{% include programmingLanguageSelectScalaPythonNLU.html %}
39+
40+
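```python
# Assumes a licensed Healthcare NLP session is already running and bound to `spark`.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, Chunk2Doc
from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, WordEmbeddingsModel, BertSentenceEmbeddings
from sparknlp_jsl.annotator import MedicalNerModel, NerConverterInternal, SentenceEntityResolverModel
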
41+
document_assembler = DocumentAssembler()\
42+
.setInputCol("text")\
43+
.setOutputCol("document")
44+
45+
sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models") \
46+
.setInputCols(["document"]) \
47+
.setOutputCol("sentence")
48+
49+
tokenizer = Tokenizer()\
50+
.setInputCols(["sentence"])\
51+
.setOutputCol("token")
52+
53+
word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
54+
.setInputCols(["sentence", "token"])\
55+
.setOutputCol("embeddings")
56+
57+
ner_model = MedicalNerModel.pretrained("ner_radiology", "en", "clinical/models") \
58+
.setInputCols(["sentence", "token", "embeddings"]) \
59+
.setOutputCol("ner")
60+
61+
ner_converter = NerConverterInternal() \
62+
.setInputCols(["sentence", "token", "ner"]) \
63+
.setOutputCol("ner_chunk")\
64+
.setWhiteList(["Test"])
65+
66+
chunk2doc = Chunk2Doc()\
67+
.setInputCols("ner_chunk")\
68+
.setOutputCol("ner_chunk_doc")
69+
70+
sbert_embedder = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli","en","clinical/models")\
71+
.setInputCols(["ner_chunk_doc"])\
72+
.setOutputCol("sbert_embeddings")\
73+
.setCaseSensitive(False)
74+
75+
resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_loinc_augmented","en", "clinical/models") \
76+
.setInputCols(["sbert_embeddings"]) \
77+
.setOutputCol("resolution")\
78+
.setDistanceFunction("EUCLIDEAN")
79+
80+
81+
nlpPipeline = Pipeline(stages=[document_assembler,
82+
sentence_detector,
83+
tokenizer,
84+
word_embeddings,
85+
ner_model,
86+
ner_converter,
87+
chunk2doc,
88+
sbert_embedder,
89+
resolver])
90+
91+
data = spark.createDataFrame([["""The patient is a 22-year-old female with a history of obesity. She has a Body mass index (BMI) of 33.5 kg/m2, aspartate aminotransferase 64, and alanine aminotransferase 126."""]]).toDF("text")
92+
93+
result = nlpPipeline.fit(data).transform(data)
94+
```
95+
```scala
96+
val document_assembler = new DocumentAssembler()
97+
.setInputCol("text")
98+
.setOutputCol("document")
99+
100+
val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")
101+
.setInputCols(Array("document"))
102+
.setOutputCol("sentence")
103+
104+
val tokenizer = new Tokenizer()
105+
.setInputCols(Array("sentence"))
106+
.setOutputCol("token")
107+
108+
val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical","en","clinical/models")
109+
.setInputCols(Array("sentence","token"))
110+
.setOutputCol("embeddings")
111+
112+
val ner_model = MedicalNerModel.pretrained("ner_radiology","en","clinical/models")
113+
.setInputCols(Array("sentence","token","embeddings"))
114+
.setOutputCol("ner")
115+
116+
val ner_converter = new NerConverterInternal()
117+
.setInputCols(Array("sentence","token","ner"))
118+
.setOutputCol("ner_chunk")
119+
.setWhiteList(Array("Test"))
120+
121+
val chunk2doc = new Chunk2Doc()
122+
.setInputCols("ner_chunk")
123+
.setOutputCol("ner_chunk_doc")
124+
125+
val sbert_embedder = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli","en","clinical/models")
126+
.setInputCols(Array("ner_chunk_doc"))
127+
.setOutputCol("sbert_embeddings")
128+
.setCaseSensitive(false)
129+
130+
val resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_loinc_augmented","en","clinical/models")
131+
.setInputCols(Array("sbert_embeddings"))
132+
.setOutputCol("resolution")
133+
.setDistanceFunction("EUCLIDEAN")
134+
135+
val nlpPipeline = new Pipeline().setStages(Array(
136+
document_assembler,
137+
sentence_detector,
138+
tokenizer,
139+
word_embeddings,
140+
ner_model,
141+
ner_converter,
142+
chunk2doc,
143+
sbert_embedder,
144+
resolver))
145+
146+
val data = Seq("""The patient is a 22-year-old female with a history of obesity. She has a Body mass index (BMI) of 33.5 kg/m2, aspartate aminotransferase 64, and alanine aminotransferase 126.""").toDF("text")
147+
148+
val result = nlpPipeline.fit(data).transform(data)
149+
```
150+
</div>
151+
152+
## Results
153+
154+
```bash
155+
+--------------------------+-----+---+---------+----------+-------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+
156+
| chunk|begin|end|ner_label|loinc_code| description| resolutions| all_codes| aux_labels|
157+
+--------------------------+-----+---+---------+----------+-------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+
158+
| BMI| 90| 92| Test| 39156-5| BMI [Body mass index (BMI) [Ratio]]|BMI [Body mass index (BMI) [Ratio]]:::BM [IDH1 gene exon ...|39156-5:::100305-2:::LP266933-3:::100225-2:::LP241982-0::...|Observation:::Observation:::Observation:::Observation:::O...|
159+
|aspartate aminotransferase| 110|135| Test| LP15426-7|Aspartate aminotransferase [Aspartate aminotransferase]|Aspartate aminotransferase [Aspartate aminotransferase]::...|LP15426-7:::100739-2:::LP307348-5:::LP15333-5:::LP307326-...|Observation:::Observation:::Observation:::Observation:::O...|
160+
| alanine aminotransferase| 145|168| Test| LP15333-5| Alanine aminotransferase [Alanine aminotransferase]|Alanine aminotransferase [Alanine aminotransferase]:::Ala...|LP15333-5:::LP307326-1:::100738-4:::LP307348-5:::LP15426-...|Observation:::Observation:::Observation:::Observation:::O...|
161+
+--------------------------+-----+---+---------+----------+-------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+
162+
```
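
The `begin` and `end` columns appear to be zero-based character offsets into the input text, with `end` inclusive. A quick check in plain Python, using the example sentence and the offsets from the table above (the `chunk_at` helper is only for illustration):

```python
text = ("The patient is a 22-year-old female with a history of obesity. "
        "She has a Body mass index (BMI) of 33.5 kg/m2, aspartate "
        "aminotransferase 64, and alanine aminotransferase 126.")


def chunk_at(source, begin, end):
    """Slice a chunk given zero-based offsets where `end` is inclusive."""
    return source[begin:end + 1]


# (begin, end) pairs copied from the results table.
offsets = [(90, 92), (110, 135), (145, 168)]
chunks = [chunk_at(text, b, e) for b, e in offsets]
print(chunks)
```

Because `end` is inclusive, the slice needs `end + 1`; using `source[begin:end]` would drop the last character of every chunk.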
163+
164+
{:.model-param}
165+
## Model Information
166+
167+
{:.table-model}
168+
|---|---|
169+
|Model Name:|sbiobertresolve_loinc_augmented|
170+
|Compatibility:|Healthcare NLP 5.5.0+|
171+
|License:|Licensed|
172+
|Edition:|Official|
173+
|Input Labels:|[sentence_embeddings]|
174+
|Output Labels:|[loinc_code]|
175+
|Language:|en|
176+
|Size:|1.1 GB|
177+
|Case sensitive:|false|
178+
179+
## References
180+
This model is trained with an augmented version of the LOINC v2.78 dataset, released on 2024-08-06.
Lines changed: 197 additions & 0 deletions
@@ -0,0 +1,197 @@
1+
---
2+
layout: model
3+
title: Sentence Entity Resolver for Logical Observation Identifiers Names and Codes (LOINC) codes
4+
author: John Snow Labs
5+
name: sbiobertresolve_loinc
6+
date: 2024-10-07
7+
tags: [licensed, en, entity_resolution, loinc, clinical]
8+
task: Entity Resolution
9+
language: en
10+
edition: Healthcare NLP 5.5.0
11+
spark_version: 3.0
12+
supported: true
13+
annotator: SentenceEntityResolverModel
14+
article_header:
15+
type: cover
16+
use_language_switcher: "Python-Scala-Java"
17+
---
18+
19+
## Description
20+
21+
This model maps extracted medical entities to Logical Observation Identifiers Names and Codes (LOINC) codes using `sbiobert_base_cased_mli` Sentence Bert Embeddings.
22+
It also provides the official resolution of the codes within the brackets.
23+
24+
## Predicted Entities
25+
26+
`loinc_code`
27+
28+
{:.btn-box}
29+
<button class="button button-orange" disabled>Live Demo</button>
30+
<button class="button button-orange" disabled>Open in Colab</button>
31+
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/sbiobertresolve_loinc_en_5.5.0_3.0_1728321808601.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
32+
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/sbiobertresolve_loinc_en_5.5.0_3.0_1728321808601.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
33+
34+
## How to use
35+
36+
37+
38+
<div class="tabs-box" markdown="1">
39+
{% include programmingLanguageSelectScalaPythonNLU.html %}
40+
41+
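```python
# Assumes a licensed Healthcare NLP session is already running and bound to `spark`.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, Chunk2Doc
from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, WordEmbeddingsModel, BertSentenceEmbeddings
from sparknlp_jsl.annotator import MedicalNerModel, NerConverterInternal, SentenceEntityResolverModel
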
42+
43+
document_assembler = DocumentAssembler()\
44+
.setInputCol("text")\
45+
.setOutputCol("document")
46+
47+
sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models") \
48+
.setInputCols(["document"]) \
49+
.setOutputCol("sentence")
50+
51+
tokenizer = Tokenizer()\
52+
.setInputCols(["sentence"])\
53+
.setOutputCol("token")
54+
55+
word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
56+
.setInputCols(["sentence", "token"])\
57+
.setOutputCol("embeddings")
58+
59+
ner_model = MedicalNerModel.pretrained("ner_jsl", "en", "clinical/models") \
60+
.setInputCols(["sentence", "token", "embeddings"]) \
61+
.setOutputCol("ner")
62+
63+
ner_converter = NerConverterInternal() \
64+
.setInputCols(["sentence", "token", "ner"]) \
65+
.setOutputCol("ner_chunk")\
66+
.setWhiteList(["Test"])
67+
68+
chunk2doc = Chunk2Doc()\
69+
.setInputCols("ner_chunk")\
70+
.setOutputCol("ner_chunk_doc")
71+
72+
sbert_embedder = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli","en","clinical/models")\
73+
.setInputCols(["ner_chunk_doc"])\
74+
.setOutputCol("sbert_embeddings")\
75+
.setCaseSensitive(False)
76+
77+
resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_loinc","en", "clinical/models") \
78+
.setInputCols(["sbert_embeddings"]) \
79+
.setOutputCol("resolution")\
80+
.setDistanceFunction("EUCLIDEAN")
81+
82+
83+
nlpPipeline = Pipeline(stages=[document_assembler,
84+
sentence_detector,
85+
tokenizer,
86+
word_embeddings,
87+
ner_model,
88+
ner_converter,
89+
chunk2doc,
90+
sbert_embedder,
91+
resolver])
92+
93+
data = spark.createDataFrame([["""A 65-year-old woman presents to the office with generalized fatigue for the last 4 months.
94+
She used to walk 1 mile each evening but now gets tired after 1-2 blocks. She has a history of Crohn disease and hypertension
95+
for which she receives appropriate medications. She is married and lives with her husband. She eats a balanced diet that
96+
includes chicken, fish, pork, fruits, and vegetables. She rarely drinks alcohol and denies tobacco use. A physical examination
97+
is unremarkable. Laboratory studies show the following: Hemoglobin: 9.8g/dL, Hematocrit: 32%, Mean Corpuscular Volume: 110 μm3"""]]).toDF("text")
98+
99+
result = nlpPipeline.fit(data).transform(data)
100+
101+
```
102+
```scala
103+
104+
val document_assembler = new DocumentAssembler()
105+
.setInputCol("text")
106+
.setOutputCol("document")
107+
108+
val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")
109+
.setInputCols(Array("document"))
110+
.setOutputCol("sentence")
111+
112+
val tokenizer = new Tokenizer()
113+
.setInputCols(Array("sentence"))
114+
.setOutputCol("token")
115+
116+
val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical","en","clinical/models")
117+
.setInputCols(Array("sentence","token"))
118+
.setOutputCol("embeddings")
119+
120+
val ner_model = MedicalNerModel.pretrained("ner_jsl","en","clinical/models")
121+
.setInputCols(Array("sentence","token","embeddings"))
122+
.setOutputCol("ner")
123+
124+
val ner_converter = new NerConverterInternal()
125+
.setInputCols(Array("sentence","token","ner"))
126+
.setOutputCol("ner_chunk")
127+
.setWhiteList(Array("Test"))
128+
129+
val chunk2doc = new Chunk2Doc()
130+
.setInputCols("ner_chunk")
131+
.setOutputCol("ner_chunk_doc")
132+
133+
val sbert_embedder = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli","en","clinical/models")
134+
.setInputCols(Array("ner_chunk_doc"))
135+
.setOutputCol("sbert_embeddings")
136+
.setCaseSensitive(false)
137+
138+
val resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_loinc","en","clinical/models")
139+
.setInputCols(Array("sbert_embeddings"))
140+
.setOutputCol("resolution")
141+
.setDistanceFunction("EUCLIDEAN")
142+
143+
val nlpPipeline = new Pipeline().setStages(Array(
144+
document_assembler,
145+
sentence_detector,
146+
tokenizer,
147+
word_embeddings,
148+
ner_model,
149+
ner_converter,
150+
chunk2doc,
151+
sbert_embedder,
152+
resolver))
153+
154+
val data = Seq("""A 65-year-old woman presents to the office with generalized fatigue for the last 4 months.
155+
She used to walk 1 mile each evening but now gets tired after 1-2 blocks. She has a history of Crohn disease and hypertension
156+
for which she receives appropriate medications. She is married and lives with her husband. She eats a balanced diet that
157+
includes chicken, fish, pork, fruits, and vegetables. She rarely drinks alcohol and denies tobacco use. A physical examination
158+
is unremarkable. Laboratory studies show the following: Hemoglobin: 9.8g/dL, Hematocrit: 32%, Mean Corpuscular Volume: 110 μm3""").toDF("text")
159+
160+
val result = nlpPipeline.fit(data).transform(data)
161+
162+
```
163+
</div>
164+
165+
## Results
166+
167+
```bash
168+
169+
+-----------------------+-----+---+---------+----------+------------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+
170+
| chunk|begin|end|ner_label|loinc_code| description| resolutions| all_codes| aux_labels|
171+
+-----------------------+-----+---+---------+----------+------------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+
172+
| physical examination| 490|509| Test| 29544-4| Physical findings [Physical findings]|Physical findings [Physical findings]:::Physical exam by ...|29544-4:::55286-9:::11435-5:::11384-5:::29545-1:::8709-8:...|ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACT...|
173+
| Laboratory studies| 528|545| Test| 26436-6| Laboratory studies (set) [Laboratory studies (set)]|Laboratory studies (set) [Laboratory studies (set)]:::Lab...|26436-6:::52482-7:::11502-2:::34075-2:::100455-5:::85069-...|ACTIVE:::DISCOURAGED:::ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:...|
174+
| Hemoglobin| 567|576| Test| 10346-5|Haemoglobin [Hemoglobin A [Units/volume] in Blood by Elec...|Haemoglobin [Hemoglobin A [Units/volume] in Blood by Elec...|10346-5:::15082-1:::11559-2:::2030-5:::34618-9:::38896-7:...|ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACT...|
175+
| Hematocrit| 590|599| Test| 32354-3|Hematocrit [Volume Fraction] of Arterial blood [Hematocri...|Hematocrit [Volume Fraction] of Arterial blood [Hematocri...|32354-3:::20570-8:::11153-4:::13508-7:::104874-3:::42908-...|ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACT...|
176+
|Mean Corpuscular Volume| 607|629| Test| 30386-7|Erythrocyte mean corpuscular diameter [Length] [Erythrocy...|Erythrocyte mean corpuscular diameter [Length] [Erythrocy...|30386-7:::101864-7:::20161-6:::18033-1:::19853-1:::101150...|ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACT...|
177+
+-----------------------+-----+---+---------+----------+------------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+
178+
179+
```
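
The `all_codes` and `aux_labels` columns pack the ranked candidates into `:::`-delimited strings that appear to share the same order. A minimal sketch of pairing them up and dropping candidates flagged `DISCOURAGED`, using only the visible (untruncated) prefix of the `Laboratory studies` row; `active_candidates` is a hypothetical helper, not part of the library:

```python
SEP = ":::"


def active_candidates(all_codes, aux_labels, keep="ACTIVE"):
    """Pair codes with their status labels and keep only the wanted status."""
    codes = all_codes.split(SEP)
    labels = aux_labels.split(SEP)
    return [code for code, label in zip(codes, labels) if label == keep]


# Visible prefix of the "Laboratory studies" row; the table truncates the rest.
all_codes = "26436-6:::52482-7:::11502-2:::34075-2:::100455-5"
aux_labels = "ACTIVE:::DISCOURAGED:::ACTIVE:::ACTIVE:::ACTIVE"

kept = active_candidates(all_codes, aux_labels)
print(kept)
```

Filtering after the split keeps the original ranking intact, so the first remaining entry is still the nearest candidate with the desired status.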
180+
181+
{:.model-param}
182+
## Model Information
183+
184+
{:.table-model}
185+
|---|---|
186+
|Model Name:|sbiobertresolve_loinc|
187+
|Compatibility:|Healthcare NLP 5.5.0+|
188+
|License:|Licensed|
189+
|Edition:|Official|
190+
|Input Labels:|[sentence_embeddings]|
191+
|Output Labels:|[loinc_code]|
192+
|Language:|en|
193+
|Size:|666.8 MB|
194+
|Case sensitive:|false|
195+
196+
## References
197+
This model is trained with the LOINC v2.78 dataset, released on 2024-08-06.
