2024-03-03-snomed_findings_resolver_pipeline_en #1004

Merged
105 changes: 105 additions & 0 deletions docs/_posts/akrztrk/2024-03-03-snomed_findings_resolver_pipeline_en.md
@@ -0,0 +1,105 @@
---
layout: model
title: Pipeline for Snomed Concept, Findings Version
author: John Snow Labs
name: snomed_findings_resolver_pipeline
date: 2024-03-03
tags: [licensed, en, snomed, pipeline, resolver]
task: [Entity Resolution, Pipeline Healthcare]
language: en
edition: Healthcare NLP 5.3.0
spark_version: 3.4
supported: true
annotator: PipelineModel
article_header:
type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This pipeline extracts clinical findings and maps them to their corresponding SNOMED CT codes using `sbiobert_base_cased_mli` Sentence BERT embeddings.

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/snomed_findings_resolver_pipeline_en_5.3.0_3.4_1709489839090.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/snomed_findings_resolver_pipeline_en_5.3.0_3.4_1709489839090.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}

```python

from sparknlp.pretrained import PretrainedPipeline

snomed_pipeline = PretrainedPipeline("snomed_findings_resolver_pipeline", "en", "clinical/models")

result = snomed_pipeline.annotate("""The patient exhibited recurrent upper respiratory tract infections, fever, unintentional weight loss, and occasional night sweats. Clinically, they appeared cachectic and pale, with notable hepatosplenomegaly. Laboratory results confirmed pancytopenia.""")

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val snomed_pipeline = PretrainedPipeline("snomed_findings_resolver_pipeline", "en", "clinical/models")

val result = snomed_pipeline.annotate("""The patient exhibited recurrent upper respiratory tract infections, fever, unintentional weight loss, and occasional night sweats. Clinically, they appeared cachectic and pale, with notable hepatosplenomegaly. Laboratory results confirmed pancytopenia.""")

```
</div>

## Results

```bash

+----------------------------------+-----+---+---------+-----------+-------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+
| chunk|begin|end|ner_label|snomed_code| resolution| all_k_resolutions| all_k_codes|
+----------------------------------+-----+---+---------+-----------+-------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+
|upper respiratory tract infections| 32| 65| PROBLEM| 195708003|recurrent upper respiratory tract infection|recurrent upper respiratory tract infection:::upper respi...|195708003:::54150009:::312118003:::448739000:::4519910001...|
| fever| 68| 72| PROBLEM| 386661006| fever|fever:::intermittent fever:::sustained fever:::prolonged ...|386661006:::77957000:::271751000:::248435007:::12579009::...|
| unintentional weight loss| 75| 99| PROBLEM| 448765001| unintentional weight loss|unintentional weight loss:::unexplained weight loss:::int...|448765001:::422868009:::416528001:::267024001:::89362005:...|
| night sweats| 117|128| PROBLEM| 42984000| night sweats|night sweats:::frequent night waking:::night waking:::nig...|42984000:::423052008:::67233009:::102549009:::36163009:::...|
| cachectic| 157|165| PROBLEM| 238108007| cachectic|cachectic:::cachexia associated with aids:::cardiac cache...|238108007:::422003001:::284529003:::788876001:::240128005...|
| pale| 171|174| PROBLEM| 398979000| pale complexion|pale complexion:::pale liver:::pale tongue:::pale lung:::...|398979000:::95199009:::719637000:::95200007:::70396004:::...|
| hepatosplenomegaly| 190|207| PROBLEM| 36760000| hepatosplenomegaly|hepatosplenomegaly:::congestive splenomegaly:::neonatal h...|36760000:::19058002:::80378000:::16294009:::191382009:::8...|
| pancytopenia| 239|250| PROBLEM| 127034005| pancytopenia|pancytopenia:::drug induced pancytopenia:::pancytopenia -...|127034005:::736024007:::5876000:::124961001:::417672002::...|
+----------------------------------+-----+---+---------+-----------+-------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+

```
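
The `begin` and `end` columns above appear to be zero-based character offsets into the input text, with `end` inclusive. The following is a minimal sketch that checks this reading against a few rows of the table; `chunk_at` is a hypothetical helper written for illustration and is not part of Spark NLP.

```python
# Input text copied from the "How to use" example.
text = (
    "The patient exhibited recurrent upper respiratory tract infections, fever, "
    "unintentional weight loss, and occasional night sweats. Clinically, they "
    "appeared cachectic and pale, with notable hepatosplenomegaly. "
    "Laboratory results confirmed pancytopenia."
)


def chunk_at(text: str, begin: int, end: int) -> str:
    """Return the span for zero-based offsets where `end` is inclusive."""
    return text[begin:end + 1]


# (begin, end) pairs copied from the first four rows of the results table.
rows = [(32, 65), (68, 72), (75, 99), (117, 128)]
chunks = [chunk_at(text, begin, end) for begin, end in rows]
print(chunks)
```

If the offsets are used to highlight chunks in the source note, remember to add one to `end` when slicing, as the helper does.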

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|snomed_findings_resolver_pipeline|
|Type:|pipeline|
|Compatibility:|Healthcare NLP 5.3.0+|
|License:|Licensed|
|Edition:|Official|
|Language:|en|
|Size:|2.8 GB|

## Included Models

- DocumentAssembler
- SentenceDetectorDLModel
- TokenizerModel
- WordEmbeddingsModel
- MedicalNerModel
- NerConverterInternalModel
- MedicalNerModel
- NerConverterInternalModel
- MedicalNerModel
- NerConverterInternalModel
- MedicalNerModel
- NerConverterInternalModel
- ChunkMergeModel
- Chunk2Doc
- BertSentenceEmbeddings
- SentenceEntityResolverModel
@@ -0,0 +1,248 @@
---
layout: model
title: Sentence Entity Resolver for SNOMED (sbiobertresolve_snomed_bodyStructure)
author: John Snow Labs
name: sbiobertresolve_snomed_bodyStructure
date: 2024-03-04
tags: [licensed, en, resolver, snomed, bodystructure]
task: Entity Resolution
language: en
edition: Healthcare NLP 5.3.0
spark_version: 3.0
supported: true
annotator: SentenceEntityResolverModel
article_header:
type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This model maps extracted anatomical structure entities to SNOMED codes (body structure version) using `sbiobert_base_cased_mli` BERT sentence embeddings.

## Predicted Entities



{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/sbiobertresolve_snomed_bodyStructure_en_5.3.0_3.0_1709543980434.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/sbiobertresolve_snomed_bodyStructure_en_5.3.0_3.0_1709543980434.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}

```python
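# Assumes `spark` is an active session started with Spark NLP for Healthcare.
from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp_jsl.annotator import *
from pyspark.ml import Pipeline

document_assembler = DocumentAssembler()\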
.setInputCol("text")\
.setOutputCol("document")

sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")\
.setInputCols(["document"])\
.setOutputCol("sentence")

tokenizer = Tokenizer()\
.setInputCols(["sentence"])\
.setOutputCol("token")

word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
.setInputCols(["sentence", "token"])\
.setOutputCol("embeddings")

ner_jsl = MedicalNerModel.pretrained("ner_jsl", "en", "clinical/models") \
.setInputCols(["sentence", "token", "embeddings"]) \
.setOutputCol("ner_jsl")

ner_jsl_converter = NerConverterInternal() \
.setInputCols(["sentence", "token", "ner_jsl"]) \
.setOutputCol("ner_jsl_chunk")\
.setWhiteList(["External_body_part_or_region",
"Internal_organ_or_component"])\
.setReplaceLabels({"External_body_part_or_region": "BodyPart",
"Internal_organ_or_component": "BodyPart" })

ner_anatomy = MedicalNerModel.pretrained("ner_anatomy_coarse", "en", "clinical/models") \
.setInputCols(["sentence", "token", "embeddings"]) \
.setOutputCol("ner_anatomy")

ner_anatomy_converter = NerConverterInternal() \
.setInputCols(["sentence", "token", "ner_anatomy"]) \
.setOutputCol("ner_anatomy_chunk")\
.setReplaceLabels({"Anatomy": "BodyPart"})

ner_oncology_anatomy = MedicalNerModel.pretrained("ner_oncology_anatomy_general", "en", "clinical/models") \
.setInputCols(["sentence", "token", "embeddings"]) \
.setOutputCol("ner_oncology_anatomy")

ner_oncology_anatomy_converter = NerConverterInternal() \
.setInputCols(["sentence", "token", "ner_oncology_anatomy"]) \
.setOutputCol("ner_oncology_anatomy_chunk")\
.setReplaceLabels({"Anatomical_Site": "BodyPart"})

chunk_merger = ChunkMergeApproach() \
.setInputCols("ner_jsl_chunk", "ner_anatomy_chunk", "ner_oncology_anatomy_chunk") \
.setOutputCol("ner_chunk")

chunk2doc = Chunk2Doc()\
.setInputCols("ner_chunk")\
.setOutputCol("ner_chunk_doc")

sbert_embeddings = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli","en","clinical/models")\
.setInputCols(["ner_chunk_doc"])\
.setOutputCol("sbert_embeddings")\
.setCaseSensitive(False)

snomed_resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_snomed_bodyStructure", "en", "clinical/models") \
.setInputCols(["sbert_embeddings"]) \
.setOutputCol("snomed_code")

snomed_pipeline = Pipeline(stages = [
document_assembler,
sentence_detector,
tokenizer,
word_embeddings,
ner_jsl,
ner_jsl_converter,
ner_anatomy,
ner_anatomy_converter,
ner_oncology_anatomy,
ner_oncology_anatomy_converter,
chunk_merger,
chunk2doc,
sbert_embeddings,
snomed_resolver
])


data = spark.createDataFrame([["""The patient is a 30-year-old female with a long history of insulin-dependent diabetes, type 2; coronary artery disease; chronic renal insufficiency; peripheral vascular disease, also secondary to diabetes; who was originally admitted to an outside hospital for what appeared to be acute paraplegia, lower extremities. She did receive a course of Bactrim for 14 days for UTI."""]]).toDF("text")

model = snomed_pipeline.fit(data)
result = model.transform(data)
```
```scala
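import org.apache.spark.ml.Pipeline
import spark.implicits._ // needed for .toDF on a Seq; assumes an active `spark` session

val document_assembler = new DocumentAssembler()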
.setInputCol("text")
.setOutputCol("document")

val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")
.setInputCols(Array("document"))
.setOutputCol("sentence")

val tokenizer = new Tokenizer()
.setInputCols(Array("sentence"))
.setOutputCol("token")

val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
.setInputCols(Array("sentence","token"))
.setOutputCol("embeddings")

val ner_jsl = MedicalNerModel.pretrained("ner_jsl", "en", "clinical/models")
.setInputCols(Array("sentence","token","embeddings"))
.setOutputCol("ner")

val ner_jsl_converter = new NerConverterInternal()
.setInputCols(Array("sentence","token","ner"))
.setOutputCol("ner_jsl_chunk")
.setWhiteList(Array("External_body_part_or_region", "Internal_organ_or_component"))
.setReplaceLabels(Map("External_body_part_or_region" -> "BodyPart", "Internal_organ_or_component" -> "BodyPart"))

val ner_anatomy = MedicalNerModel.pretrained("ner_anatomy_coarse", "en", "clinical/models")
.setInputCols(Array("sentence","token","embeddings"))
.setOutputCol("ner_anatomy")

val ner_anatomy_converter = new NerConverterInternal()
.setInputCols(Array("sentence", "token", "ner_anatomy"))
.setOutputCol("ner_anatomy_chunk")
.setReplaceLabels(Map("Anatomy" -> "BodyPart"))

val ner_oncology_anatomy = MedicalNerModel.pretrained("ner_oncology_anatomy_general", "en", "clinical/models")
.setInputCols(Array("sentence","token","embeddings"))
.setOutputCol("ner_oncology_anatomy")

val ner_oncology_anatomy_converter = new NerConverterInternal()
.setInputCols(Array("sentence","token","ner_oncology_anatomy"))
.setOutputCol("ner_oncology_anatomy_chunk")
.setWhiteList(Array("Anatomical_Site"))
.setReplaceLabels(Map("Anatomical_Site" -> "BodyPart"))

val chunk_merger = new ChunkMergeApproach()
.setInputCols("ner_jsl_chunk", "ner_anatomy_chunk", "ner_oncology_anatomy_chunk")
.setOutputCol("ner_chunk")

val chunk2doc = new Chunk2Doc()
.setInputCols("ner_chunk")
.setOutputCol("ner_chunk_doc")

val sbert_embeddings = BertSentenceEmbeddings
.pretrained("sbiobert_base_cased_mli","en","clinical/models")
.setInputCols(Array("ner_chunk_doc"))
.setOutputCol("sbert_embeddings")
.setCaseSensitive(false)

val snomed_resolver = SentenceEntityResolverModel
.pretrained("sbiobertresolve_snomed_bodyStructure", "en", "clinical/models")
.setInputCols(Array("sbert_embeddings"))
.setOutputCol("snomed_code")
.setDistanceFunction("EUCLIDEAN")

val snomed_pipeline = new Pipeline().setStages(Array(
document_assembler,
sentence_detector,
tokenizer,
word_embeddings,
ner_jsl,
ner_jsl_converter,
ner_anatomy,
ner_anatomy_converter,
ner_oncology_anatomy,
ner_oncology_anatomy_converter,
chunk_merger,
chunk2doc,
sbert_embeddings,
snomed_resolver
))

val data = Seq("Medical professionals rushed in the bustling emergency room to attend to the patient with alarming symptoms. The attending physician immediately noted signs of respiratory distress, including stridor, a high-pitched sound indicative of upper respiratory tract obstruction. The patient, struggling to breathe, exhibited dyspnea, their chest heaving with each labored breath. Concern heightened when they began experiencing syncope, a sudden loss of consciousness likely stemming from inadequate oxygenation. Further examination revealed a respiratory tract hemorrhage.").toDF("text")

val model = snomed_pipeline.fit(data)

val result = model.transform(data)
```
</div>
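
Conceptually, the resolver stage compares the sentence embedding of each chunk with embeddings of SNOMED terms and returns the closest candidates. The sketch below illustrates that idea as a plain Euclidean nearest-neighbour lookup. It is only an illustration under stated assumptions: the three-dimensional vectors are invented, `resolve` is a hypothetical helper, and the real model uses `sbiobert_base_cased_mli` embeddings with its own internal index. The codes and terms are copied from the Results table.

```python
import math

# Invented 3-dimensional vectors standing in for sentence embeddings.
index = {
    "181294004": ("coronary artery", (0.9, 0.1, 0.0)),
    "64033007": ("renal structure", (0.1, 0.9, 0.0)),
    "61685007": ("lower extremity", (0.0, 0.1, 0.9)),
}


def resolve(query, index, k=2):
    """Return the k (code, term, distance) triples closest to `query`."""
    scored = [
        (code, term, math.dist(query, vector))
        for code, (term, vector) in index.items()
    ]
    # Smaller Euclidean distance means a better match.
    return sorted(scored, key=lambda item: item[2])[:k]


# A made-up query vector that lies close to the "coronary artery" entry.
top = resolve((0.8, 0.2, 0.1), index)
print(top[0][:2])
```

The ranked list returned here plays the same role as the `all_codes` and `all_resolutions` columns: the first entry is the chosen code and the rest are alternatives.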

## Results

```bash
+-------------------+----------------------------+-----------+--------------------------+--------------------------------------------------+--------------------------------------------------+
| chunk| label|snomed_code| resolution| all_codes| all_resolutions|
+-------------------+----------------------------+-----------+--------------------------+--------------------------------------------------+--------------------------------------------------+
| coronary artery| Anatomy| 181294004| coronary artery|181294004:::119204004:::360487004:::55537005:::...|coronary artery:::coronary artery part:::segmen...|
| renal| Anatomy| 64033007| renal structure|64033007:::243968009:::84924000:::303402001:::3...|renal structure:::renal area:::renal segment:::...|
|peripheral vascular| Anatomy| 51833009|peripheral vascular system|51833009:::840581000:::3058005:::300054001:::28...|peripheral vascular system:::peripheral artery:...|
| lower extremities|External_body_part_or_region| 61685007| lower extremity|61685007:::127951001:::120575009:::182281004:::...|lower extremity:::lower extremity region:::lowe...|
+-------------------+----------------------------+-----------+--------------------------+--------------------------------------------------+--------------------------------------------------+
```
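
The `all_codes` and `all_resolutions` columns hold ranked candidates joined with `:::`, and the two lists appear to be aligned by position. A minimal sketch for pairing them, using only the candidates that are fully visible in the table above; `split_candidates` is a hypothetical helper, not a Spark NLP function.

```python
def split_candidates(all_codes: str, all_resolutions: str, sep: str = ":::"):
    """Pair each candidate code with its description, preserving rank order."""
    codes = all_codes.split(sep)
    names = all_resolutions.split(sep)
    # zip() stops at the shorter list, so a truncated column cannot misalign pairs.
    return list(zip(codes, names))


# First two candidates for "coronary artery", copied from the table above
# (the table truncates the remaining ones).
pairs = split_candidates(
    "181294004:::119204004",
    "coronary artery:::coronary artery part",
)
print(pairs)
```

Keeping the alternatives as pairs makes it easier to review lower-ranked candidates when the top resolution looks wrong for the clinical context.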

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|sbiobertresolve_snomed_bodyStructure|
|Compatibility:|Healthcare NLP 5.3.0+|
|License:|Licensed|
|Edition:|Official|
|Input Labels:|[sentence_embeddings]|
|Output Labels:|[snomed_code]|
|Language:|en|
|Size:|197.9 MB|
|Case sensitive:|false|

## References

This model is trained with the augmented version of NIH September 2023 SNOMED CT United States (US) Edition.