|
| 1 | +--- |
| 2 | +layout: model |
| 3 | +title: Sentence Entity Resolver for Logical Observation Identifiers Names and Codes (LOINC) codes |
| 4 | +author: John Snow Labs |
| 5 | +name: sbiobertresolve_loinc |
| 6 | +date: 2024-10-07 |
| 7 | +tags: [licensed, en, entity_resolution, loinc, clinical] |
| 8 | +task: Entity Resolution |
| 9 | +language: en |
| 10 | +edition: Healthcare NLP 5.5.0 |
| 11 | +spark_version: 3.0 |
| 12 | +supported: true |
| 13 | +annotator: SentenceEntityResolverModel |
| 14 | +article_header: |
| 15 | + type: cover |
| 16 | +use_language_switcher: "Python-Scala-Java" |
| 17 | +--- |
| 18 | + |
| 19 | +## Description |
| 20 | + |
| 21 | +This model maps extracted medical entities to Logical Observation Identifiers Names and Codes (LOINC) codes using `sbiobert_base_cased_mli` Sentence BERT embeddings. |
| 22 | +It also returns the official name of each code in square brackets. |
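
As a rough illustration of the idea behind embedding-based resolution, the sketch below ranks candidate codes by Euclidean distance between vectors. It is a conceptual toy, not the model's actual implementation: the three-dimensional vectors and the `rank_codes` helper are hypothetical, and the codes are borrowed from the Results table only as labels.

```python
import math

# Hypothetical toy vectors standing in for sentence embeddings.
# Real embeddings have far more dimensions; these values are made up.
CANDIDATES = {
    "29544-4": [0.9, 0.1, 0.0],
    "26436-6": [0.1, 0.8, 0.2],
    "10346-5": [0.0, 0.2, 0.9],
}


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_codes(query, candidates):
    """Return candidate codes ordered from nearest to farthest from the query."""
    return sorted(candidates, key=lambda code: euclidean(query, candidates[code]))


# A made-up query vector that lies closest to the third candidate.
ranked = rank_codes([0.05, 0.25, 0.85], CANDIDATES)
print(ranked)
```

The nearest candidate plays the role of the top resolution, and the remaining candidates correspond to the ranked alternatives.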
| 23 | + |
| 24 | +## Predicted Entities |
| 25 | + |
| 26 | +`loinc_code` |
| 27 | + |
| 28 | +{:.btn-box} |
| 29 | +<button class="button button-orange" disabled>Live Demo</button> |
| 30 | +<button class="button button-orange" disabled>Open in Colab</button> |
| 31 | +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/sbiobertresolve_loinc_en_5.5.0_3.0_1728321808601.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} |
| 32 | +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/sbiobertresolve_loinc_en_5.5.0_3.0_1728321808601.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} |
| 33 | + |
| 34 | +## How to use |
| 35 | + |
| 36 | + |
| 37 | + |
| 38 | +<div class="tabs-box" markdown="1"> |
| 39 | +{% include programmingLanguageSelectScalaPythonNLU.html %} |
| 40 | + |
| 41 | +```python |
| 42 | + |
| 43 | +document_assembler = DocumentAssembler()\ |
| 44 | + .setInputCol("text")\ |
| 45 | + .setOutputCol("document") |
| 46 | + |
| 47 | +sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models") \ |
| 48 | + .setInputCols(["document"]) \ |
| 49 | + .setOutputCol("sentence") |
| 50 | + |
| 51 | +tokenizer = Tokenizer()\ |
| 52 | + .setInputCols(["sentence"])\ |
| 53 | + .setOutputCol("token") |
| 54 | + |
| 55 | +word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\ |
| 56 | + .setInputCols(["sentence", "token"])\ |
| 57 | + .setOutputCol("embeddings") |
| 58 | + |
| 59 | +ner_model = MedicalNerModel.pretrained("ner_jsl", "en", "clinical/models") \ |
| 60 | + .setInputCols(["sentence", "token", "embeddings"]) \ |
| 61 | + .setOutputCol("ner") |
| 62 | + |
| 63 | +ner_converter = NerConverterInternal() \ |
| 64 | + .setInputCols(["sentence", "token", "ner"]) \ |
| 65 | + .setOutputCol("ner_chunk")\ |
| 66 | + .setWhiteList(["Test"]) |
| 67 | + |
| 68 | +chunk2doc = Chunk2Doc()\ |
| 69 | + .setInputCols("ner_chunk")\ |
| 70 | + .setOutputCol("ner_chunk_doc") |
| 71 | + |
| 72 | +sbert_embedder = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli","en","clinical/models")\ |
| 73 | + .setInputCols(["ner_chunk_doc"])\ |
| 74 | + .setOutputCol("sbert_embeddings")\ |
| 75 | + .setCaseSensitive(False) |
| 76 | + |
| 77 | +resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_loinc","en", "clinical/models") \ |
| 78 | + .setInputCols(["sbert_embeddings"]) \ |
| 79 | + .setOutputCol("resolution")\ |
| 80 | + .setDistanceFunction("EUCLIDEAN") |
| 81 | + |
| 82 | + |
| 83 | +nlpPipeline = Pipeline(stages=[document_assembler, |
| 84 | + sentence_detector, |
| 85 | + tokenizer, |
| 86 | + word_embeddings, |
| 87 | + ner_model, |
| 88 | + ner_converter, |
| 89 | + chunk2doc, |
| 90 | + sbert_embedder, |
| 91 | + resolver]) |
| 92 | + |
| 93 | +data = spark.createDataFrame([["""A 65-year-old woman presents to the office with generalized fatigue for the last 4 months. |
| 94 | + She used to walk 1 mile each evening but now gets tired after 1-2 blocks. She has a history of Crohn disease and hypertension |
| 95 | + for which she receives appropriate medications. She is married and lives with her husband. She eats a balanced diet that |
| 96 | + includes chicken, fish, pork, fruits, and vegetables. She rarely drinks alcohol and denies tobacco use. A physical examination |
| 97 | + is unremarkable. Laboratory studies show the following: Hemoglobin: 9.8g/dL, Hematocrit: 32%, Mean Corpuscular Volume: 110 μm3"""]]).toDF("text") |
| 98 | + |
| 99 | +result = nlpPipeline.fit(data).transform(data) |
| 100 | + |
| 101 | +``` |
| 102 | +```scala |
| 103 | + |
| 104 | +val document_assembler = new DocumentAssembler() |
| 105 | + .setInputCol("text") |
| 106 | + .setOutputCol("document") |
| 107 | + |
| 108 | +val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models") |
| 109 | + .setInputCols(Array("document")) |
| 110 | + .setOutputCol("sentence") |
| 111 | + |
| 112 | +val tokenizer = new Tokenizer() |
| 113 | + .setInputCols(Array("sentence")) |
| 114 | + .setOutputCol("token") |
| 115 | + |
| 116 | +val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical","en","clinical/models") |
| 117 | + .setInputCols(Array("sentence","token")) |
| 118 | + .setOutputCol("embeddings") |
| 119 | + |
| 120 | +val ner_model = MedicalNerModel.pretrained("ner_jsl","en","clinical/models") |
| 121 | + .setInputCols(Array("sentence","token","embeddings")) |
| 122 | + .setOutputCol("ner") |
| 123 | + |
| 124 | +val ner_converter = new NerConverterInternal() |
| 125 | + .setInputCols(Array("sentence","token","ner")) |
| 126 | + .setOutputCol("ner_chunk") |
| 127 | + .setWhiteList(Array("Test")) |
| 128 | + |
| 129 | +val chunk2doc = new Chunk2Doc() |
| 130 | + .setInputCols("ner_chunk") |
| 131 | + .setOutputCol("ner_chunk_doc") |
| 132 | + |
| 133 | +val sbert_embedder = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli","en","clinical/models") |
| 134 | + .setInputCols(Array("ner_chunk_doc")) |
| 135 | + .setOutputCol("sbert_embeddings") |
| 136 | + .setCaseSensitive(false) |
| 137 | + |
| 138 | +val resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_loinc","en","clinical/models") |
| 139 | + .setInputCols(Array("sbert_embeddings")) |
| 140 | + .setOutputCol("resolution") |
| 141 | + .setDistanceFunction("EUCLIDEAN") |
| 142 | + |
| 143 | +val nlpPipeline = new Pipeline().setStages(Array( |
| 144 | + document_assembler, |
| 145 | + sentence_detector, |
| 146 | + tokenizer, |
| 147 | + word_embeddings, |
| 148 | + ner_model, |
| 149 | + ner_converter, |
| 150 | + chunk2doc, |
| 151 | + sbert_embedder, |
| 152 | + resolver)) |
| 153 | + |
| 154 | +val data = Seq("""A 65-year-old woman presents to the office with generalized fatigue for the last 4 months. |
| 155 | + She used to walk 1 mile each evening but now gets tired after 1-2 blocks. She has a history of Crohn disease and hypertension |
| 156 | + for which she receives appropriate medications. She is married and lives with her husband. She eats a balanced diet that |
| 157 | + includes chicken, fish, pork, fruits, and vegetables. She rarely drinks alcohol and denies tobacco use. A physical examination |
| 158 | + is unremarkable. Laboratory studies show the following: Hemoglobin: 9.8g/dL, Hematocrit: 32%, Mean Corpuscular Volume: 110 μm3""").toDF("text") |
| 159 | + |
| 160 | +val result = nlpPipeline.fit(data).transform(data) |
| 161 | + |
| 162 | +``` |
| 163 | +</div> |
| 164 | + |
| 165 | +## Results |
| 166 | + |
| 167 | +```bash |
| 168 | + |
| 169 | ++-----------------------+-----+---+---------+----------+------------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+ |
| 170 | +| chunk|begin|end|ner_label|loinc_code| description| resolutions| all_codes| aux_labels| |
| 171 | ++-----------------------+-----+---+---------+----------+------------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+ |
| 172 | +| physical examination| 490|509| Test| 29544-4| Physical findings [Physical findings]|Physical findings [Physical findings]:::Physical exam by ...|29544-4:::55286-9:::11435-5:::11384-5:::29545-1:::8709-8:...|ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACT...| |
| 173 | +| Laboratory studies| 528|545| Test| 26436-6| Laboratory studies (set) [Laboratory studies (set)]|Laboratory studies (set) [Laboratory studies (set)]:::Lab...|26436-6:::52482-7:::11502-2:::34075-2:::100455-5:::85069-...|ACTIVE:::DISCOURAGED:::ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:...| |
| 174 | +| Hemoglobin| 567|576| Test| 10346-5|Haemoglobin [Hemoglobin A [Units/volume] in Blood by Elec...|Haemoglobin [Hemoglobin A [Units/volume] in Blood by Elec...|10346-5:::15082-1:::11559-2:::2030-5:::34618-9:::38896-7:...|ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACT...| |
| 175 | +| Hematocrit| 590|599| Test| 32354-3|Hematocrit [Volume Fraction] of Arterial blood [Hematocri...|Hematocrit [Volume Fraction] of Arterial blood [Hematocri...|32354-3:::20570-8:::11153-4:::13508-7:::104874-3:::42908-...|ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACT...| |
| 176 | +|Mean Corpuscular Volume| 607|629| Test| 30386-7|Erythrocyte mean corpuscular diameter [Length] [Erythrocy...|Erythrocyte mean corpuscular diameter [Length] [Erythrocy...|30386-7:::101864-7:::20161-6:::18033-1:::19853-1:::101150...|ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACT...| |
| 177 | ++-----------------------+-----+---+---------+----------+------------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+ |
| 178 | + |
| 179 | +``` |
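
The `all_codes` and `aux_labels` columns in the table above pack the ranked candidates into `:::`-separated strings. A minimal sketch of splitting them back into aligned pairs, assuming both strings list the candidates in the same order; `parse_candidates` is a hypothetical helper, and the input is only the untruncated start of the "Laboratory studies" row:

```python
def parse_candidates(all_codes, aux_labels, sep=":::"):
    """Pair each candidate code with the status label at the same position."""
    codes = all_codes.split(sep)
    labels = aux_labels.split(sep)
    return list(zip(codes, labels))


# First three candidates of the "Laboratory studies" row (the full row is truncated in the table).
all_codes = "26436-6:::52482-7:::11502-2"
aux_labels = "ACTIVE:::DISCOURAGED:::ACTIVE"

pairs = parse_candidates(all_codes, aux_labels)

# Keep only the candidates whose status is ACTIVE.
active = [code for code, status in pairs if status == "ACTIVE"]
print(active)
```

Filtering on the status label is one way to skip candidates marked `DISCOURAGED` before choosing a code.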
| 180 | + |
| 181 | +{:.model-param} |
| 182 | +## Model Information |
| 183 | + |
| 184 | +{:.table-model} |
| 185 | +|---|---| |
| 186 | +|Model Name:|sbiobertresolve_loinc| |
| 187 | +|Compatibility:|Healthcare NLP 5.5.0+| |
| 188 | +|License:|Licensed| |
| 189 | +|Edition:|Official| |
| 190 | +|Input Labels:|[sentence_embeddings]| |
| 191 | +|Output Labels:|[loinc_code]| |
| 192 | +|Language:|en| |
| 193 | +|Size:|666.8 MB| |
| 194 | +|Case sensitive:|false| |
| 195 | + |
| 196 | +## References |
| 197 | +This model is trained on the LOINC v2.78 dataset, released on 2024-08-06. |