Finance 1.20.0 (#755)

dcecchini · jsl-models · gadde5300 · web-flow · commit 06f400d75b84 · 2023-11-12T06:01:49.000-03:00
* Add model 2023-08-03-finner_bert_subpoenas_sm_en (#493)
  Co-authored-by: gadde5300 <gadde5300@gmail.com>
* Delete subpoenas ner finance
* Add model 2023-08-30-finpipe_deid_en (#566)
  Co-authored-by: Meryem1425 <vildansarikaya25@gmail.com>
* Add model 2023-08-30-finpipe_deid_en (#570)
  Co-authored-by: SKocer <samedkocer22@gmail.com>
* Add model 2023-08-30-finpipe_deid_en (#571)
  Co-authored-by: SKocer <samedkocer22@gmail.com>
* Delete 2023-08-30-finpipe_deid_en.md
* Add model 2023-08-30-finpipe_deid_en (#572)
  Co-authored-by: gokhanturer <mgturer@gmail.com>
* Add model 2023-08-30-finpipe_deid_en (#574)
  Co-authored-by: SKocer <samedkocer22@gmail.com>
* Add model 2023-09-01-finpipe_deid_en (#586)
  Co-authored-by: Meryem1425 <vildansarikaya25@gmail.com>
* Add model 2023-09-01-finpipe_deid_en (#589)
  Co-authored-by: SKocer <samedkocer22@gmail.com>
* Add model 2023-09-01-finpipe_deid_en (#593)
  Co-authored-by: gokhanturer <mgturer@gmail.com>
* 2023-10-06-finembedding_e5_base_en (#685)
  * Add model 2023-10-06-finembedding_e5_base_en
  * Add model 2023-10-06-finner_absa_sm_en
  * Add model 2023-10-06-finassertion_absa_sm_en
  ---------
  Co-authored-by: dcecchini <dadachini@hotmail.com>
* Add model 2023-11-09-finembedding_e5_large_en (#745)
  Co-authored-by: dcecchini <dadachini@hotmail.com>
* 2023-11-11-finner_aspect_based_sentiment_md_en (#754)
  * Add model 2023-11-11-finner_aspect_based_sentiment_md_en
  * Add model 2023-11-11-finassertion_aspect_based_sentiment_md_en
  * Update 2023-11-11-finner_aspect_based_sentiment_md_en.md
  * Update 2023-11-11-finassertion_aspect_based_sentiment_md_en.md
  ---------
  Co-authored-by: Mary-Sci <meryemyildiz366@gmail.com>
  Co-authored-by: Merve Ertas Uslu <67653613+Mary-Sci@users.noreply.github.com>

---------
Co-authored-by: jsl-models <74001263+jsl-models@users.noreply.github.com>
Co-authored-by: gadde5300 <gadde5300@gmail.com>
Co-authored-by: Meryem1425 <vildansarikaya25@gmail.com>
Co-authored-by: SKocer <samedkocer22@gmail.com>
Co-authored-by: Merve Ertas Uslu <67653613+Mary-Sci@users.noreply.github.com>
Co-authored-by: gokhanturer <mgturer@gmail.com>
Co-authored-by: Mary-Sci <meryemyildiz366@gmail.com>
diff --git a/docs/_posts/Mary-Sci/2023-11-11-finassertion_aspect_based_sentiment_md_en.md b/docs/_posts/Mary-Sci/2023-11-11-finassertion_aspect_based_sentiment_md_en.md
@@ -0,0 +1,131 @@
+---
+layout: model
+title: Financial Assertion of Aspect-Based Sentiment (md, Medium)
+author: John Snow Labs
+name: finassertion_aspect_based_sentiment_md
+date: 2023-11-11
+tags: [assertion, licensed, en, finance]
+task: Assertion Status
+language: en
+edition: Finance NLP 1.0.0
+spark_version: 3.0
+supported: true
+annotator: AssertionDLModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+This assertion model assigns an aspect-based sentiment (`POSITIVE`, `NEGATIVE`, or `NEUTRAL`) to each financial entity. It is designed to be used together with the associated NER model, `finner_aspect_based_sentiment_md`.
+
+## Predicted Entities
+
+`POSITIVE`, `NEGATIVE`, `NEUTRAL`
+
+{:.btn-box}
+<button class="button button-orange" disabled>Live Demo</button>
+<button class="button button-orange" disabled>Open in Colab</button>
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/finance/models/finassertion_aspect_based_sentiment_md_en_1.0.0_3.0_1699705705778.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/finance/models/finassertion_aspect_based_sentiment_md_en_1.0.0_3.0_1699705705778.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
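+# Assumes an active Spark session named `spark` (for example, one returned by nlp.start())
+from johnsnowlabs import nlp, finance
+from pyspark.sql import functions as F
+
+documentAssembler = nlp.DocumentAssembler()\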
+    .setInputCol("text")\
+    .setOutputCol("document")
+
+# Sentence Detector splits each document into sentences
+sentenceDetector = nlp.SentenceDetector()\
+    .setInputCols(["document"])\
+    .setOutputCol("sentence")
+
+# Tokenizer splits each sentence into tokens
+tokenizer = nlp.Tokenizer()\
+    .setInputCols(["sentence"])\
+    .setOutputCol("token")
+
+bert_embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base", "en")\
+    .setInputCols("sentence", "token")\
+    .setOutputCol("embeddings")\
+    .setMaxSentenceLength(512)
+
+finance_ner = finance.NerModel.pretrained("finner_aspect_based_sentiment_md", "en", "finance/models")\
+    .setInputCols(["sentence", "token", "embeddings"])\
+    .setOutputCol("ner")
+
+ner_converter = finance.NerConverterInternal()\
+    .setInputCols(["sentence", "token", "ner"])\
+    .setOutputCol("ner_chunk")
+
+assertion_model = finance.AssertionDLModel.pretrained("finassertion_aspect_based_sentiment_md", "en", "finance/models")\
+    .setInputCols(["sentence", "ner_chunk", "embeddings"])\
+    .setOutputCol("assertion")
+
+
+nlpPipeline = nlp.Pipeline(
+    stages=[documentAssembler,
+            sentenceDetector,
+            tokenizer,
+            bert_embeddings,
+            finance_ner,
+            ner_converter,
+            assertion_model])
+
+text = "Equity and earnings of affiliates in Latin America increased to $4.8 million in the quarter from $2.2 million in the prior year as the commodity markets in Latin America remain strong through the end of the quarter."
+
+spark_df = spark.createDataFrame([[text]]).toDF("text")
+
+result = nlpPipeline.fit(spark_df).transform(spark_df)
+
+result.select(F.explode(F.arrays_zip("ner_chunk.result", "ner_chunk.metadata", "assertion.result", "assertion.metadata")).alias("cols"))\
+      .select(F.expr("cols['0']").alias("entity"),
+              F.expr("cols['1']['entity']").alias("label"),
+              F.expr("cols['2']").alias("assertion"),
+              F.expr("cols['3']['confidence']").alias("confidence")).show(50, truncate=False)
+```
+
+</div>
+
+## Results
+
+```bash
++--------+---------+---------+----------+
+|entity  |label    |assertion|confidence|
++--------+---------+---------+----------+
+|Equity  |LIABILITY|POSITIVE |0.9895    |
+|earnings|PROFIT   |POSITIVE |0.995     |
++--------+---------+---------+----------+
+```
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|finassertion_aspect_based_sentiment_md|
+|Compatibility:|Finance NLP 1.0.0+|
+|License:|Licensed|
+|Edition:|Official|
+|Input Labels:|[document, chunk, embeddings]|
+|Output Labels:|[assertion]|
+|Language:|en|
+|Size:|2.7 MB|
+
+## Benchmarking
+
+```bash
+ label         precision  recall  f1-score  support 
+ NEGATIVE      0.68       0.43    0.53      232     
+ NEUTRAL       0.44       0.65    0.53      441     
+ POSITIVE      0.79       0.69    0.74      947     
+ accuracy      -          -       0.64      1620    
+ macro-avg     0.64       0.59    0.60      1620    
+ weighted-avg  0.68       0.64    0.65      1620    
+```
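The `macro-avg` and `weighted-avg` rows of the assertion benchmark above can be reproduced from the per-label rows. The following is a minimal sketch in plain Python, assuming the usual definitions (macro: unweighted mean over labels; weighted: mean weighted by each label's support). The helper names `macro_avg` and `weighted_avg` are illustrative and are not part of Finance NLP.

```python
# Per-label rows copied from the benchmark table above:
# label -> (precision, recall, f1-score, support)
rows = {
    "NEGATIVE": (0.68, 0.43, 0.53, 232),
    "NEUTRAL": (0.44, 0.65, 0.53, 441),
    "POSITIVE": (0.79, 0.69, 0.74, 947),
}


def macro_avg(rows, idx):
    """Unweighted mean of one metric column across all labels."""
    values = [row[idx] for row in rows.values()]
    return sum(values) / len(values)


def weighted_avg(rows, idx):
    """Mean of one metric column, weighted by each label's support."""
    total = sum(row[3] for row in rows.values())
    return sum(row[idx] * row[3] for row in rows.values()) / total


total_support = sum(row[3] for row in rows.values())
# Columns 0, 1, 2 are precision, recall, and f1-score.
macro = [round(macro_avg(rows, i), 2) for i in range(3)]
weighted = [round(weighted_avg(rows, i), 2) for i in range(3)]

print("support     ", total_support)
print("macro-avg   ", macro)
print("weighted-avg", weighted)
```

Because the inputs are already rounded to two decimals, a recomputed average can occasionally differ from a published one in the last digit; for this table the recomputed values match the reported rows.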
diff --git a/docs/_posts/Mary-Sci/2023-11-11-finner_aspect_based_sentiment_md_en.md b/docs/_posts/Mary-Sci/2023-11-11-finner_aspect_based_sentiment_md_en.md
@@ -0,0 +1,136 @@
+---
+layout: model
+title: Financial NER on Aspect-Based Sentiment Analysis
+author: John Snow Labs
+name: finner_aspect_based_sentiment_md
+date: 2023-11-11
+tags: [ner, licensed, finance, en]
+task: Named Entity Recognition
+language: en
+edition: Finance NLP 1.0.0
+spark_version: 3.0
+supported: true
+annotator: FinanceNerModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+This NER model identifies entities that can be associated with a financial sentiment. The model is designed to be used with the associated Assertion Status model that classifies the entities into a sentiment category.
+
+## Predicted Entities
+
+`ASSET`, `CASHFLOW`, `EXPENSE`, `FREE_CASH_FLOW`, `GAINS`, `KPI`, `LIABILITY`, `LOSSES`, `PROFIT`, `REVENUE`
+
+{:.btn-box}
+<button class="button button-orange" disabled>Live Demo</button>
+<button class="button button-orange" disabled>Open in Colab</button>
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/finance/models/finner_aspect_based_sentiment_md_en_1.0.0_3.0_1699704469251.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/finance/models/finner_aspect_based_sentiment_md_en_1.0.0_3.0_1699704469251.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
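+# Assumes an active Spark session named `spark` (for example, one returned by nlp.start())
+from johnsnowlabs import nlp, finance
+
+documentAssembler = nlp.DocumentAssembler()\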
+    .setInputCol("text")\
+    .setOutputCol("document")
+
+# Sentence Detector splits each document into sentences
+sentenceDetector = nlp.SentenceDetector()\
+    .setInputCols(["document"])\
+    .setOutputCol("sentence")
+
+# Tokenizer splits each sentence into tokens
+tokenizer = nlp.Tokenizer()\
+    .setInputCols(["sentence"])\
+    .setOutputCol("token")
+
+bert_embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base", "en")\
+    .setInputCols("sentence", "token")\
+    .setOutputCol("embeddings")\
+    .setMaxSentenceLength(512)
+
+
+ner_model = finance.NerModel.pretrained("finner_aspect_based_sentiment_md", "en", "finance/models")\
+    .setInputCols(["sentence", "token", "embeddings"])\
+    .setOutputCol("ner")
+
+ner_converter = nlp.NerConverter()\
+    .setInputCols(["sentence","token","ner"])\
+    .setOutputCol("ner_chunk")
+
+nlpPipeline = nlp.Pipeline(stages=[
+        documentAssembler,
+        sentenceDetector,
+        tokenizer,
+        bert_embeddings,
+        ner_model,
+        ner_converter])
+
+empty_data = spark.createDataFrame([[""]]).toDF("text")
+model = nlpPipeline.fit(empty_data)
+
+text = ["""Equity and earnings of affiliates in Latin America increased to $4.8 million in the quarter from $2.2 million in the prior year as the commodity markets in Latin America remain strong through the end of the quarter."""]
+result = model.transform(spark.createDataFrame([text]).toDF("text"))
+
+from pyspark.sql import functions as F
+
+result.select(F.explode(F.arrays_zip(result.ner_chunk.result, result.ner_chunk.begin, result.ner_chunk.end, result.ner_chunk.metadata)).alias("cols")) \
+               .select(F.expr("cols['0']").alias("chunk"),
+                       F.expr("cols['1']").alias("begin"),
+                       F.expr("cols['2']").alias("end"),
+                       F.expr("cols['3']['entity']").alias("ner_label")
+                       ).show(100, truncate=False)
+```
+
+</div>
+
+## Results
+
+```bash
++--------+-----+---+---------+
+|chunk   |begin|end|ner_label|
++--------+-----+---+---------+
+|Equity  |1    |6  |LIABILITY|
+|earnings|12   |19 |PROFIT   |
++--------+-----+---+---------+
+```
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|finner_aspect_based_sentiment_md|
+|Compatibility:|Finance NLP 1.0.0+|
+|License:|Licensed|
+|Edition:|Official|
+|Input Labels:|[sentence, token, embeddings]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|16.5 MB|
+
+## Benchmarking
+
+```bash
+ label           precision  recall  f1-score  support 
+ ASSET           0.50       0.72    0.59      53      
+ CASHFLOW        0.78       0.60    0.68      30      
+ EXPENSE         0.71       0.68    0.70      151     
+ FREE_CASH_FLOW  1.00       1.00    1.00      19      
+ GAINS           0.80       0.78    0.79      55      
+ KPI             0.72       0.58    0.64      106     
+ LIABILITY       0.65       0.51    0.57      39      
+ LOSSES          0.77       0.59    0.67      29      
+ PROFIT          0.77       0.74    0.75      101     
+ REVENUE         0.74       0.78    0.76      231     
+ micro-avg       0.72       0.71    0.71      814     
+ macro-avg       0.74       0.70    0.71      814     
+ weighted-avg    0.73       0.71    0.71      814  
+```
diff --git a/docs/_posts/dcecchini/2023-10-06-finembedding_e5_base_en.md b/docs/_posts/dcecchini/2023-10-06-finembedding_e5_base_en.md
@@ -87,4 +87,5 @@ result. Select("E5.result").show()
 
 ## References
 
+
 In-house curated financial datasets.
diff --git a/docs/_posts/dcecchini/2023-11-09-finembedding_e5_large_en.md b/docs/_posts/dcecchini/2023-11-09-finembedding_e5_large_en.md
@@ -0,0 +1,90 @@
+---
+layout: model
+title: Finance E5 Embedding Large
+author: John Snow Labs
+name: finembedding_e5_large
+date: 2023-11-09
+tags: [finance, en, licensed, e5, sentence_embedding, onnx]
+task: Embeddings
+language: en
+edition: Finance NLP 1.0.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: E5Embeddings
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+This model is a financial version of the E5 large model fine-tuned on in-house curated financial datasets. Reference: Wang, Liang, et al. “Text embeddings by weakly-supervised contrastive pre-training.” arXiv preprint arXiv:2212.03533 (2022).
+
+## Predicted Entities
+
+
+
+{:.btn-box}
+<button class="button button-orange" disabled>Live Demo</button>
+<button class="button button-orange" disabled>Open in Colab</button>
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/finance/models/finembedding_e5_large_en_1.0.0_3.0_1699530885080.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/finance/models/finembedding_e5_large_en_1.0.0_3.0_1699530885080.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
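+# Assumes an active Spark session named `spark` (for example, one returned by nlp.start())
+from johnsnowlabs import nlp
+
+document_assembler = (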
+    nlp.DocumentAssembler().setInputCol("text").setOutputCol("document")
+)
+
+E5_embedding = (
+    nlp.E5Embeddings.pretrained(
+        "finembedding_e5_large", "en", "finance/models"
+    )
+    .setInputCols(["document"])
+    .setOutputCol("E5")
+)
+pipeline = nlp.Pipeline(stages=[document_assembler, E5_embedding])
+
+data = spark.createDataFrame(
+    [["What is the best way to invest in the stock market?"]]
+).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+result.select("E5.result").show()
+```
+
+</div>
+
+## Results
+
+```bash
++----------------------------------------------------------------------------------------------------+
+|                                                                                          embeddings|
++----------------------------------------------------------------------------------------------------+
+|[0.8358813, -1.30341, -0.576791, 0.25893408, 0.26888973, 0.028243342, 0.47971666, 0.47653574, 0.4...|
++----------------------------------------------------------------------------------------------------+
+```
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|finembedding_e5_large|
+|Compatibility:|Finance NLP 1.0.0+|
+|License:|Licensed|
+|Edition:|Official|
+|Input Labels:|[document]|
+|Output Labels:|[E5]|
+|Language:|en|
+|Size:|1.2 GB|
+
+## References
+
+In-house annotated financial datasets.
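Sentence embeddings such as those in the `E5` output column are commonly compared with cosine similarity. The sketch below is plain Python and uses short toy vectors rather than real model output. The `cosine_similarity` helper is illustrative and is not part of Finance NLP. In practice the vectors would first be collected from the pipeline's output column.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for real embeddings (assumption:
# real vectors from the model are much longer).
query = [1.0, 2.0, 0.0]
doc_a = [2.0, 4.0, 0.0]  # same direction as the query
doc_b = [0.0, 0.0, 3.0]  # orthogonal to the query

sim_a = cosine_similarity(query, doc_a)
sim_b = cosine_similarity(query, doc_b)
print(sim_a, sim_b)
```

A value near 1 indicates vectors pointing in nearly the same direction, while a value near 0 indicates unrelated directions.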
