Commit 06f400d

Authored by dcecchini, jsl-models, gadde5300, Meryem1425 and SKocer
Finance 1.20.0 (#755)
* Add model 2023-08-03-finner_bert_subpoenas_sm_en (#493)
  Co-authored-by: gadde5300
* Delete subpoenas ner finance
* Add model 2023-08-30-finpipe_deid_en (#566)
  Co-authored-by: Meryem1425
* Add model 2023-08-30-finpipe_deid_en (#570)
  Co-authored-by: SKocer
* Add model 2023-08-30-finpipe_deid_en (#571)
  Co-authored-by: SKocer
* Delete 2023-08-30-finpipe_deid_en.md
* Add model 2023-08-30-finpipe_deid_en (#572)
  Co-authored-by: gokhanturer
* Add model 2023-08-30-finpipe_deid_en (#574)
  Co-authored-by: SKocer
* Add model 2023-09-01-finpipe_deid_en (#586)
  Co-authored-by: Meryem1425
* Add model 2023-09-01-finpipe_deid_en (#589)
  Co-authored-by: SKocer
* Add model 2023-09-01-finpipe_deid_en (#593)
  Co-authored-by: gokhanturer
* 2023-10-06-finembedding_e5_base_en (#685)
  * Add model 2023-10-06-finembedding_e5_base_en
  * Add model 2023-10-06-finner_absa_sm_en
  * Add model 2023-10-06-finassertion_absa_sm_en
  ---------
  Co-authored-by: dcecchini
* Add model 2023-11-09-finembedding_e5_large_en (#745)
  Co-authored-by: dcecchini
* 2023-11-11-finner_aspect_based_sentiment_md_en (#754)
  * Add model 2023-11-11-finner_aspect_based_sentiment_md_en
  * Add model 2023-11-11-finassertion_aspect_based_sentiment_md_en
  * Update 2023-11-11-finner_aspect_based_sentiment_md_en.md
  * Update 2023-11-11-finassertion_aspect_based_sentiment_md_en.md
  ---------
  Co-authored-by: Mary-Sci
  Co-authored-by: Merve Ertas Uslu

---------
Co-authored-by: jsl-models
Co-authored-by: gadde5300
Co-authored-by: Meryem1425
Co-authored-by: SKocer
Co-authored-by: Merve Ertas Uslu
Co-authored-by: gokhanturer
Co-authored-by: Mary-Sci
1 parent bd84be5 commit 06f400d

4 files changed (+358, -0 lines)
Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
---
layout: model
title: Financial Assertion of Aspect-Based Sentiment (md, Medium)
author: John Snow Labs
name: finassertion_aspect_based_sentiment_md
date: 2023-11-11
tags: [assertion, licensed, en, finance]
task: Assertion Status
language: en
edition: Finance NLP 1.0.0
spark_version: 3.0
supported: true
annotator: AssertionDLModel
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This assertion model classifies each financial entity (aspect) into a sentiment category. It is designed to run on the entity chunks produced by the associated NER model, `finner_aspect_based_sentiment_md`.

## Predicted Entities

`POSITIVE`, `NEGATIVE`, `NEUTRAL`

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/finance/models/finassertion_aspect_based_sentiment_md_en_1.0.0_3.0_1699705705778.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/finance/models/finassertion_aspect_based_sentiment_md_en_1.0.0_3.0_1699705705778.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from johnsnowlabs import nlp, finance
from pyspark.sql import functions as F

# Assumes `spark` is an active Spark session with a valid Finance NLP license (e.g. started with nlp.start())

documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Sentence Detector annotator, splits each document into sentences
sentenceDetector = nlp.SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

# Tokenizer splits each sentence into tokens
tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# BERT token embeddings shared by the NER and the assertion model
bert_embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base", "en")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")\
    .setMaxSentenceLength(512)

# NER model that detects the financial entities (aspects)
finance_ner = finance.NerModel.pretrained("finner_aspect_based_sentiment_md", "en", "finance/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

# Merges the token-level NER tags into entity chunks
ner_converter = finance.NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

# Assigns a sentiment (POSITIVE, NEGATIVE or NEUTRAL) to each chunk
assertion_model = finance.AssertionDLModel.pretrained("finassertion_aspect_based_sentiment_md", "en", "finance/models")\
    .setInputCols(["sentence", "ner_chunk", "embeddings"])\
    .setOutputCol("assertion")

nlpPipeline = nlp.Pipeline(
    stages=[documentAssembler,
            sentenceDetector,
            tokenizer,
            bert_embeddings,
            finance_ner,
            ner_converter,
            assertion_model])

text = "Equity and earnings of affiliates in Latin America increased to $4.8 million in the quarter from $2.2 million in the prior year as the commodity markets in Latin America remain strong through the end of the quarter."

spark_df = spark.createDataFrame([[text]]).toDF("text")

result = nlpPipeline.fit(spark_df).transform(spark_df)

# One row per chunk: entity text, NER label, sentiment and its confidence
result.select(F.explode(F.arrays_zip("ner_chunk.result", "ner_chunk.metadata", "assertion.result", "assertion.metadata")).alias("cols"))\
    .select(F.expr("cols['0']").alias("entity"),
            F.expr("cols['1']['entity']").alias("label"),
            F.expr("cols['2']").alias("assertion"),
            F.expr("cols['3']['confidence']").alias("confidence")).show(50, truncate=False)
```

</div>

## Results

```bash
+--------+---------+---------+----------+
|entity  |label    |assertion|confidence|
+--------+---------+---------+----------+
|Equity  |LIABILITY|POSITIVE |0.9895    |
|earnings|PROFIT   |POSITIVE |0.995     |
+--------+---------+---------+----------+
```
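
In an aspect-based sentiment workflow, the zipped rows are often summarized per sentiment. A minimal pure-Python sketch, assuming the rows have been collected to the driver (for example with `.collect()` in place of `.show()`) as `(entity, label, assertion, confidence)` tuples; `group_by_sentiment` and its `min_confidence` threshold are hypothetical, and the two sample rows are copied from the Results table above.

```python
from collections import defaultdict

# Rows shaped like the output of the final select: (entity, label, assertion, confidence).
# Confidence comes from annotation metadata, which Spark NLP stores as strings.
rows = [
    ("Equity", "LIABILITY", "POSITIVE", "0.9895"),
    ("earnings", "PROFIT", "POSITIVE", "0.995"),
]


def group_by_sentiment(rows, min_confidence=0.0):
    """Hypothetical helper: map each sentiment to its (entity, label) aspects.

    Rows whose assertion confidence is below `min_confidence` are skipped.
    """
    grouped = defaultdict(list)
    for entity, label, sentiment, confidence in rows:
        if float(confidence) >= min_confidence:
            grouped[sentiment].append((entity, label))
    return dict(grouped)


print(group_by_sentiment(rows))
# {'POSITIVE': [('Equity', 'LIABILITY'), ('earnings', 'PROFIT')]}
print(group_by_sentiment(rows, min_confidence=0.99))
# {'POSITIVE': [('earnings', 'PROFIT')]}
```

Keeping the NER label next to the entity lets you tell, for instance, a positive `PROFIT` mention from a positive `LIABILITY` mention.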

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|finassertion_aspect_based_sentiment_md|
|Compatibility:|Finance NLP 1.0.0+|
|License:|Licensed|
|Edition:|Official|
|Input Labels:|[document, chunk, embeddings]|
|Output Labels:|[assertion]|
|Language:|en|
|Size:|2.7 MB|

## Benchmarking

```bash
label         precision  recall  f1-score  support
NEGATIVE      0.68       0.43    0.53      232
NEUTRAL       0.44       0.65    0.53      441
POSITIVE      0.79       0.69    0.74      947
accuracy      -          -       0.64      1620
macro-avg     0.64       0.59    0.60      1620
weighted-avg  0.68       0.64    0.65      1620
```
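
The averaged rows follow from the per-class rows: `macro-avg` is the unweighted mean over the three classes, while `weighted-avg` weights each class by its support, so in this single-label setting the weighted recall coincides with accuracy (0.64). A small sketch that recomputes both averages from the rounded per-class figures; the `averages` helper is illustrative and not part of Finance NLP.

```python
# Per-class (precision, recall, f1, support) copied from the benchmark table.
report = {
    "NEGATIVE": (0.68, 0.43, 0.53, 232),
    "NEUTRAL": (0.44, 0.65, 0.53, 441),
    "POSITIVE": (0.79, 0.69, 0.74, 947),
}


def averages(report):
    """Return (macro, weighted) averages of precision, recall and F1."""
    total = sum(support for *_, support in report.values())
    # Macro: every class counts equally.
    macro = [sum(row[i] for row in report.values()) / len(report) for i in range(3)]
    # Weighted: every class counts in proportion to its support.
    weighted = [
        sum(row[i] * row[3] for row in report.values()) / total for i in range(3)
    ]
    return macro, weighted


macro, weighted = averages(report)
print("macro-avg   ", [round(x, 2) for x in macro])     # [0.64, 0.59, 0.6]
print("weighted-avg", [round(x, 2) for x in weighted])  # [0.68, 0.64, 0.65]
```

The gap between the two rows (0.60 vs 0.65 F1) reflects the class imbalance: `POSITIVE`, the best-scoring class, also has the most support.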
Lines changed: 136 additions & 0 deletions
@@ -0,0 +1,136 @@
---
layout: model
title: Financial NER on Aspect-Based Sentiment Analysis
author: John Snow Labs
name: finner_aspect_based_sentiment_md
date: 2023-11-11
tags: [ner, licensed, finance, en]
task: Named Entity Recognition
language: en
edition: Finance NLP 1.0.0
spark_version: 3.0
supported: true
annotator: FinanceNerModel
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This NER model identifies entities (aspects) that can carry a financial sentiment, such as revenue, expenses or profit. It is designed to be used with the associated assertion status model, `finassertion_aspect_based_sentiment_md`, which classifies each entity into a sentiment category.

## Predicted Entities

`ASSET`, `CASHFLOW`, `EXPENSE`, `FREE_CASH_FLOW`, `GAINS`, `KPI`, `LIABILITY`, `LOSSES`, `PROFIT`, `REVENUE`

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/finance/models/finner_aspect_based_sentiment_md_en_1.0.0_3.0_1699704469251.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/finance/models/finner_aspect_based_sentiment_md_en_1.0.0_3.0_1699704469251.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from johnsnowlabs import nlp, finance
from pyspark.sql import functions as F

# Assumes `spark` is an active Spark session with a valid Finance NLP license (e.g. started with nlp.start())

documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Sentence Detector annotator, splits each document into sentences
sentenceDetector = nlp.SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

# Tokenizer splits each sentence into tokens
tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

bert_embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base", "en")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")\
    .setMaxSentenceLength(512)

# `pretrained` is called on the class, not on an instance
ner_model = finance.NerModel.pretrained("finner_aspect_based_sentiment_md", "en", "finance/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

# Merges the token-level NER tags into entity chunks
ner_converter = nlp.NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

nlpPipeline = nlp.Pipeline(stages=[
    documentAssembler,
    sentenceDetector,
    tokenizer,
    bert_embeddings,
    ner_model,
    ner_converter])

# Every stage is already trained, so fitting on an empty DataFrame only builds the PipelineModel
empty_data = spark.createDataFrame([[""]]).toDF("text")
model = nlpPipeline.fit(empty_data)

text = ["""Equity and earnings of affiliates in Latin America increased to $4.8 million in the quarter from $2.2 million in the prior year as the commodity markets in Latin America remain strong through the end of the quarter."""]
result = model.transform(spark.createDataFrame([text]).toDF("text"))

# One row per chunk: text, character offsets and NER label
result.select(F.explode(F.arrays_zip(result.ner_chunk.result, result.ner_chunk.begin, result.ner_chunk.end, result.ner_chunk.metadata)).alias("cols")) \
    .select(F.expr("cols['0']").alias("chunk"),
            F.expr("cols['1']").alias("begin"),
            F.expr("cols['2']").alias("end"),
            F.expr("cols['3']['entity']").alias("ner_label")
            ).show(100, truncate=False)
```

</div>

## Results

```bash
+--------+-----+---+---------+
|chunk   |begin|end|ner_label|
+--------+-----+---+---------+
|Equity  |1    |6  |LIABILITY|
|earnings|12   |19 |PROFIT   |
+--------+-----+---+---------+
```
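
The `ner_chunk` column comes from the converter stage, which merges the token-level tags emitted by the NER model into multi-token chunks. A simplified pure-Python sketch of that merging, assuming IOB tags (`B-` begins an entity, `I-` continues it, `O` is outside) and 0-based character offsets with inclusive ends; `iob_to_chunks` is hypothetical, not the library's implementation, and the sentence and tags are invented for illustration rather than produced by the model.

```python
import re


def iob_to_chunks(text, tokens):
    """Merge IOB-tagged tokens into (chunk, begin, end, label) tuples.

    `tokens` is a list of (begin, end, tag) with inclusive character offsets
    into `text` and tags such as "B-EXPENSE", "I-EXPENSE" or "O".
    """
    chunks = []
    current = None  # [begin, end, label] of the chunk being built
    for begin, end, tag in tokens:
        prefix, _, label = tag.partition("-")
        if prefix == "I" and current is not None and current[2] == label:
            current[1] = end  # extend the open chunk
            continue
        if current is not None:
            chunks.append(current)
            current = None
        if prefix in ("B", "I"):  # a stray I- tag also opens a new chunk
            current = [begin, end, label]
    if current is not None:
        chunks.append(current)
    # Slice the original text so spacing inside a chunk is preserved.
    return [(text[b:e + 1], b, e, label) for b, e, label in chunks]


# Hypothetical sentence and tags, one tag per whitespace-separated token.
text = "Free cash flow rose while operating expenses fell."
tags = ["B-FREE_CASH_FLOW", "I-FREE_CASH_FLOW", "I-FREE_CASH_FLOW",
        "O", "O", "B-EXPENSE", "I-EXPENSE", "O"]
tokens = [(m.start(), m.end() - 1, tag) for m, tag in zip(re.finditer(r"\S+", text), tags)]
print(iob_to_chunks(text, tokens))
# [('Free cash flow', 0, 13, 'FREE_CASH_FLOW'), ('operating expenses', 26, 43, 'EXPENSE')]
```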

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|finner_aspect_based_sentiment_md|
|Compatibility:|Finance NLP 1.0.0+|
|License:|Licensed|
|Edition:|Official|
|Input Labels:|[sentence, token, embeddings]|
|Output Labels:|[ner]|
|Language:|en|
|Size:|16.5 MB|

## Benchmarking

```bash
label           precision  recall  f1-score  support
ASSET           0.50       0.72    0.59      53
CASHFLOW        0.78       0.60    0.68      30
EXPENSE         0.71       0.68    0.70      151
FREE_CASH_FLOW  1.00       1.00    1.00      19
GAINS           0.80       0.78    0.79      55
KPI             0.72       0.58    0.64      106
LIABILITY       0.65       0.51    0.57      39
LOSSES          0.77       0.59    0.67      29
PROFIT          0.77       0.74    0.75      101
REVENUE         0.74       0.78    0.76      231
micro-avg       0.72       0.71    0.71      814
macro-avg       0.74       0.70    0.71      814
weighted-avg    0.73       0.71    0.71      814
```
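
The `micro-avg` row pools true positives, predicted chunks and gold chunks across all labels before dividing, so frequent labels such as `REVENUE` weigh more than they do in `macro-avg`. The table does not list raw counts, so the sketch below estimates them from the rounded scores (true positives ≈ recall × support, predicted chunks ≈ true positives / precision); the resulting counts are approximations, and `micro_average` is a hypothetical helper.

```python
# Per-class (precision, recall, support) copied from the benchmark table.
report = {
    "ASSET": (0.50, 0.72, 53),
    "CASHFLOW": (0.78, 0.60, 30),
    "EXPENSE": (0.71, 0.68, 151),
    "FREE_CASH_FLOW": (1.00, 1.00, 19),
    "GAINS": (0.80, 0.78, 55),
    "KPI": (0.72, 0.58, 106),
    "LIABILITY": (0.65, 0.51, 39),
    "LOSSES": (0.77, 0.59, 29),
    "PROFIT": (0.77, 0.74, 101),
    "REVENUE": (0.74, 0.78, 231),
}


def micro_average(report):
    """Estimate micro-averaged precision, recall and F1 from per-class scores.

    Counts are reconstructed from two-decimal scores and rounded to integers,
    so the result is only approximate.
    """
    tp_total = pred_total = gold_total = 0
    for precision, recall, support in report.values():
        tp = round(recall * support)         # estimated true positives
        tp_total += tp
        pred_total += round(tp / precision)  # estimated predicted chunks
        gold_total += support                # gold chunks
    p = tp_total / pred_total
    r = tp_total / gold_total
    f1 = 2 * p * r / (p + r)
    return p, r, f1


print([round(x, 2) for x in micro_average(report)])  # [0.72, 0.71, 0.71]
```

The estimate reproduces the reported `micro-avg` row, which suggests the row is consistent with the per-class scores.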

docs/_posts/dcecchini/2023-10-06-finembedding_e5_base_en.md

Lines changed: 1 addition & 0 deletions
@@ -87,4 +87,5 @@ result. Select("E5.result").show()
 
 ## References
 
+
 In-house curated financial datasets.
Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
---
layout: model
title: Finance E5 Embedding Large
author: John Snow Labs
name: finembedding_e5_large
date: 2023-11-09
tags: [finance, en, licensed, e5, sentence_embedding, onnx]
task: Embeddings
language: en
edition: Finance NLP 1.0.0
spark_version: 3.0
supported: true
engine: onnx
annotator: E5Embeddings
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This model is a financial version of the E5 large model, fine-tuned on in-house curated financial datasets. Reference: Wang, Liang, et al. “Text embeddings by weakly-supervised contrastive pre-training.” arXiv preprint arXiv:2212.03533 (2022).

## Predicted Entities

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/finance/models/finembedding_e5_large_en_1.0.0_3.0_1699530885080.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/finance/models/finembedding_e5_large_en_1.0.0_3.0_1699530885080.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from johnsnowlabs import nlp
from pyspark.sql import functions as F

# Assumes `spark` is an active Spark session with a valid Finance NLP license (e.g. started with nlp.start())

document_assembler = (
    nlp.DocumentAssembler().setInputCol("text").setOutputCol("document")
)

E5_embedding = (
    nlp.E5Embeddings.pretrained(
        "finembedding_e5_large", "en", "finance/models"
    )
    .setInputCols(["document"])
    .setOutputCol("E5")
)
pipeline = nlp.Pipeline(stages=[document_assembler, E5_embedding])

data = spark.createDataFrame(
    [["What is the best way to invest in the stock market?"]]
).toDF("text")

result = pipeline.fit(data).transform(data)

# The vectors live in the `embeddings` field of each E5 annotation (`result` holds the text)
result.select(F.explode("E5.embeddings").alias("embeddings")).show(truncate=100)
```

</div>

## Results

```bash
+----------------------------------------------------------------------------------------------------+
|                                                                                          embeddings|
+----------------------------------------------------------------------------------------------------+
|[0.8358813, -1.30341, -0.576791, 0.25893408, 0.26888973, 0.028243342, 0.47971666, 0.47653574, 0.4...|
+----------------------------------------------------------------------------------------------------+
```
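
Sentence embeddings like these are typically compared with cosine similarity, for example to rank candidate passages against a query. A minimal sketch, assuming the vectors have already been collected from the `embeddings` field of the `E5` annotations into plain Python lists; `cosine_similarity` and `rank_passages` are hypothetical helpers, and the 3-dimensional vectors are toy values, not model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_passages(query_vec, passage_vecs):
    """Return passage indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, p) for p in passage_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy vectors for illustration only; real E5 vectors are much longer.
query = [1.0, 0.0, 1.0]
passages = [[0.0, 1.0, 0.0], [1.0, 0.1, 0.9], [-1.0, 0.0, -1.0]]
print(rank_passages(query, passages))  # [1, 0, 2]
```

Cosine similarity ignores vector length, so only the direction of each embedding affects the ranking.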

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|finembedding_e5_large|
|Compatibility:|Finance NLP 1.0.0+|
|License:|Licensed|
|Edition:|Official|
|Input Labels:|[document]|
|Output Labels:|[E5]|
|Language:|en|
|Size:|1.2 GB|

## References

In-house annotated financial datasets.
