Commit 11a114e

dcecchini, jsl-models, gadde5300, Meryem1425, and SKocer authored
Models hub finance (#687)
* Add model 2023-08-03-finner_bert_subpoenas_sm_en (#493)
  Co-authored-by: gadde5300 <[email protected]>
* Delete subpoenas ner finance
* Add model 2023-08-30-finpipe_deid_en (#566)
  Co-authored-by: Meryem1425 <[email protected]>
* Add model 2023-08-30-finpipe_deid_en (#570)
  Co-authored-by: SKocer <[email protected]>
* Add model 2023-08-30-finpipe_deid_en (#571)
  Co-authored-by: SKocer <[email protected]>
* Delete 2023-08-30-finpipe_deid_en.md
* Add model 2023-08-30-finpipe_deid_en (#572)
  Co-authored-by: gokhanturer <[email protected]>
* Add model 2023-08-30-finpipe_deid_en (#574)
  Co-authored-by: SKocer <[email protected]>
* Add model 2023-09-01-finpipe_deid_en (#586)
  Co-authored-by: Meryem1425 <[email protected]>
* Add model 2023-09-01-finpipe_deid_en (#589)
  Co-authored-by: SKocer <[email protected]>
* Add model 2023-09-01-finpipe_deid_en (#593)
  Co-authored-by: gokhanturer <[email protected]>
* 2023-10-06-finembedding_e5_base_en (#685)
* Add model 2023-10-06-finembedding_e5_base_en
* Add model 2023-10-06-finner_absa_sm_en
* Add model 2023-10-06-finassertion_absa_sm_en
---------
Co-authored-by: dcecchini <[email protected]>
---------
Co-authored-by: jsl-models <[email protected]>
Co-authored-by: gadde5300 <[email protected]>
Co-authored-by: Meryem1425 <[email protected]>
Co-authored-by: SKocer <[email protected]>
Co-authored-by: Merve Ertas Uslu <[email protected]>
Co-authored-by: gokhanturer <[email protected]>
1 parent e4d51ef commit 11a114e

3 files changed

+391
-0
lines changed

Lines changed: 151 additions & 0 deletions
@@ -0,0 +1,151 @@
1+
---
2+
layout: model
3+
title: Financial Assertion of Sentiment (sm, Small)
4+
author: John Snow Labs
5+
name: finassertion_absa_sm
6+
date: 2023-10-06
7+
tags: [finance, assertion, en, sentiment_analysis, licensed]
8+
task: Assertion Status
9+
language: en
10+
edition: Finance NLP 1.0.0
11+
spark_version: 3.0
12+
supported: true
13+
annotator: AssertionDLModel
14+
article_header:
15+
type: cover
16+
use_language_switcher: "Python-Scala-Java"
17+
---
18+
19+
## Description
20+
21+
This assertion model classifies financial entities into a sentiment. It is designed to be used together with the associated NER model.
22+
23+
## Predicted Entities
24+
25+
`POSITIVE`, `NEGATIVE`, `NEUTRAL`
26+
27+
{:.btn-box}
28+
<button class="button button-orange" disabled>Live Demo</button>
29+
<button class="button button-orange" disabled>Open in Colab</button>
30+
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/finance/models/finassertion_absa_sm_en_1.0.0_3.0_1696606845902.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
31+
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/finance/models/finassertion_absa_sm_en_1.0.0_3.0_1696606845902.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
32+
33+
## How to use
34+
35+
36+
37+
<div class="tabs-box" markdown="1">
38+
{% include programmingLanguageSelectScalaPythonNLU.html %}
39+
```python
from johnsnowlabs import nlp, finance
from pyspark.sql import functions as F

# Assumes `spark` is an active, licensed session (for example, one returned by nlp.start())

documentAssembler = (
    nlp.DocumentAssembler().setInputCol("text").setOutputCol("document")
)

# Sentence detector: splits each document into sentences
sentenceDetector = (
    nlp.SentenceDetector()
    .setInputCols(["document"])
    .setOutputCol("sentence")
)

# Tokenizer: splits each sentence into tokens
tokenizer = (
    nlp.Tokenizer().setInputCols(["sentence"]).setOutputCol("token")
)

# Token embeddings consumed by both the NER model and the assertion model
bert_embeddings = (
    nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base", "en")
    .setInputCols(["sentence", "token"])
    .setOutputCol("embeddings")
    .setMaxSentenceLength(512)
)

# Associated NER model: finds the financial entities to classify
ner_model = (
    finance.NerModel.pretrained("finner_absa_sm", "en", "finance/models")
    .setInputCols(["sentence", "token", "embeddings"])
    .setOutputCol("ner")
)

# Merges token-level NER tags into entity chunks
ner_converter = (
    finance.NerConverterInternal()
    .setInputCols(["sentence", "token", "ner"])
    .setOutputCol("ner_chunk")
)

# Assigns a sentiment to each chunk; predictions land in the "assertion" column
assertion_model = (
    finance.AssertionDLModel.pretrained("finassertion_absa_sm", "en", "finance/models")
    .setInputCols(["sentence", "ner_chunk", "embeddings"])
    .setOutputCol("assertion")
)

nlpPipeline = nlp.Pipeline(
    stages=[
        documentAssembler,
        sentenceDetector,
        tokenizer,
        bert_embeddings,
        ner_model,
        ner_converter,
        assertion_model,
    ]
)

text = "Equity and earnings of affiliates in Latin America increased to $4.8 million in the quarter from $2.2 million in the prior year as the commodity markets in Latin America remain strong through the end of the quarter."

spark_df = spark.createDataFrame([[text]]).toDF("text")

# Fit the pipeline (all stages are pretrained) and annotate the text
result = nlpPipeline.fit(spark_df).transform(spark_df)

# Show each detected chunk together with its NER label
result.select(
    F.explode(
        F.arrays_zip("ner_chunk.result", "ner_chunk.metadata")
    ).alias("cols")
).select(
    F.expr("cols['0']").alias("entity"),
    F.expr("cols['1']['entity']").alias("label"),
).show(
    50, truncate=False
)
```
111+
112+
</div>
113+
114+
## Results
115+
116+
```bash
+--------+---------+
|entity  |label    |
+--------+---------+
|Equity  |LIABILITY|
|earnings|PROFIT   |
+--------+---------+
```
124+
125+
{:.model-param}
126+
## Model Information
127+
128+
{:.table-model}
|---|---|
|Model Name:|finassertion_absa_sm|
|Compatibility:|Finance NLP 1.0.0+|
|License:|Licensed|
|Edition:|Official|
|Input Labels:|[document, chunk, embeddings]|
|Output Labels:|[assertion]|
|Language:|en|
|Size:|2.7 MB|
138+
139+
## References
140+
141+
In-house annotations of earning call transcripts.
142+
143+
## Benchmarking
144+
145+
```bash
label     precision  recall  f1-score  support
NEGATIVE  0.57       0.42    0.48      74
NEUTRAL   0.51       0.70    0.59      184
POSITIVE  0.75       0.64    0.69      324
```
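
The benchmark lists per-class scores only. As a rough illustration, the macro and support-weighted F1 averages can be derived from those rows. The sketch below copies the numbers from the table; the helper names are illustrative and not part of any library. Because the inputs are rounded to two decimals, the averages are approximate.

```python
# Per-class scores copied from the benchmarking table:
# label -> (precision, recall, f1, support)
scores = {
    "NEGATIVE": (0.57, 0.42, 0.48, 74),
    "NEUTRAL": (0.51, 0.70, 0.59, 184),
    "POSITIVE": (0.75, 0.64, 0.69, 324),
}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(table):
    """Unweighted mean of the per-class F1 scores."""
    return sum(row[2] for row in table.values()) / len(table)


def weighted_f1(table):
    """Mean of the per-class F1 scores weighted by class support."""
    total = sum(row[3] for row in table.values())
    return sum(row[2] * row[3] for row in table.values()) / total


for label, (p, r, f1, _) in scores.items():
    # The recomputed F1 should agree with the reported one up to rounding
    print(label, round(f1_from(p, r), 2), f1)

print("macro F1:", round(macro_f1(scores), 3))
print("weighted F1:", round(weighted_f1(scores), 3))
```

The weighted average is pulled toward `POSITIVE`, which accounts for more than half of the 582 test examples, while the macro average gives the smaller `NEGATIVE` class equal weight.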
Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
1+
---
2+
layout: model
3+
title: Finance E5 Embedding Base
4+
author: John Snow Labs
5+
name: finembedding_e5_base
6+
date: 2023-10-06
7+
tags: [finance, en, licensed, e5, sentence_embedding, onnx]
8+
task: Embeddings
9+
language: en
10+
edition: Finance NLP 1.0.0
11+
spark_version: 3.0
12+
supported: true
13+
engine: onnx
14+
annotator: E5Embeddings
15+
article_header:
16+
type: cover
17+
use_language_switcher: "Python-Scala-Java"
18+
---
19+
20+
## Description
21+
22+
This model is a financial version of the E5 base model fine-tuned on earning call transcripts and finance question-answering datasets. Reference: Wang, Liang, et al. "Text embeddings by weakly-supervised contrastive pre-training." arXiv preprint arXiv:2212.03533 (2022).
23+
24+
## Predicted Entities
25+
26+
27+
28+
{:.btn-box}
29+
<button class="button button-orange" disabled>Live Demo</button>
30+
<button class="button button-orange" disabled>Open in Colab</button>
31+
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/finance/models/finembedding_e5_base_en_1.0.0_3.0_1696603847700.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
32+
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/finance/models/finembedding_e5_base_en_1.0.0_3.0_1696603847700.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
33+
34+
## How to use
35+
36+
37+
38+
<div class="tabs-box" markdown="1">
39+
{% include programmingLanguageSelectScalaPythonNLU.html %}
40+
```python
from johnsnowlabs import nlp

# Assumes `spark` is an active, licensed session (for example, one returned by nlp.start())

document_assembler = (
    nlp.DocumentAssembler().setInputCol("text").setOutputCol("document")
)

# Finance-tuned E5 model: produces one embedding per document
E5_embedding = (
    nlp.E5Embeddings.pretrained(
        "finembedding_e5_base", "en", "finance/models"
    )
    .setInputCols(["document"])
    .setOutputCol("E5")
)

pipeline = nlp.Pipeline(stages=[document_assembler, E5_embedding])

data = spark.createDataFrame(
    [["What is the best way to invest in the stock market?"]]
).toDF("text")

result = pipeline.fit(data).transform(data)
result.select("E5.result").show()
```
61+
62+
</div>
63+
64+
## Results
65+
66+
```bash
+----------------------------------------------------------------------------------------------------+
|                                                                                          embeddings|
+----------------------------------------------------------------------------------------------------+
|[0.45521045, -0.16874692, -0.06179046, -0.37956607, 1.152633, 0.6849592, -0.9676384, 0.4624033, ...|
+----------------------------------------------------------------------------------------------------+
```
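
Sentence embeddings like these are typically compared with cosine similarity, for example to rank candidate passages against a query. The sketch below is plain Python and independent of Spark NLP. The short vectors are made-up placeholders rather than real model output, and `cosine_similarity` is an illustrative helper, not a library function.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical 4-dimensional vectors standing in for collected embeddings
query = [0.45, -0.17, -0.06, -0.38]
passages = {
    "passage_a": [0.40, -0.20, -0.01, -0.35],
    "passage_b": [-0.30, 0.25, 0.10, 0.05],
}

# Rank passages from most to least similar to the query
ranked = sorted(
    passages,
    key=lambda name: cosine_similarity(query, passages[name]),
    reverse=True,
)
print(ranked)
```

In practice the vectors would first be collected from the `E5` output column of the pipeline; that step is omitted here.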
73+
74+
{:.model-param}
75+
## Model Information
76+
77+
{:.table-model}
|---|---|
|Model Name:|finembedding_e5_base|
|Compatibility:|Finance NLP 1.0.0+|
|License:|Licensed|
|Edition:|Official|
|Input Labels:|[document]|
|Output Labels:|[E5]|
|Language:|en|
|Size:|398.5 MB|
87+
88+
## References
89+
90+
For our Finance models, we will use publicly available datasets to fine-tune the model:
91+
92+
- [FiQA](https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/)
93+
- In-house annotated Earning Calls Transcripts
Lines changed: 147 additions & 0 deletions
@@ -0,0 +1,147 @@
1+
---
2+
layout: model
3+
title: Financial NER for Aspect-based Sentiment Analysis (sm, Small)
4+
author: John Snow Labs
5+
name: finner_absa_sm
6+
date: 2023-10-06
7+
tags: [finance, en, ner, licensed]
8+
task: Named Entity Recognition
9+
language: en
10+
edition: Finance NLP 1.0.0
11+
spark_version: 3.0
12+
supported: true
13+
annotator: FinanceNerModel
14+
article_header:
15+
type: cover
16+
use_language_switcher: "Python-Scala-Java"
17+
---
18+
19+
## Description
20+
21+
This NER model identifies entities that can be associated with a financial sentiment. The model is designed to be used with the associated Assertion Status model that classifies the entities into a sentiment category.
22+
23+
## Predicted Entities
24+
25+
`REVENUE`, `EXPENSE`, `PROFIT`, `KPI`, `GAINS`, `ASSET`, `LIABILITY`, `CASHFLOW`, `LOSSES`, `FREE_CASH_FLOW`
26+
27+
{:.btn-box}
28+
<button class="button button-orange" disabled>Live Demo</button>
29+
<button class="button button-orange" disabled>Open in Colab</button>
30+
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/finance/models/finner_absa_sm_en_1.0.0_3.0_1696605316183.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
31+
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/finance/models/finner_absa_sm_en_1.0.0_3.0_1696605316183.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
32+
33+
## How to use
34+
35+
36+
37+
<div class="tabs-box" markdown="1">
38+
{% include programmingLanguageSelectScalaPythonNLU.html %}
39+
```python
from johnsnowlabs import nlp, finance
from pyspark.sql import functions as F

# Assumes `spark` is an active, licensed session (for example, one returned by nlp.start())

document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split on blank lines in addition to the default sentence boundaries
sentence_detector = nlp.SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")\
    .setCustomBounds(["\n\n"])

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base", "en")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")\
    .setCaseSensitive(True)\
    .setMaxSentenceLength(512)

ner_model = finance.NerModel.pretrained("finner_absa_sm", "en", "finance/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

# Merges token-level NER tags into entity chunks
ner_converter = finance.NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

pipeline = nlp.Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    embeddings,
    ner_model,
    ner_converter
])

# All stages are pretrained, so fitting on an empty DataFrame is sufficient
model = pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))

text = "Equity and earnings of affiliates in Latin America increased to $4.8 million in the quarter from $2.2 million in the prior year as the commodity markets in Latin America remain strong through the end of the quarter."

spark_df = spark.createDataFrame([[text]]).toDF("text")

result = model.transform(spark_df)

# Show each detected chunk together with its NER label
result.select(F.explode(F.arrays_zip("ner_chunk.result", "ner_chunk.metadata")).alias("cols"))\
    .select(F.expr("cols['0']").alias("entity"),
            F.expr("cols['1']['entity']").alias("label")).show(50, truncate=False)
```
89+
90+
</div>
91+
92+
## Results
93+
94+
```bash
+--------+---------+
|entity  |label    |
+--------+---------+
|Equity  |LIABILITY|
|earnings|PROFIT   |
+--------+---------+
```
102+
103+
{:.model-param}
104+
## Model Information
105+
106+
{:.table-model}
|---|---|
|Model Name:|finner_absa_sm|
|Compatibility:|Finance NLP 1.0.0+|
|License:|Licensed|
|Edition:|Official|
|Input Labels:|[sentence, token, embeddings]|
|Output Labels:|[ner]|
|Language:|en|
|Size:|16.3 MB|
116+
117+
## References
118+
119+
In-house annotations of earning call transcripts.
120+
121+
## Benchmarking
122+
123+
```bash
label             precision  recall  f1-score  support
B-ASSET           0.6000     0.2400  0.3429    25
B-CASHFLOW        0.7000     0.5833  0.6364    12
B-EXPENSE         0.7222     0.6500  0.6842    60
B-FREE_CASH_FLOW  1.0000     1.0000  1.0000    8
B-GAINS           0.7333     0.5946  0.6567    37
B-KPI             0.7143     0.5556  0.6250    36
B-LIABILITY       0.5000     0.2778  0.3571    18
B-LOSSES          0.7143     0.7143  0.7143    7
B-PROFIT          0.8462     0.8919  0.8684    37
B-REVENUE         0.7385     0.8000  0.7680    60
I-ASSET           0.8000     0.3636  0.5000    11
I-CASHFLOW        0.9091     0.9091  0.9091    11
I-EXPENSE         0.7451     0.6230  0.6786    61
I-FREE_CASH_FLOW  1.0000     1.0000  1.0000    17
I-GAINS           0.8333     0.6667  0.7407    30
I-KPI             0.8500     0.5000  0.6296    34
I-LIABILITY       0.5000     0.5000  0.5000    6
I-LOSSES          0.7143     0.6250  0.6667    8
I-PROFIT          0.8621     0.9615  0.9091    26
I-REVENUE         0.7600     0.7308  0.7451    26
O                 0.9839     0.9923  0.9880    8660
```
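
The benchmark is reported per token with `B-`/`I-` prefixes, whereas the Results section lists whole chunks with a plain label. The sketch below illustrates the general idea of merging BIO tags into chunks. It is plain Python, `bio_to_chunks` is a hypothetical helper, and it is not how `NerConverterInternal` is implemented. The first tag sequence is written by hand to match the Results table; the second is an invented example of a multi-token entity.

```python
def bio_to_chunks(tokens, tags):
    """Merge BIO-tagged tokens into (text, label) chunks."""
    chunks = []
    current_tokens, current_label = [], None

    def flush():
        # Close the chunk that is currently open, if any
        nonlocal current_tokens, current_label
        if current_tokens:
            chunks.append((" ".join(current_tokens), current_label))
        current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new chunk
            flush()
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # An I- tag with the same label extends the open chunk
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open chunk
            flush()
    flush()
    return chunks


# Tags written by hand to match the Results table
print(bio_to_chunks(
    ["Equity", "and", "earnings", "of", "affiliates"],
    ["B-LIABILITY", "O", "B-PROFIT", "O", "O"],
))

# Invented multi-token example
print(bio_to_chunks(
    ["free", "cash", "flow", "grew"],
    ["B-FREE_CASH_FLOW", "I-FREE_CASH_FLOW", "I-FREE_CASH_FLOW", "O"],
))
```

This sketch silently drops an `I-` tag that has no matching open chunk; other conventions treat such a tag as the start of a new chunk.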
