Commit 2dc1dcc

dcecchini, jsl-models, gadde5300, Meryem1425, and SKocer authored
Models hub finance (#594)
* Add model 2023-08-03-finner_bert_subpoenas_sm_en (#493)
  Co-authored-by: gadde5300 <[email protected]>
* Delete subpoenas ner finance
* Add model 2023-08-30-finpipe_deid_en (#566)
  Co-authored-by: Meryem1425 <[email protected]>
* Add model 2023-08-30-finpipe_deid_en (#570)
  Co-authored-by: SKocer <[email protected]>
* Add model 2023-08-30-finpipe_deid_en (#571)
  Co-authored-by: SKocer <[email protected]>
* Delete 2023-08-30-finpipe_deid_en.md
* Add model 2023-08-30-finpipe_deid_en (#572)
  Co-authored-by: gokhanturer <[email protected]>
* Add model 2023-08-30-finpipe_deid_en (#574)
  Co-authored-by: SKocer <[email protected]>
* Add model 2023-09-01-finpipe_deid_en (#586)
  Co-authored-by: Meryem1425 <[email protected]>
* Add model 2023-09-01-finpipe_deid_en (#589)
  Co-authored-by: SKocer <[email protected]>
* Add model 2023-09-01-finpipe_deid_en (#593)
  Co-authored-by: gokhanturer <[email protected]>

---------

Co-authored-by: jsl-models <[email protected]>
Co-authored-by: gadde5300 <[email protected]>
Co-authored-by: Meryem1425 <[email protected]>
Co-authored-by: SKocer <[email protected]>
Co-authored-by: Merve Ertas Uslu <[email protected]>
Co-authored-by: gokhanturer <[email protected]>
1 parent 03df349 commit 2dc1dcc

1 file changed: +156 -0 lines changed

@@ -0,0 +1,156 @@
---
layout: model
title: Financial Deidentification Pipeline
author: John Snow Labs
name: finpipe_deid
date: 2023-09-01
tags: [licensed, en, finance, deid, deidentification, anonymization]
task: Pipeline Finance
language: en
edition: Finance NLP 1.0.0
spark_version: 3.4
supported: true
annotator: PipelineModel
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This pretrained pipeline deidentifies legal and financial documents to help comply with data privacy regulations such as GDPR and CCPA. Because the models in this pipeline are statistical, they can miss or mislabel entities; use the pipeline in a human-in-the-loop process if you need 100% accuracy.

You can carry out both masking and obfuscation with this pipeline, on the following entities:
`ALIAS`, `EMAIL`, `PHONE`, `PROFESSION`, `ORG`, `DATE`, `PERSON`, `ADDRESS`, `STREET`, `CITY`, `STATE`, `ZIP`, `COUNTRY`, `TITLE_CLASS`, `TICKER`, `STOCK_EXCHANGE`, `CFN`, `IRS`

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/finance/models/finpipe_deid_en_1.0.0_3.4_1693602582270.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/finance/models/finpipe_deid_en_1.0.0_3.4_1693602582270.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the licensed pretrained pipeline from the finance models repository
deid_pipeline = PretrainedPipeline("finpipe_deid", "en", "finance/models")

# annotate() runs the whole pipeline on a single raw string
result = deid_pipeline.annotate("""CARGILL, INCORPORATED

By: Pirkko Suominen



Name: Pirkko Suominen Title: Director, Bio Technology Development Center, Date: 10/19/2011

BIOAMBER, SAS

By: Jean-François Huc



Name: Jean-François Huc Title: President Date: October 15, 2011


phone : 18087339090 """)
```

</div>
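
The snippet above stops at `result`. As a minimal sketch, assuming `annotate` returns a dictionary that maps output column names to lists of strings, the helper below renders selected columns as titled blocks like the ones in the Results section. `format_outputs` is a hypothetical helper, and the column names in `ASSUMED_TITLES` are assumptions that this model card does not confirm; inspect `result.keys()` to find the real ones.

```python
from typing import List, Mapping


def format_outputs(result: Mapping[str, List[str]], titles: Mapping[str, str]) -> str:
    """Render the chosen output columns as titled text blocks."""
    blocks = []
    for column, title in titles.items():
        if column not in result:
            continue  # skip columns the pipeline did not produce
        lines = [title, "-" * 30, *result[column]]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


# Assumed column names (not confirmed by this model card).
ASSUMED_TITLES = {
    "deidentified": "Masked with entity labels",
    "masked_with_chars": "Masked with chars",
    "masked_fixed_length_chars": "Masked with fixed length chars",
    "obfuscated": "Obfuscated",
}

# Stand-in for the dictionary returned by deid_pipeline.annotate(...).
sample_result = {
    "deidentified": ["By: <SIGNING_PERSON>"],
    "masked_fixed_length_chars": ["By: ****"],
}

print(format_outputs(sample_result, ASSUMED_TITLES))
```

With a real pipeline, pass `result` in place of `sample_result` and adjust the keys of `ASSUMED_TITLES` to the columns actually present.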

## Results

```bash
Masked with entity labels
------------------------------
<PARTY>, <PARTY>
By: <SIGNING_PERSON>
Name: <PARTY>: <SIGNING_TITLE>, Date: <EFFDATE>
<PARTY>, <PARTY>
By: <SIGNING_PERSON>
Name: <PARTY>: <SIGNING_TITLE>Date: <EFFDATE>

email : <EMAIL>
phone : <PHONE>

Masked with chars
------------------------------
[*****], [**********]
By: [*************]
Name: [*******************]: [**********************************] Center, Date: [********]
[******], [*]
By: [***************]
Name: [**********************]: [*******]Date: [**************]

email : [****************]
phone : [********]

Masked with fixed length chars
------------------------------
****, ****
By: ****
Name: ****: ****, Date: ****
****, ****
By: ****
Name: ****: ****Date: ****

email : ****
phone : ****

Obfuscated
------------------------------
MGT Trust Company, LLC., Clarus llc.
By: Benjamin Dean
Name: John Snow Labs Inc: Sales Manager, Date: 03/08/2025
Clarus llc., SESA CO.
By: JAMES TURNER
Name: MGT Trust Company, LLC.: Business ManagerDate: 11/7/2016


phone : 78 834 854

```
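
The three masked outputs differ only in what replaces each detected span. The toy function below illustrates that difference in plain Python. It is not the pipeline's implementation: `mask`, the mode names, and the hand-written span are all hypothetical. Obfuscation, which substitutes realistic fake values, is not covered.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, label); end is exclusive


def mask(text: str, spans: List[Span], mode: str) -> str:
    """Replace each span according to one of three illustrative masking modes."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])  # keep text between entities
        if mode == "entity_labels":
            pieces.append(f"<{label}>")
        elif mode == "same_length_chars":
            # Brackets count toward the length, as in "[*]" for a 3-char span.
            pieces.append("[" + "*" * max(end - start - 2, 0) + "]")
        elif mode == "fixed_length_chars":
            pieces.append("****")
        else:
            raise ValueError(f"unknown mode: {mode}")
        cursor = end
    pieces.append(text[cursor:])  # keep the tail after the last entity
    return "".join(pieces)


line = "By: Pirkko Suominen"
spans = [(4, 19, "SIGNING_PERSON")]  # hand-written span, not model output

for mode in ("entity_labels", "same_length_chars", "fixed_length_chars"):
    print(mask(line, spans, mode))
```

The same-length mode preserves the character count of the document, which keeps offsets stable, while the fixed-length mode also hides how long each original value was.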

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|finpipe_deid|
|Type:|pipeline|
|Compatibility:|Finance NLP 1.0.0+|
|License:|Licensed|
|Edition:|Official|
|Language:|en|
|Size:|475.2 MB|

## Included Models

- DocumentAssembler
- SentenceDetector
- TokenizerModel
- BertEmbeddings
- FinanceNerModel
- NerConverterInternalModel
- FinanceNerModel
- NerConverterInternalModel
- FinanceNerModel
- NerConverterInternalModel
- FinanceNerModel
- NerConverterInternalModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ChunkMergeModel
- DeIdentificationModel
- DeIdentificationModel
- DeIdentificationModel
- DeIdentificationModel
