Commit 1d1c757

support custom pipelines and deprecate nlu reference for docker and snowflake utils (#1869)

1 parent 225cf19

File tree

9 files changed: +741 −140 lines

docs/en/jsl/docker_utils.md

Lines changed: 44 additions & 10 deletions
@@ -20,47 +20,81 @@ This enables you to package any johnsnowlabs model into a docker image and serve

## Image creation
With `nlp.build_image` you can create a docker image with any johnsnowlabs model pre-packed and ready to serve.
- You just need to specify the [nlu reference to the model](https://nlp.johnsnowlabs.com/docs/en/jsl/namespace) and the **name of the output image**
- Additionally, you can set the `hardware_target` to `cpu`, `gpu`, `apple_silicon` or `aarch` to package with Jar's optimized for specific hardware.
+ You can set the `hardware_platform` to `cpu`, `gpu`, `apple_silicon` or `aarch` to package with Jars optimized for specific hardware.
+ You can specify `pipeline_name`, `pipeline_language` and `pipeline_bucket` to create an image with a specific pretrained pipeline.
+ Alternatively, you can deploy a custom model by specifying `custom_pipeline`.
+

</div><div class="h3-box" markdown="1">

+
+
### Create Spark-NLP Image
- Create a spark-nlp bert image
+ Create an image with a **Spark-NLP Pretrained Pipeline**
```python
from johnsnowlabs import nlp
- nlp.build_image(preloaded_model='bert',image_name='bert_img')
+ nlp.build_image(pipeline_name='explain_document_lg', pipeline_language='fr', image_name='bert_img')
```

- Create an image with GPU optimized builds
+ Create an image with **GPU optimized builds**
```python
from johnsnowlabs import nlp
- nlp.build_image(preloaded_model='bert',image_name='bert_gpu_img',hardware_target='gpu')
+ nlp.build_image(pipeline_name='recognize_entities_bert', hardware_platform='gpu', image_name='bert_gpu_img')
```

</div><div class="h3-box" markdown="1">

### Create Medical NLP Image
- To create an image with a **Medical NLP model** you must provide a license in one of the ways described by the [Authorization Flows Overview](https://nlp.johnsnowlabs.com/docs/en/jsl/install_advanced#authorization-flows-overview).
+ To create an image with a **Medical NLP Pretrained Pipeline** you must provide a license in one of the ways described by the [Authorization Flows Overview](https://nlp.johnsnowlabs.com/docs/en/jsl/install_advanced#authorization-flows-overview).
It will be stored in the container and used to pre-download licensed models & dependencies during the build process.
```python
from johnsnowlabs import nlp
# In this example we authorize via a license.json file, but there are many other ways.
- nlp.build_image(preloaded_model='en.med_ner.cancer_genetics.pipeline',image_name='cancer_img', json_license_path='path/to/my/license.json')
+ nlp.build_image(pipeline_name='ner_bionlp_pipeline',
+                 pipeline_bucket='clinical/models',
+                 json_license_path='path/to/my/license.json',
+                 image_name='cancer_img'
+ )
```

</div><div class="h3-box" markdown="1">

### Create Licensed Visual NLP Image
- To create an image with a **Visual NLP model** you must provide a license in one of the ways described by the [Authorization Flows Overview](https://nlp.johnsnowlabs.com/docs/en/jsl/install_advanced#authorization-flows-overview).
+ To create an image with a **Visual NLP Pretrained Pipeline** you must provide a license in one of the ways described by the [Authorization Flows Overview](https://nlp.johnsnowlabs.com/docs/en/jsl/install_advanced#authorization-flows-overview).
It will be stored in the container and used to pre-download licensed models & dependencies during the build process.
```python
from johnsnowlabs import nlp
- nlp.build_image(preloaded_model='pdf2text',image_name="pdf2text_img",visual=True)
+ nlp.build_image(pipeline_name='pdf_printed_transformer_extraction',
+                 pipeline_bucket='clinical/ocr',
+                 image_name="pdf2text_img", visual=True)
+ ```
+
+ </div><div class="h3-box" markdown="1">
+
+ ### Create Image with custom pipeline
+ To create an image with a **custom pipeline**, provide a fitted Spark Pipeline object as `custom_pipeline`.
+ If you use licensed annotators, provide a license in one of the ways described by the [Authorization Flows Overview](https://nlp.johnsnowlabs.com/docs/en/jsl/install_advanced#authorization-flows-overview).
+ ```python
+ from johnsnowlabs import nlp
+ pipe = ...  # your custom pipeline which is already fitted
+ nlp.build_image(custom_pipeline=pipe, image_name="my_custom_img", visual=True)
+ ```
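For context, a minimal sketch of constructing such a fitted `pipe` is shown below. It assumes the `johnsnowlabs` namespace re-exports the standard Spark NLP building blocks (`nlp.DocumentAssembler`, `nlp.Tokenizer`, `nlp.Pipeline`) as in the library's quickstart; adjust to your actual pipeline:

```python
from johnsnowlabs import nlp

spark = nlp.start()  # starts a Spark session with the John Snow Labs jars

# A tiny document -> token pipeline; any fitted Spark Pipeline works here.
document_assembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = nlp.Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Fit on a tiny dataframe so the pipeline is ready to serve.
pipe = nlp.Pipeline(stages=[document_assembler, tokenizer]) \
    .fit(spark.createDataFrame([["init"]]).toDF("text"))

nlp.build_image(custom_pipeline=pipe, image_name="my_custom_img")
```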
+ </div><div class="h3-box" markdown="1">
+
+
+ ### Create Image with NLU pipeline
+ To create an image with an **NLU Pipeline**, provide its underlying fitted Spark Pipeline object as `custom_pipeline`.
+ If you use a licensed pipeline, provide a license in one of the ways described by the [Authorization Flows Overview](https://nlp.johnsnowlabs.com/docs/en/jsl/install_advanced#authorization-flows-overview).
+ ```python
+ from johnsnowlabs import nlp
+ pipe = nlp.load('bert')
+ pipe.predict('Init the pipe')
+ nlp.build_image(custom_pipeline=pipe.vanilla_transformer_pipe, image_name="bert_img", visual=True)
```

</div><div class="h3-box" markdown="1">

+
## Serve model image as container
With `nlp.serve_container` you can serve the image you created via `nlp.build_image` as a REST API.
You can head to [http://localhost:8548/docs](http://localhost:8548/docs) to see the FastAPI docs for `/predict` and `/predict_batch`.
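A minimal sketch of serving and querying such an image. The `serve_container` parameter name and the `/predict` request shape are assumptions here; check the generated FastAPI docs at [http://localhost:8548/docs](http://localhost:8548/docs) for the exact schema:

```python
import requests

from johnsnowlabs import nlp

# Serve the image built above (parameter name assumed, see the docker utils docs).
nlp.serve_container(image_name='bert_img')

# Query the REST API; the 'text' parameter is an assumption,
# the authoritative schema is at http://localhost:8548/docs.
response = requests.post('http://localhost:8548/predict',
                         params={'text': 'Hello from John Snow Labs'})
print(response.json())
```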

docs/en/jsl/snowflake_utils.md

Lines changed: 79 additions & 11 deletions
@@ -44,6 +44,7 @@ This will create the following resources:
- compute_pool_name=`tutorial_compute_pool`

You can specify a custom name for any resource by passing it as a keyword argument.
+ Additionally, you can specify compute pool parameters with values defined in the [Snowflake Documentation](https://docs.snowflake.com/en/sql-reference/sql/create-compute-pool#required-parameters).

```python
role_name, db_name, warehouse_name, schema_name, compute_pool_name, repo_url = nlp.snowflake_common_setup(
@@ -56,24 +57,34 @@ role_name, db_name, warehouse_name, schema_name, compute_pool_name, repo_url = n
    stage_name='my_tutorial_stage',
    db_name='my_tutorial_db',
    warehouse_name='my_tutorial_warehouse',
-   compute_pool_name='tutorial_compute_pool'
+   compute_pool_name='tutorial_compute_pool',
+
+   # Specify compute pool parameters
+   compute_pool_min_nodes=1,
+   compute_pool_max_nodes=1,
+   compute_pool_instance_family='CPU_X64_XS',
)

```

</div><div class="h3-box" markdown="1">

- ## Deploy Model as Snowflake Container Services UDF
+ ## Deploy Pretrained Pipeline as Snowflake Container Services UDF

`nlp.deploy_as_snowflake_udf()` will build, tag & push a John Snow Labs model server to your
Snowflake image repository and finally create a service & UDF from the model and test it.
- Role, Database, Warehouse, Schema, Compute Pool and Image Repository muss be created beforehand and passwed as arguments.
+ Role, Database, Warehouse, Schema, Compute Pool and Image Repository must be created beforehand and passed as arguments.
+
```python
# Either run `nlp.snowflake_common_setup` or manually create & specify these resources
from johnsnowlabs import nlp
+
role_name, db_name, warehouse_name, schema_name, compute_pool_name, repo_url = ...
nlp.deploy_as_snowflake_udf(
-   nlu_ref='en.de_identify.clinical_pipeline',
+   pipeline_name='explain_clinical_doc_oncology',
+   pipeline_bucket='clinical/models',
+   pipeline_language='en',
+
    snowflake_user='my_snowflake_user',
    snowflake_account='my_snowflake_account',
    snowflake_password='my_snowflake_password',
@@ -90,13 +101,18 @@ nlp.deploy_as_snowflake_udf(

`nlp.deploy_as_snowflake_udf()` will build, tag & push a John Snow Labs model server to your
Snowflake image repository and finally create a service & UDF from the model and test it.
- Role, Database, Warehouse, Schema, Compute Pool and Image Repository muss be created beforehand and passwed as arguments.
+ Role, Database, Warehouse, Schema, Compute Pool and Image Repository must be created beforehand and passed as arguments.
+
```python
# Either run `nlp.snowflake_common_setup` or manually create & specify these resources
from johnsnowlabs import nlp
+
role_name, db_name, warehouse_name, schema_name, compute_pool_name, repo_url = ...
nlp.deploy_as_snowflake_udf(
-   nlu_ref='en.de_identify.clinical_pipeline',
+   pipeline_name='explain_clinical_doc_oncology',
+   pipeline_bucket='clinical/models',
+   pipeline_language='en',
+
    snowflake_user='my_snowflake_user',
    snowflake_account='my_snowflake_account',
    snowflake_password='my_snowflake_password',
@@ -111,14 +127,18 @@ nlp.deploy_as_snowflake_udf(

```

You can also optionally specify the name of the created service & UDF.

```python
# Either run `nlp.snowflake_common_setup` or manually create & specify these resources
from johnsnowlabs import nlp
+
role_name, db_name, warehouse_name, schema_name, compute_pool_name, repo_url = ...
nlp.deploy_as_snowflake_udf(
-   nlu_ref='en.de_identify.clinical_pipeline',
+   pipeline_name='explain_clinical_doc_oncology',
+   pipeline_bucket='clinical/models',
+   pipeline_language='en',
+
    snowflake_user='my_snowflake_user',
    snowflake_account='my_snowflake_account',
    snowflake_password='my_snowflake_password',
@@ -138,7 +158,7 @@ You can now use the `en_de_identify_clinical_pipeline_udf()` function within you
when using the created role, database, warehouse, schema.


- You can run the following commands in Snowflake to get he status of the service and query the UDF
+ You can run the following commands in Snowflake to get the status of the service and query the UDF:
```sql
-- Set context
USE ROLE test_role;
@@ -163,17 +183,34 @@ CALL SYSTEM$GET_SERVICE_LOGS('en_de_identify_clinical_pipeline_service', '0', 'j
SELECT en_de_identify_clinical_pipeline_udf('The patient was prescribed Amlodopine Vallarta 10-320mg, Eviplera. The other patient is given Lescol 40 MG and Everolimus 1.5 mg tablet.');
```

+ You can also query the UDF with Python like this:
+
+ ```python
+ import json
+ def query_udf(client, udf_name, data):
+     cmd_query_udf = """SELECT {udf_name}('{data}')"""
+     cur = client.cursor()
+     cur.execute(cmd_query_udf.format(udf_name=udf_name, data=data))
+     for row in cur:
+         data = json.loads(row[0])
+         print(data)
+     cur.close()
+     return data
+ ```
+
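The `client` argument above is a Snowflake connection. A minimal sketch of creating one with the `snowflake-connector-python` package (credential values are placeholders) and calling `query_udf`:

```python
import snowflake.connector

# Connect with the same account, role and resources used for the deployment.
client = snowflake.connector.connect(
    user='my_snowflake_user',
    password='my_snowflake_password',
    account='my_snowflake_account',
    role=role_name,
    warehouse=warehouse_name,
    database=db_name,
    schema=schema_name,
)

result = query_udf(client, 'en_de_identify_clinical_pipeline_udf',
                   'The patient was prescribed Amlodopine Vallarta 10-320mg.')
```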
</div><div class="h3-box" markdown="1">

## Streamlit Example with Snowpark services

- Once you created an UDF in Snowflake you can access it within Streamlit Apps.
+ Once you created a UDF in Snowflake you can access it within Streamlit Apps.
Make sure to select the same resources to host your Streamlit app as used for hosting the UDF.

This is a small example of a simple streamlit app you can now build:
1. Go to the Streamlit Section in `Projects` within your Snowflake account
2. In the bottom left click on your username and then on switch role and select the role we just created. The default value is `test_role`
- 3. In the side-bar, click on Streamlit and then on the `+ Streamlit App` button. Specify a Database, Schema and Warehouse. The defaults are `TUTORIAL_DB`, `DATA_SCHEMA`, `TUTORIAL_WAREHOUSE`.
+ 3. In the sidebar, click on Streamlit and then on the `+ Streamlit App` button. Specify a Database, Schema and Warehouse. The defaults are `TUTORIAL_DB`, `DATA_SCHEMA`, `TUTORIAL_WAREHOUSE`.
Copy and paste the following script into your streamlit app and run it:
```python
import streamlit as st
@@ -186,4 +223,35 @@ st.write(udf_response.collect()[0].as_dict())

For a more advanced streamlit example, see [here](https://github.com/JohnSnowLabs/johnsnowlabs/blob/main/streamlits/advanced_snowflake.py)
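The hunk above elides the body of the tutorial script. For orientation, a hypothetical sketch of such an app, assuming it runs as a Streamlit-in-Snowflake app where `snowflake.snowpark.context.get_active_session` provides the session (the UDF name matches the one created earlier):

```python
import streamlit as st
from snowflake.snowpark.context import get_active_session

# Streamlit apps hosted in Snowflake expose a ready-to-use Snowpark session.
session = get_active_session()

text = st.text_input("Text to de-identify",
                     "The patient was prescribed Amlodopine Vallarta 10-320mg.")

# Call the UDF created earlier; escape single quotes to keep the SQL literal valid.
udf_response = session.sql(
    "SELECT en_de_identify_clinical_pipeline_udf('{}')".format(text.replace("'", "''"))
)
st.write(udf_response.collect()[0].as_dict())
```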

+
+ ## Deploying Custom Pipeline as Snowflake Container Services UDF
+ To deploy a custom pipeline, simply provide a fitted Spark Pipeline object as `custom_pipeline`.
+
+ ```python
+ # Either run `nlp.snowflake_common_setup` or manually create & specify these resources
+ from johnsnowlabs import nlp
+
+ role_name, db_name, warehouse_name, schema_name, compute_pool_name, repo_url = ...
+ my_custom_pipeline = ...  # your custom pipeline which is already fitted
+
+ nlp.deploy_as_snowflake_udf(
+     custom_pipeline=my_custom_pipeline,
+     snowflake_user='my_snowflake_user',
+     snowflake_account='my_snowflake_account',
+     snowflake_password='my_snowflake_password',
+     license_path='path/to/my/jsl_license.json',
+     repo_url=repo_url,
+     role_name=role_name,
+     database_name=db_name,
+     warehouse_name=warehouse_name,
+     schema_name=schema_name,
+     compute_pool_name=compute_pool_name,
+     udf_name='my_udf',
+     service_name='my_service'
+ )
+ ```
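Once the service is up, the UDF is callable under the `udf_name` you passed, following the same query pattern shown earlier:

```sql
-- Query the custom UDF created above (name matches udf_name='my_udf')
SELECT my_udf('The patient was prescribed Amlodopine Vallarta 10-320mg.');
```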
+
+ </div><div class="h3-box" markdown="1">
+
+
</div></div>

johnsnowlabs/auto_install/docker/build/app.py

Lines changed: 2 additions & 2 deletions
@@ -23,11 +23,11 @@
visual_secret = os.environ.get("VISUAL_SECRET", None)

# jars loaded from jsl-home
- nlp.start(model_cache_folder="/app/model_cache", aws_access_key=aws_secret_access_key, aws_key_id=aws_access_key_id,
+ nlp.start(aws_access_key=aws_secret_access_key, aws_key_id=aws_access_key_id,
          hc_license=nlp_license, enterprise_nlp_secret=nlp_secret, visual_secret=visual_secret,
          visual=True if visual_secret else False, )

- model = nlp.load(path="/app/model/served_model", verbose=True)
+ model = nlp.load(path="/app/model", verbose=True)
if visual_enabled:
    # TODO this needs to be set by NLU
    model.contains_ocr_components = True

johnsnowlabs/auto_install/docker/build/base_dockerfile

Lines changed: 3 additions & 1 deletion
@@ -26,8 +26,10 @@ RUN chmod +x /usr/local/bin/tini
RUN mkdir /app
RUN mkdir /app/model_cache

+ # <OPTIONAL MODEL COPY PLACEHOLDER>
+
# Install Johnsnowlabs libraries
- RUN pip install johnsnowlabs fastapi uvicorn python-multipart nbformat
+ RUN pip install johnsnowlabs fastapi uvicorn python-multipart nbformat packaging
COPY installer.py /app/installer.py
RUN python3 /app/installer.py

johnsnowlabs/auto_install/docker/build/installer.py

Lines changed: 12 additions & 4 deletions
@@ -1,5 +1,4 @@
import os
-
from johnsnowlabs import settings, nlp

settings.enforce_versions = False
1110
aws_secret_access_key = os.environ.get("JOHNSNOWLABS_AWS_SECRET_ACCESS_KEY", None)
1211
HARDWARE_TARGET = os.environ.get("HARDWARE_TARGET", "cpu")
1312
model_ref = os.environ.get("MODEL_TO_LOAD", None)
13+
model_bucket = os.environ.get("MODEL_BUCKET", None)
14+
model_lang = os.environ.get("MODEL_LANGUAGE", 'en')
1415

1516
nlp.install(
1617
browser_login=False,
@@ -27,8 +28,15 @@
2728
hc_license=nlp_license, enterprise_nlp_secret=nlp_secret, visual_secret=visual_secret,
2829
visual=True if visual_secret else False, )
2930
if model_ref:
31+
print(f'Downloading model {model_ref} from bucket {model_bucket} with language {model_lang}')
3032
# Cache model, if not specified user must
3133
# mount a folder to /app/model_cache/ which has a folder named `served_model`
32-
pipe = nlp.load(model_ref)
33-
pipe.predict("init")
34-
pipe.save("/app/model/served_model")
34+
if model_bucket == 'clinical/ocr':
35+
from sparkocr.pretrained import PretrainedPipeline
36+
PretrainedPipeline(model_ref, model_lang, model_bucket).model.save("/app/model")
37+
else:
38+
nlp.PretrainedPipeline(model_ref, model_lang, model_bucket).model.save("/app/model")
39+
else:
40+
print("No model reference provided, skipping model download and validating provided model on disk")
41+
# Validate Model should be stored in /opt/ml/model
42+
nlp.PretrainedPipeline.from_disk("/app/model")
