Commit 1d1c757

support custom pipelines and deprecate nlu reference for docker and snowflake utils (#1869)

1 parent 225cf19

File tree

9 files changed: +741 −140 lines

docs/en/jsl/docker_utils.md

Lines changed: 44 additions & 10 deletions
@@ -20,47 +20,81 @@ This enables you to package any johnsnowlabs model into a docker image and serve

## Image creation
With `nlp.build_image` you can create a docker image with any johnsnowlabs model pre-packed and ready to serve.
- You just need to specify the [nlu reference to the model](https://nlp.johnsnowlabs.com/docs/en/jsl/namespace) and the **name of the output image**
- Additionally, you can set the `hardware_target` to `cpu`, `gpu`, `apple_silicon` or `aarch` to package with Jar's optimized for specific hardware.
+ You can set the `hardware_platform` to `cpu`, `gpu`, `apple_silicon` or `aarch` to package with Jars optimized for specific hardware.
+ You can specify `pipeline_name`, `pipeline_language` and `pipeline_bucket` to create an image with a specific pretrained pipeline.
+ Alternatively, you can deploy a custom model by specifying `custom_pipeline`.
+

</div><div class="h3-box" markdown="1">

+
+
### Create Spark-NLP Image
- Create a spark-nlp bert image
+ Create an image with a **Spark-NLP Pretrained Pipeline**
```python
from johnsnowlabs import nlp
- nlp.build_image(preloaded_model='bert',image_name='bert_img')
+ nlp.build_image(pipeline_name='explain_document_lg', pipeline_language='fr', image_name='bert_img')
```

- Create an image with GPU optimized builds
+ Create an image with **GPU optimized builds**
```python
from johnsnowlabs import nlp
- nlp.build_image(preloaded_model='bert',image_name='bert_gpu_img',hardware_target='gpu')
+ nlp.build_image(pipeline_name='recognize_entities_bert', hardware_platform='gpu', image_name='bert_gpu_img')
```

</div><div class="h3-box" markdown="1">

### Create Medical NLP Image
- To create an image with a **Medical NLP model** you must provide a license in one of the ways described by the [Authorization Flows Overview](https://nlp.johnsnowlabs.com/docs/en/jsl/install_advanced#authorization-flows-overview).
+ To create an image with a **Medical NLP Pretrained Pipeline** you must provide a license in one of the ways described by the [Authorization Flows Overview](https://nlp.johnsnowlabs.com/docs/en/jsl/install_advanced#authorization-flows-overview).
It will be stored in the container and used to pre-download licensed models & dependencies during the build process.
```python
from johnsnowlabs import nlp
# In this example we authorize via a license.json file, but there are many other ways.
- nlp.build_image(preloaded_model='en.med_ner.cancer_genetics.pipeline',image_name='cancer_img', json_license_path='path/to/my/license.json')
+ nlp.build_image(pipeline_name='ner_bionlp_pipeline',
+                 pipeline_bucket='clinical/models',
+                 json_license_path='path/to/my/license.json',
+                 image_name='cancer_img'
+ )
```

</div><div class="h3-box" markdown="1">

### Create Licensed Visual NLP Image
- To create an image with a **Visual NLP model** you must provide a license in one of the ways described by the [Authorization Flows Overview](https://nlp.johnsnowlabs.com/docs/en/jsl/install_advanced#authorization-flows-overview).
+ To create an image with a **Visual NLP Pretrained Pipeline** you must provide a license in one of the ways described by the [Authorization Flows Overview](https://nlp.johnsnowlabs.com/docs/en/jsl/install_advanced#authorization-flows-overview).
It will be stored in the container and used to pre-download licensed models & dependencies during the build process.
```python
from johnsnowlabs import nlp
- nlp.build_image(preloaded_model='pdf2text',image_name="pdf2text_img",visual=True)
+ nlp.build_image(pipeline_name='pdf_printed_transformer_extraction',
+                 pipeline_bucket='clinical/ocr',
+                 image_name="pdf2text_img", visual=True)
+ ```
+
+ </div><div class="h3-box" markdown="1">
+
+ ### Create Image with custom pipeline
+ To create an image with a **custom pipeline**, provide a fitted Spark Pipeline object as `custom_pipeline`.
+ If you use licensed annotators, provide a license in one of the ways described by the [Authorization Flows Overview](https://nlp.johnsnowlabs.com/docs/en/jsl/install_advanced#authorization-flows-overview).
+ ```python
+ from johnsnowlabs import nlp
+ pipe = ...  # your custom pipeline which is already fitted
+ nlp.build_image(custom_pipeline=pipe, image_name="my_custom_img", visual=True)
+ ```
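For context, a minimal sketch of constructing such a fitted `pipe` is shown below. It assumes the `johnsnowlabs` namespace re-exports the standard Spark NLP building blocks (`nlp.DocumentAssembler`, `nlp.Tokenizer`, `nlp.Pipeline`) as in the library's quickstart; adjust to your actual pipeline:

```python
from johnsnowlabs import nlp

spark = nlp.start()  # starts a Spark session with the John Snow Labs jars

# A tiny document -> token pipeline; any fitted Spark Pipeline works here.
document_assembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = nlp.Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Fit on a tiny dataframe so the pipeline is ready to serve.
pipe = nlp.Pipeline(stages=[document_assembler, tokenizer]) \
    .fit(spark.createDataFrame([["init"]]).toDF("text"))

nlp.build_image(custom_pipeline=pipe, image_name="my_custom_img")
```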
+ </div><div class="h3-box" markdown="1">
+
+
+ ### Create Image with NLU pipeline
+ To create an image with an **NLU Pipeline**, provide its underlying fitted Spark Pipeline object as `custom_pipeline`.
+ If you use a licensed pipeline, provide a license in one of the ways described by the [Authorization Flows Overview](https://nlp.johnsnowlabs.com/docs/en/jsl/install_advanced#authorization-flows-overview).
+ ```python
+ from johnsnowlabs import nlp
+ pipe = nlp.load('bert')
+ pipe.predict('Init the pipe')
+ nlp.build_image(custom_pipeline=pipe.vanilla_transformer_pipe, image_name="bert_img", visual=True)
```

</div><div class="h3-box" markdown="1">

+
## Serve model image as container
With `nlp.serve_container` you can serve the image you created via `nlp.build_image` as a REST API.
You can head to [http://localhost:8548/docs](http://localhost:8548/docs) to see the FastAPI docs for `/predict` and `/predict_batch`.
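A minimal sketch of serving and querying such an image. The `serve_container` parameter name and the `/predict` request shape are assumptions here; check the generated FastAPI docs at [http://localhost:8548/docs](http://localhost:8548/docs) for the exact schema:

```python
import requests

from johnsnowlabs import nlp

# Serve the image built above (parameter name assumed, see the docker utils docs).
nlp.serve_container(image_name='bert_img')

# Query the REST API; the 'text' parameter is an assumption,
# the authoritative schema is at http://localhost:8548/docs.
response = requests.post('http://localhost:8548/predict',
                         params={'text': 'Hello from John Snow Labs'})
print(response.json())
```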

docs/en/jsl/snowflake_utils.md

Lines changed: 79 additions & 11 deletions
@@ -44,6 +44,7 @@ This will create the following resources:
- compute_pool_name=`tutorial_compute_pool`

You can specify a custom name for any resource by passing it as a keyword argument.
+ Additionally, you can specify compute pool parameters with values defined in the [Snowflake Documentation](https://docs.snowflake.com/en/sql-reference/sql/create-compute-pool#required-parameters).

```python
role_name, db_name, warehouse_name, schema_name, compute_pool_name, repo_url = nlp.snowflake_common_setup(
@@ -56,24 +57,34 @@ role_name, db_name, warehouse_name, schema_name, compute_pool_name, repo_url = n
    stage_name='my_tutorial_stage',
    db_name='my_tutorial_db',
    warehouse_name='my_tutorial_warehouse',
-   compute_pool_name='tutorial_compute_pool'
+   compute_pool_name='tutorial_compute_pool',
+
+   # Specify compute pool parameters
+   compute_pool_min_nodes=1,
+   compute_pool_max_nodes=1,
+   compute_pool_instance_family='CPU_X64_XS',
)

```

</div><div class="h3-box" markdown="1">

- ## Deploy Model as Snowflake Container Services UDF
+ ## Deploy Pretrained Pipeline as Snowflake Container Services UDF

`nlp.deploy_as_snowflake_udf()` will build, tag & push a John Snow Labs model server to your
Snowflake image repository and finally create a service & UDF from the model and test it.
- Role, Database, Warehouse, Schema, Compute Pool and Image Repository muss be created beforehand and passwed as arguments.
+ Role, Database, Warehouse, Schema, Compute Pool and Image Repository must be created beforehand and passed as arguments.
+
```python
# Either run `nlp.snowflake_common_setup` or manually create & specify these resources
from johnsnowlabs import nlp
+
role_name, db_name, warehouse_name, schema_name, compute_pool_name, repo_url = ...
nlp.deploy_as_snowflake_udf(
-   nlu_ref='en.de_identify.clinical_pipeline',
+   pipeline_name='explain_clinical_doc_oncology',
+   pipeline_bucket='clinical/models',
+   pipeline_language='en',
+
    snowflake_user='my_snowflake_user',
    snowflake_account='my_snowflake_account',
    snowflake_password='my_snowflake_password',
@@ -90,13 +101,18 @@ nlp.deploy_as_snowflake_udf(

`nlp.deploy_as_snowflake_udf()` will build, tag & push a John Snow Labs model server to your
Snowflake image repository and finally create a service & UDF from the model and test it.
- Role, Database, Warehouse, Schema, Compute Pool and Image Repository muss be created beforehand and passwed as arguments.
+ Role, Database, Warehouse, Schema, Compute Pool and Image Repository must be created beforehand and passed as arguments.
+
```python
# Either run `nlp.snowflake_common_setup` or manually create & specify these resources
from johnsnowlabs import nlp
+
role_name, db_name, warehouse_name, schema_name, compute_pool_name, repo_url = ...
nlp.deploy_as_snowflake_udf(
-   nlu_ref='en.de_identify.clinical_pipeline',
+   pipeline_name='explain_clinical_doc_oncology',
+   pipeline_bucket='clinical/models',
+   pipeline_language='en',
+
    snowflake_user='my_snowflake_user',
    snowflake_account='my_snowflake_account',
    snowflake_password='my_snowflake_password',
@@ -111,14 +127,18 @@ nlp.deploy_as_snowflake_udf(

```

You can also optionally specify the name of the created service & UDF.

```python
# Either run `nlp.snowflake_common_setup` or manually create & specify these resources
from johnsnowlabs import nlp
+
role_name, db_name, warehouse_name, schema_name, compute_pool_name, repo_url = ...
nlp.deploy_as_snowflake_udf(
-   nlu_ref='en.de_identify.clinical_pipeline',
+   pipeline_name='explain_clinical_doc_oncology',
+   pipeline_bucket='clinical/models',
+   pipeline_language='en',
+
    snowflake_user='my_snowflake_user',
    snowflake_account='my_snowflake_account',
    snowflake_password='my_snowflake_password',
@@ -138,7 +158,7 @@ You can now use the `en_de_identify_clinical_pipeline_udf()` function within you
when using the created role, database, warehouse, schema.


- You can run the following commands in Snowflake to get he status of the service and query the UDF
+ You can run the following commands in Snowflake to get the status of the service and query the UDF:
```sql
-- Set context
USE ROLE test_role;
@@ -163,17 +183,34 @@ CALL SYSTEM$GET_SERVICE_LOGS('en_de_identify_clinical_pipeline_service', '0', 'j
SELECT en_de_identify_clinical_pipeline_udf('The patient was prescribed Amlodopine Vallarta 10-320mg, Eviplera. The other patient is given Lescol 40 MG and Everolimus 1.5 mg tablet.');
```

+ You can also query the UDF with Python like this:
+
+ ```python
+ import json
+ def query_udf(client, udf_name, data):
+     cmd_query_udf = """SELECT {udf_name}('{data}')"""
+     cur = client.cursor()
+     cur.execute(cmd_query_udf.format(udf_name=udf_name, data=data))
+     for row in cur:
+         data = json.loads(row[0])
+         print(data)
+     cur.close()
+     return data
+ ```
+
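The `client` argument above is a Snowflake connection. A minimal sketch of creating one with the `snowflake-connector-python` package (credential values are placeholders) and calling `query_udf`:

```python
import snowflake.connector

# Connect with the same account, role and resources used for the deployment.
client = snowflake.connector.connect(
    user='my_snowflake_user',
    password='my_snowflake_password',
    account='my_snowflake_account',
    role=role_name,
    warehouse=warehouse_name,
    database=db_name,
    schema=schema_name,
)

result = query_udf(client, 'en_de_identify_clinical_pipeline_udf',
                   'The patient was prescribed Amlodopine Vallarta 10-320mg.')
```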
</div><div class="h3-box" markdown="1">

## Streamlit Example with Snowpark services

- Once you created an UDF in Snowflake you can access it within Streamlit Apps.
+ Once you created a UDF in Snowflake you can access it within Streamlit Apps.
Make sure to select the same resources to host your Streamlit app as used for hosting the UDF.

This is a small example of a simple streamlit app you can now build:
1. Go to the Streamlit Section in `Projects` within your Snowflake account
2. In the bottom left click on your username and then on switch role and select the role we just created. The default value is `test_role`
- 3. In the side-bar, click on Streamlit and then on the `+ Streamlit App` button. Specify a Database, Schema and Warehouse. The defaults are `TUTORIAL_DB`, `DATA_SCHEMA`, `TUTORIAL_WAREHOUSE`.
+ 3. In the sidebar, click on Streamlit and then on the `+ Streamlit App` button. Specify a Database, Schema and Warehouse. The defaults are `TUTORIAL_DB`, `DATA_SCHEMA`, `TUTORIAL_WAREHOUSE`.
Copy and paste the following script into your streamlit app and run it:
```python
import streamlit as st
@@ -186,4 +223,35 @@ st.write(udf_response.collect()[0].as_dict())

For a more advanced streamlit example, see [here](https://github.com/JohnSnowLabs/johnsnowlabs/blob/main/streamlits/advanced_snowflake.py)
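The hunk above elides the body of the tutorial script. For orientation, a hypothetical sketch of such an app, assuming it runs as a Streamlit-in-Snowflake app where `snowflake.snowpark.context.get_active_session` provides the session (the UDF name matches the one created earlier):

```python
import streamlit as st
from snowflake.snowpark.context import get_active_session

# Streamlit apps hosted in Snowflake expose a ready-to-use Snowpark session.
session = get_active_session()

text = st.text_input("Text to de-identify",
                     "The patient was prescribed Amlodopine Vallarta 10-320mg.")

# Call the UDF created earlier; escape single quotes to keep the SQL literal valid.
udf_response = session.sql(
    "SELECT en_de_identify_clinical_pipeline_udf('{}')".format(text.replace("'", "''"))
)
st.write(udf_response.collect()[0].as_dict())
```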

+
+ ## Deploying Custom Pipeline as Snowflake Container Services UDF
+ To deploy a custom pipeline, simply provide a fitted Spark Pipeline object as `custom_pipeline`.
+
+ ```python
+ # Either run `nlp.snowflake_common_setup` or manually create & specify these resources
+ from johnsnowlabs import nlp
+
+ role_name, db_name, warehouse_name, schema_name, compute_pool_name, repo_url = ...
+ my_custom_pipeline = ...  # your custom pipeline which is already fitted
+
+ nlp.deploy_as_snowflake_udf(
+     custom_pipeline=my_custom_pipeline,
+     snowflake_user='my_snowflake_user',
+     snowflake_account='my_snowflake_account',
+     snowflake_password='my_snowflake_password',
+     license_path='path/to/my/jsl_license.json',
+     repo_url=repo_url,
+     role_name=role_name,
+     database_name=db_name,
+     warehouse_name=warehouse_name,
+     schema_name=schema_name,
+     compute_pool_name=compute_pool_name,
+     udf_name='my_udf',
+     service_name='my_service'
+ )
+ ```
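Once the service is up, the UDF is callable under the `udf_name` you passed, following the same query pattern shown earlier:

```sql
-- Query the custom UDF created above (name matches udf_name='my_udf')
SELECT my_udf('The patient was prescribed Amlodopine Vallarta 10-320mg.');
```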
+
+ </div><div class="h3-box" markdown="1">
+
+
</div></div>

johnsnowlabs/auto_install/docker/build/app.py

Lines changed: 2 additions & 2 deletions
@@ -23,11 +23,11 @@
visual_secret = os.environ.get("VISUAL_SECRET", None)

# jars loaded from jsl-home
- nlp.start(model_cache_folder="/app/model_cache", aws_access_key=aws_secret_access_key, aws_key_id=aws_access_key_id,
+ nlp.start(aws_access_key=aws_secret_access_key, aws_key_id=aws_access_key_id,
          hc_license=nlp_license, enterprise_nlp_secret=nlp_secret, visual_secret=visual_secret,
          visual=True if visual_secret else False, )

- model = nlp.load(path="/app/model/served_model", verbose=True)
+ model = nlp.load(path="/app/model", verbose=True)
if visual_enabled:
    # TODO this needs to be set by NLU
    model.contains_ocr_components = True

johnsnowlabs/auto_install/docker/build/base_dockerfile

Lines changed: 3 additions & 1 deletion
@@ -26,8 +26,10 @@ RUN chmod +x /usr/local/bin/tini
RUN mkdir /app
RUN mkdir /app/model_cache

+ # <OPTIONAL MODEL COPY PLACEHOLDER>
+
# Install Johnsnowlabs libraries
- RUN pip install johnsnowlabs fastapi uvicorn python-multipart nbformat
+ RUN pip install johnsnowlabs fastapi uvicorn python-multipart nbformat packaging
COPY installer.py /app/installer.py
RUN python3 /app/installer.py

johnsnowlabs/auto_install/docker/build/installer.py

Lines changed: 12 additions & 4 deletions
@@ -1,5 +1,4 @@
import os
-
from johnsnowlabs import settings, nlp

settings.enforce_versions = False
1110
aws_secret_access_key = os.environ.get("JOHNSNOWLABS_AWS_SECRET_ACCESS_KEY", None)
1211
HARDWARE_TARGET = os.environ.get("HARDWARE_TARGET", "cpu")
1312
model_ref = os.environ.get("MODEL_TO_LOAD", None)
13+
model_bucket = os.environ.get("MODEL_BUCKET", None)
14+
model_lang = os.environ.get("MODEL_LANGUAGE", 'en')
1415

1516
nlp.install(
1617
browser_login=False,
@@ -27,8 +28,15 @@
2728
hc_license=nlp_license, enterprise_nlp_secret=nlp_secret, visual_secret=visual_secret,
2829
visual=True if visual_secret else False, )
2930
if model_ref:
31+
print(f'Downloading model {model_ref} from bucket {model_bucket} with language {model_lang}')
3032
# Cache model, if not specified user must
3133
# mount a folder to /app/model_cache/ which has a folder named `served_model`
32-
pipe = nlp.load(model_ref)
33-
pipe.predict("init")
34-
pipe.save("/app/model/served_model")
34+
if model_bucket == 'clinical/ocr':
35+
from sparkocr.pretrained import PretrainedPipeline
36+
PretrainedPipeline(model_ref, model_lang, model_bucket).model.save("/app/model")
37+
else:
38+
nlp.PretrainedPipeline(model_ref, model_lang, model_bucket).model.save("/app/model")
39+
else:
40+
print("No model reference provided, skipping model download and validating provided model on disk")
41+
# Validate Model should be stored in /opt/ml/model
42+
nlp.PretrainedPipeline.from_disk("/app/model")
