serving/docs/lmi/deployment_guide/deploying-your-endpoint.md
2 additions & 1 deletion
@@ -176,7 +176,8 @@ The following options may be added to the `ModelDataSource` field to support unc
  This mechanism is useful when deploying SageMaker endpoints with network isolation.
  Model artifacts will be downloaded by SageMaker and mounted to the container rather than being downloaded by the container at runtime.
- If you use this mechanism to deploy the container, you should set `option.model_id=/opt/ml/model` in serving.properties, or `OPTION_MODEL_ID=/opt/ml/model` in environment variables depending on which configuration style you are using.
+ If you use this mechanism to deploy the container, you do not need to specify the `option.model_id` or `HF_MODEL_ID` config.
+ LMI will load the model artifacts from the model directory by default, which is where SageMaker downloads and mounts the model artifacts from S3.
  Follow this link for a detailed overview of this option: https://docs.aws.amazon.com/sagemaker/latest/dg/large-model-inference-uncompressed.html
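To make the option concrete, here is a minimal sketch (not part of the diff) of a boto3 `create_model` call that uses `ModelDataSource` with uncompressed artifacts; the image URI, role ARN, bucket/prefix, model name, and environment values are placeholder assumptions, not values from the original doc.

```python
import boto3

sm = boto3.client("sagemaker")

# Placeholders: substitute your own LMI container image, execution role, and S3 prefix.
lmi_image_uri = "<lmi-container-image-uri>"
role_arn = "<execution-role-arn>"

sm.create_model(
    ModelName="lmi-uncompressed-model",
    ExecutionRoleArn=role_arn,
    PrimaryContainer={
        "Image": lmi_image_uri,
        # SageMaker downloads the S3 prefix and mounts it into the container,
        # so no option.model_id / HF_MODEL_ID is needed in the environment.
        "ModelDataSource": {
            "S3DataSource": {
                "S3Uri": "s3://my-bucket/my-model-prefix/",
                "S3DataType": "S3Prefix",
                "CompressionType": "None",
            }
        },
        "Environment": {
            "OPTION_TENSOR_PARALLEL_DEGREE": "4",
        },
    },
)
```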
The below configurations help you set the inference optimization parameters. You can check all the configurations of the TensorRT-LLM LMI handler [in our docs](../user_guides/trt_llm_user_guide.md#advanced-tensorrt-llm-configurations).
```
- OPTION_MODEL_ID={{s3url}}
+ HF_MODEL_ID={{s3url}}
  OPTION_TENSOR_PARALLEL_DEGREE=8
  OPTION_MAX_ROLLING_BATCH_SIZE=128
  OPTION_DTYPE=fp16
```
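For comparison, assuming the usual LMI convention that `HF_MODEL_ID` and `OPTION_*` environment variables map to `option.*` keys, the same configuration could be sketched in `serving.properties` as:

```
option.model_id={{s3url}}
option.tensor_parallel_degree=8
option.max_rolling_batch_size=128
option.dtype=fp16
```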
@@ -87,7 +87,7 @@ In the below example, the model artifacts will be saved to `$MODEL_REPO_DIR` cre
  docker run --runtime=nvidia --gpus all --shm-size 12gb \
**Note:** After uploading the model artifacts to S3, you can simply update the model_id (env var or in `serving.properties`) to the newly created S3 URL containing the compiled model artifacts, and keep the rest of the environment variables or `serving.properties` the same when deploying on SageMaker. You can check the [tutorial](https://github.com/deepjavalibrary/djl-demo/blob/master/aws/sagemaker/large-model-inference/sample-llm/trtllm_rollingbatch_deploy_llama_13b.ipynb) on how to run inference using the TensorRT-LLM DLC. The snippet below shows an example of the updated model_id.
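For illustration only, an updated model_id pointing at compiled artifacts could look like the following sketch (the bucket and prefix are hypothetical placeholders):

```
HF_MODEL_ID=s3://my-bucket/compiled-model-artifacts/
```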