
Commit 7b546b1

remove usage of SERVING_LOAD_MODELS and OPTION_MODEL_ID in examples/docs/tests
1 parent 48396fb

7 files changed: +17 additions, -21 deletions


.github/workflows/rolling_batch_integration.yml

Lines changed: 1 addition & 1 deletion
@@ -240,7 +240,7 @@ jobs:
         working-directory: tests/integration
         run: |
           rm -rf models
-          echo -en "SERVING_LOAD_MODELS=test::MPI=/opt/ml/model\nOPTION_MAX_ROLLING_BATCH_SIZE=2\nOPTION_OUTPUT_FORMATTER=jsonlines\nOPTION_TENSOR_PARALLEL_DEGREE=1\nOPTION_MODEL_ID=gpt2\nOPTION_TASK=text-generation\nOPTION_ROLLING_BATCH=lmi-dist" > docker_env
+          echo -en "OPTION_MAX_ROLLING_BATCH_SIZE=2\nTENSOR_PARALLEL_DEGREE=1\nHF_MODEL_ID=gpt2\nOPTION_TASK=text-generation\nOPTION_ROLLING_BATCH=lmi-dist" > docker_env
           ./launch_container.sh deepjavalibrary/djl-serving:$DJLSERVING_DOCKER_TAG nocode lmi
           python3 llm/client.py lmi_dist gpt2
           docker rm -f $(docker ps -aq)
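
For readers unfamiliar with this harness: `docker_env` is a plain newline-separated `KEY=VALUE` file. A minimal sketch of how such a file is typically consumed, assuming `launch_container.sh` (whose internals are not part of this diff) forwards it through Docker's standard `--env-file` flag:

```sh
# Hypothetical sketch only; the real launch_container.sh may add mounts,
# GPU flags, and health checks. --env-file reads one KEY=VALUE pair per line.
docker run -d --env-file docker_env -p 8080:8080 \
  deepjavalibrary/djl-serving:$DJLSERVING_DOCKER_TAG
```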

engines/python/setup/djl_python/tests/test_test_model.py

Lines changed: 0 additions & 8 deletions
@@ -52,15 +52,11 @@ def test_all_code(self):
 
     def test_with_env(self):
         envs = {
-            "OPTION_MODEL_ID": "NousResearch/Nous-Hermes-Llama2-13b",
-            "SERVING_LOAD_MODELS": "test::MPI=/opt/ml/model",
             "OPTION_ROLLING_BATCH": "auto"
         }
         for key, value in envs.items():
             os.environ[key] = value
         handler = TestHandler(huggingface)
-        self.assertEqual(handler.serving_properties["model_id"],
-                         envs["OPTION_MODEL_ID"])
         self.assertEqual(handler.serving_properties["rolling_batch"],
                          envs["OPTION_ROLLING_BATCH"])
         inputs = [{
@@ -109,15 +105,11 @@ def test_all_code_chat(self):
 
     def test_with_env_chat(self):
         envs = {
-            "OPTION_MODEL_ID": "TheBloke/Llama-2-7B-Chat-fp16",
-            "SERVING_LOAD_MODELS": "test::MPI=/opt/ml/model",
             "OPTION_ROLLING_BATCH": "auto"
         }
         for key, value in envs.items():
             os.environ[key] = value
         handler = TestHandler(huggingface)
-        self.assertEqual(handler.serving_properties["model_id"],
-                         envs["OPTION_MODEL_ID"])
         self.assertEqual(handler.serving_properties["rolling_batch"],
                          envs["OPTION_ROLLING_BATCH"])
         inputs = [{
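
After this commit, the env-based tests configure only `OPTION_ROLLING_BATCH` and let LMI infer the engine and model location. A minimal standalone sketch of the resulting pattern, using only the API surface visible in this diff:

```python
import os

from djl_python import huggingface
from djl_python.test_model import TestHandler

# No SERVING_LOAD_MODELS or OPTION_MODEL_ID is needed anymore; the engine
# is inferred from the rolling-batch setting.
envs = {"OPTION_ROLLING_BATCH": "auto"}
for key, value in envs.items():
    os.environ[key] = value

handler = TestHandler(huggingface)
# The handler surfaces env-derived settings as serving properties.
assert handler.serving_properties["rolling_batch"] == envs["OPTION_ROLLING_BATCH"]
```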

serving/docs/lmi/deployment_guide/configurations.md

Lines changed: 9 additions & 6 deletions
@@ -125,17 +125,20 @@ You can find these configurations in the respective [user guides](../user_guides
 
 All LMI Configuration keys available in the `serving.properties` format can be specified as environment variables.
 
-The translation for `engine` is unique. The configuration `engine=<engine>` is translated to `SERVING_LOAD_MODELS=test::<engine>=/opt/ml/model`.
-For example:
+The property `option.model_id` is unique. It is translated to `HF_MODEL_ID`.
 
-* `engine=Python` is translated to environment variable `SERVING_LOAD_MODELS=test::Python=/opt/ml/model`
-* `engine=MPI` is translated to environment variable `SERVING_LOAD_MODELS=test::MPI=/opt/ml/model`
+The property `engine` is translated to `OPTION_ENGINE`.
+By default, LMI will use the Python engine. You can use `OPTION_ENGINE=Python` to explicitly set the engine.
+To use the MPI engine, you should also provide `OPTION_MPI_MODE=true`.
+In general, we recommend that you do not specify engine or mpi configurations through environment variables.
+LMI will infer the correct engine and operating mode based on `option.rolling_batch` if provided.
+If `option.rolling_batch` is not provided, LMI will infer the recommended backend and set the engine configuration accordingly.
 
 Configuration keys that start with `option.` can be specified as environment variables using the `OPTION_` prefix.
 The configuration `option.<property>` is translated to environment variable `OPTION_<PROPERTY>`. For example:
 
-* `option.model_id` is translated to environment variable `OPTION_MODEL_ID`
-* `option.tensor_parallel_degree` is translated to environment variable `OPTION_TENSOR_PARALLEL_DEGREE`
+* `option.rolling_batch` is translated to environment variable `OPTION_ROLLING_BATCH`
+
 
 Configuration keys that do not start with option can be specified as environment variables using the `SERVING_` prefix.
 The configuration `<property>` is translated to environment variable `SERVING_<PROPERTY>`. For example:
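
Taken together, the rewritten section defines the following mapping between the two configuration styles. A sketch with illustrative values:

```
# serving.properties style
option.model_id=gpt2
option.rolling_batch=auto
option.tensor_parallel_degree=1

# equivalent environment-variable style
HF_MODEL_ID=gpt2                  # option.model_id is the special case
OPTION_ROLLING_BATCH=auto         # option.<property> -> OPTION_<PROPERTY>
OPTION_TENSOR_PARALLEL_DEGREE=1
```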

serving/docs/lmi/deployment_guide/deploying-your-endpoint.md

Lines changed: 2 additions & 1 deletion
@@ -176,7 +176,8 @@ The following options may be added to the `ModelDataSource` field to support unc
 This mechanism is useful when deploying SageMaker endpoints with network isolation.
 Model artifacts will be downloaded by SageMaker and mounted to the container rather than being downloaded by the container at runtime.
 
-If you use this mechanism to deploy the container, you should set `option.model_id=/opt/ml/model` in serving.properties, or `OPTION_MODEL_ID=/opt/ml/model` in environment variables depending on which configuration style you are using.
+If you use this mechanism to deploy the container, you do not need to specify the `option.model_id` or `HF_MODEL_ID` config.
+LMI will load the model artifacts from the model directory by default, which is where SageMaker downloads and mounts the model artifacts from S3.
 
 Follow this link for a detailed overview of this option: https://docs.aws.amazon.com/sagemaker/latest/dg/large-model-inference-uncompressed.html
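For context, uncompressed artifacts are requested through the `ModelDataSource` field of the SageMaker CreateModel API. A minimal sketch (bucket and prefix are placeholders):

```json
"ModelDataSource": {
  "S3DataSource": {
    "S3Uri": "s3://my-bucket/my-model-prefix/",
    "S3DataType": "S3Prefix",
    "CompressionType": "None"
  }
}
```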
serving/docs/lmi/deployment_guide/testing-custom-script.md

Lines changed: 1 addition & 1 deletion
@@ -48,7 +48,7 @@ from djl_python import huggingface
 from djl_python.test_model import TestHandler
 
 envs = {
-    "OPTION_MODEL_ID": "NousResearch/Nous-Hermes-Llama2-13b",
+    "HF_MODEL_ID": "NousResearch/Nous-Hermes-Llama2-13b",
     "OPTION_MPI_MODE": "true",
     "OPTION_ROLLING_BATCH": "lmi-dist",
     "OPTION_TENSOR_PARALLEL_DEGREE": 4

serving/docs/lmi/tutorials/trtllm_aot_tutorial.md

Lines changed: 3 additions & 3 deletions
@@ -50,7 +50,7 @@ docker pull 763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.27.0-te
 These below configurations helps you configure the inference optimizations parameters. You can check all the configurations of TensorRT-LLM LMI handler [in our docs](../user_guides/trt_llm_user_guide.md#advanced-tensorrt-llm-configurations).
 
 ```
-OPTION_MODEL_ID={{s3url}}
+HF_MODEL_ID={{s3url}}
 OPTION_TENSOR_PARALLEL_DEGREE=8
 OPTION_MAX_ROLLING_BATCH_SIZE=128
 OPTION_DTYPE=fp16
@@ -87,7 +87,7 @@ In the below example, the model artifacts will be saved to `$MODEL_REPO_DIR` cre
 docker run --runtime=nvidia --gpus all --shm-size 12gb \
 -v $MODEL_REPO_DIR:/tmp/trtllm \
 -p 8080:8080 \
--e OPTION_MODEL_ID=$OPTION_MODEL_ID \
+-e HF_MODEL_ID=$HF_MODEL_ID \
 -e OPTION_TENSOR_PARALLEL_DEGREE=$OPTION_TENSOR_PARALLEL_DEGREE \
 -e OPTION_MAX_ROLLING_BATCH_SIZE=$OPTION_MAX_ROLLING_BATCH_SIZE \
 -e OPTION_DTYPE=$OPTION_DTYPE \
@@ -115,7 +115,7 @@ aws s3 cp $MODEL_REPO_DIR s3://YOUR_S3_FOLDER_NAME/ --recursive
 **Note:** After uploading model artifacts to s3, you can just update the model_id(env var or in `serving.properties`) to the newly created s3 url with compiled model artifacts and use the same rest of the environment variables or `serving.properties` when deploying on SageMaker. Here, you can check the [tutorial](https://github.com/deepjavalibrary/djl-demo/blob/master/aws/sagemaker/large-model-inference/sample-llm/trtllm_rollingbatch_deploy_llama_13b.ipynb) on how to run inference using TensorRT-LLM DLC. Below snippet shows example updated model_id.
 
 ```
-OPTION_MODEL_ID=s3://YOUR_S3_FOLDER_NAME
+HF_MODEL_ID=s3://YOUR_S3_FOLDER_NAME
 OPTION_TENSOR_PARALLEL_DEGREE=8
 OPTION_MAX_ROLLING_BATCH_SIZE=128
 OPTION_DTYPE=fp16
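
Per the translation rules updated in configurations.md above, the same deployment can equivalently be written in `serving.properties` form. A sketch, carrying over the placeholder s3 url from the snippet above:

```
option.model_id=s3://YOUR_S3_FOLDER_NAME
option.tensor_parallel_degree=8
option.max_rolling_batch_size=128
option.dtype=fp16
```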

serving/docs/lmi/tutorials/trtllm_manual_convert_tutorial.md

Lines changed: 1 addition & 1 deletion
@@ -254,7 +254,7 @@ Finally, you can use one of the following configuration to load your model on Sa
 
 ### 1. Environment variables:
 ```
-OPTION_MODEL_ID=s3://lmi-llm/trtllm/0.5.0/baichuan-13b-tp2/
+HF_MODEL_ID=s3://lmi-llm/trtllm/0.5.0/baichuan-13b-tp2/
 OPTION_TENSOR_PARALLEL_DEGREE=2
 OPTION_MAX_ROLLING_BATCH_SIZE=64
 ```
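
The equivalent `serving.properties` form, following the same env-var translation rules documented in this commit, would be a sketch like:

```
option.model_id=s3://lmi-llm/trtllm/0.5.0/baichuan-13b-tp2/
option.tensor_parallel_degree=2
option.max_rolling_batch_size=64
```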
