
Commit e6767cc

remove usage of SERVING_LOAD_MODELS and OPTION_MODEL_ID in examples/docs/tests
1 parent b178dc9 commit e6767cc

File tree

7 files changed: +21 −32 lines changed


.github/workflows/llm_integration.yml

Lines changed: 1 addition & 1 deletion
@@ -562,7 +562,7 @@ jobs:
         working-directory: tests/integration
         run: |
           rm -rf models
-          echo -en "SERVING_LOAD_MODELS=test::MPI=/opt/ml/model\nOPTION_MAX_ROLLING_BATCH_SIZE=2\nOPTION_OUTPUT_FORMATTER=jsonlines\nOPTION_TENSOR_PARALLEL_DEGREE=1\nOPTION_MODEL_ID=gpt2\nOPTION_TASK=text-generation\nOPTION_ROLLING_BATCH=lmi-dist" > docker_env
+          echo -en "OPTION_MAX_ROLLING_BATCH_SIZE=2\nOPTION_OUTPUT_FORMATTER=jsonlines\nTENSOR_PARALLEL_DEGREE=1\nHF_MODEL_ID=gpt2\nOPTION_TASK=text-generation\nOPTION_ROLLING_BATCH=lmi-dist" > docker_env
           ./launch_container.sh deepjavalibrary/djl-serving:$DJLSERVING_DOCKER_TAG nocode lmi
           python3 llm/client.py lmi_dist gpt2
           docker rm -f $(docker ps -aq)
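
For reference, expanding the `\n` separators in the updated `echo -en` command, the `docker_env` file now contains one variable per line; `SERVING_LOAD_MODELS` is gone, and `HF_MODEL_ID`/`TENSOR_PARALLEL_DEGREE` replace the `OPTION_`-prefixed model id and tensor parallel settings:

```
OPTION_MAX_ROLLING_BATCH_SIZE=2
OPTION_OUTPUT_FORMATTER=jsonlines
TENSOR_PARALLEL_DEGREE=1
HF_MODEL_ID=gpt2
OPTION_TASK=text-generation
OPTION_ROLLING_BATCH=lmi-dist
```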

engines/python/setup/djl_python/tests/test_test_model.py

Lines changed: 4 additions & 19 deletions
@@ -61,17 +61,14 @@ def test_all_code(self):
 
     def test_with_env(self):
         envs = {
-            "OPTION_MODEL_ID": "NousResearch/Nous-Hermes-Llama2-13b",
-            "SERVING_LOAD_MODELS": "test::MPI=/opt/ml/model",
+            "HF_MODEL_ID": "NousResearch/Nous-Hermes-Llama2-13b",
             "OPTION_ROLLING_BATCH": "auto",
             "OPTION_TGI_COMPAT": "true"
         }
         for key, value in envs.items():
             os.environ[key] = value
         huggingface.get_rolling_batch_class_from_str = override_rolling_batch
         handler = TestHandler(huggingface)
-        self.assertEqual(handler.serving_properties["model_id"],
-                         envs["OPTION_MODEL_ID"])
         self.assertEqual(handler.serving_properties["rolling_batch"],
                          envs["OPTION_ROLLING_BATCH"])
         self.assertEqual(handler.serving_properties["tgi_compat"],
@@ -100,17 +97,14 @@ def test_with_env(self):
 
     def test_with_tgi_compat_env(self):
         envs = {
-            "OPTION_MODEL_ID": "NousResearch/Nous-Hermes-Llama2-13b",
-            "SERVING_LOAD_MODELS": "test::MPI=/opt/ml/model",
+            "HF_MODEL_ID": "NousResearch/Nous-Hermes-Llama2-13b",
             "OPTION_ROLLING_BATCH": "auto",
             "OPTION_TGI_COMPAT": "true"
         }
         for key, value in envs.items():
             os.environ[key] = value
         huggingface.get_rolling_batch_class_from_str = override_rolling_batch
         handler = TestHandler(huggingface)
-        self.assertEqual(handler.serving_properties["model_id"],
-                         envs["OPTION_MODEL_ID"])
         self.assertEqual(handler.serving_properties["rolling_batch"],
                          envs["OPTION_ROLLING_BATCH"])
         self.assertEqual(handler.serving_properties["tgi_compat"],
@@ -161,17 +155,11 @@ def test_all_code_chat(self):
         self.assertEqual(len(result), len(inputs))
 
     def test_with_env_chat(self):
-        envs = {
-            "OPTION_MODEL_ID": "TheBloke/Llama-2-7B-Chat-fp16",
-            "SERVING_LOAD_MODELS": "test::MPI=/opt/ml/model",
-            "OPTION_ROLLING_BATCH": "auto"
-        }
+        envs = {"OPTION_ROLLING_BATCH": "auto"}
         for key, value in envs.items():
             os.environ[key] = value
         huggingface.get_rolling_batch_class_from_str = override_rolling_batch
         handler = TestHandler(huggingface)
-        self.assertEqual(handler.serving_properties["model_id"],
-                         envs["OPTION_MODEL_ID"])
         self.assertEqual(handler.serving_properties["rolling_batch"],
                          envs["OPTION_ROLLING_BATCH"])
         inputs = [{
@@ -248,8 +236,7 @@ def test_exception_handling(self):
     @unittest.skip
     def test_profiling(self, logging_method):
         envs = {
-            "OPTION_MODEL_ID": "TheBloke/Llama-2-7B-Chat-fp16",
-            "SERVING_LOAD_MODELS": "test::MPI=/opt/ml/model",
+            "HF_MODEL_ID": "TheBloke/Llama-2-7B-Chat-fp16",
             "OPTION_ROLLING_BATCH": "auto",
             "DJL_PYTHON_PROFILING": "true",
             "DJL_PYTHON_PROFILING_TOP_OBJ": "60"
@@ -259,8 +246,6 @@ def test_profiling(self, logging_method):
             os.environ[key] = value
         huggingface.get_rolling_batch_class_from_str = override_rolling_batch
         handler = TestHandler(huggingface)
-        self.assertEqual(handler.serving_properties["model_id"],
-                         envs["OPTION_MODEL_ID"])
         self.assertEqual(handler.serving_properties["rolling_batch"],
                          envs["OPTION_ROLLING_BATCH"])
         inputs = [{

serving/docs/lmi/deployment_guide/configurations.md

Lines changed: 9 additions & 6 deletions
@@ -125,17 +125,20 @@ You can find these configurations in the respective [user guides](../user_guides
 
 All LMI Configuration keys available in the `serving.properties` format can be specified as environment variables.
 
-The translation for `engine` is unique. The configuration `engine=<engine>` is translated to `SERVING_LOAD_MODELS=test::<engine>=/opt/ml/model`.
-For example:
+The property `option.model_id` is unique. It is translated to `HF_MODEL_ID`.
 
-* `engine=Python` is translated to environment variable `SERVING_LOAD_MODELS=test::Python=/opt/ml/model`
-* `engine=MPI` is translated to environment variable `SERVING_LOAD_MODELS=test::MPI=/opt/ml/model`
+The property `engine` is translated to `OPTION_ENGINE`.
+By default, LMI will use the Python engine. You can use `OPTION_ENGINE=Python` to explicitly set the engine.
+To use the MPI engine, you should also provide `OPTION_MPI_MODE=true`.
+In general, we recommend that you do not specify engine or mpi configurations through environment variables.
+LMI will infer the correct engine and operating mode based on `option.rolling_batch` if provided.
+If `option.rolling_batch` is not provided, LMI will infer the recommended backend and set the engine configuration accordingly.
 
 Configuration keys that start with `option.` can be specified as environment variables using the `OPTION_` prefix.
 The configuration `option.<property>` is translated to environment variable `OPTION_<PROPERTY>`. For example:
 
-* `option.model_id` is translated to environment variable `OPTION_MODEL_ID`
-* `option.tensor_parallel_degree` is translated to environment variable `OPTION_TENSOR_PARALLEL_DEGREE`
+* `option.rolling_batch` is translated to environment variable `OPTION_ROLLING_BATCH`
+
 
 Configuration keys that do not start with option can be specified as environment variables using the `SERVING_` prefix.
 The configuration `<property>` is translated to environment variable `SERVING_<PROPERTY>`. For example:
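
Putting the updated translation rules together, a small `serving.properties` and its environment-variable equivalent would look roughly like this (a sketch using only keys that appear elsewhere in this commit):

```
# serving.properties                  # environment variables
option.model_id=gpt2                  HF_MODEL_ID=gpt2
option.rolling_batch=lmi-dist         OPTION_ROLLING_BATCH=lmi-dist
option.max_rolling_batch_size=2       OPTION_MAX_ROLLING_BATCH_SIZE=2
```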

serving/docs/lmi/deployment_guide/deploying-your-endpoint.md

Lines changed: 2 additions & 1 deletion
@@ -176,7 +176,8 @@ The following options may be added to the `ModelDataSource` field to support unc
 This mechanism is useful when deploying SageMaker endpoints with network isolation.
 Model artifacts will be downloaded by SageMaker and mounted to the container rather than being downloaded by the container at runtime.
 
-If you use this mechanism to deploy the container, you should set `option.model_id=/opt/ml/model` in serving.properties, or `OPTION_MODEL_ID=/opt/ml/model` in environment variables depending on which configuration style you are using.
+If you use this mechanism to deploy the container, you do not need to specify the `option.model_id` or `HF_MODEL_ID` config.
+LMI will load the model artifacts from the model directory by default, which is where SageMaker downloads and mounts the model artifacts from S3.
 
 Follow this link for a detailed overview of this option: https://docs.aws.amazon.com/sagemaker/latest/dg/large-model-inference-uncompressed.html
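For context, an uncompressed-artifact deployment points SageMaker at an S3 prefix through the `ModelDataSource` field of the model's container definition; a minimal sketch, with an illustrative bucket and prefix that are not from this commit:

```
"ModelDataSource": {
    "S3DataSource": {
        "S3Uri": "s3://your-bucket/your-model-prefix/",
        "S3DataType": "S3Prefix",
        "CompressionType": "None"
    }
}
```

SageMaker mounts these artifacts at `/opt/ml/model`, which is why the old guidance set `option.model_id=/opt/ml/model` and the new guidance needs no model id at all.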
serving/docs/lmi/deployment_guide/testing-custom-script.md

Lines changed: 1 addition & 1 deletion
@@ -48,7 +48,7 @@ from djl_python import huggingface
 from djl_python.test_model import TestHandler
 
 envs = {
-    "OPTION_MODEL_ID": "NousResearch/Nous-Hermes-Llama2-13b",
+    "HF_MODEL_ID": "NousResearch/Nous-Hermes-Llama2-13b",
     "OPTION_MPI_MODE": "true",
     "OPTION_ROLLING_BATCH": "lmi-dist",
     "OPTION_TENSOR_PARALLEL_DEGREE": 4

serving/docs/lmi/tutorials/trtllm_aot_tutorial.md

Lines changed: 3 additions & 3 deletions
@@ -50,7 +50,7 @@ docker pull 763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.27.0-te
 These below configurations helps you configure the inference optimizations parameters. You can check all the configurations of TensorRT-LLM LMI handler [in our docs](../user_guides/trt_llm_user_guide.md#advanced-tensorrt-llm-configurations).
 
 ```
-OPTION_MODEL_ID={{s3url}}
+HF_MODEL_ID={{s3url}}
 OPTION_TENSOR_PARALLEL_DEGREE=8
 OPTION_MAX_ROLLING_BATCH_SIZE=128
 OPTION_DTYPE=fp16
@@ -87,7 +87,7 @@ In the below example, the model artifacts will be saved to `$MODEL_REPO_DIR` cre
 docker run --runtime=nvidia --gpus all --shm-size 12gb \
   -v $MODEL_REPO_DIR:/tmp/trtllm \
   -p 8080:8080 \
-  -e OPTION_MODEL_ID=$OPTION_MODEL_ID \
+  -e HF_MODEL_ID=$HF_MODEL_ID \
   -e OPTION_TENSOR_PARALLEL_DEGREE=$OPTION_TENSOR_PARALLEL_DEGREE \
   -e OPTION_MAX_ROLLING_BATCH_SIZE=$OPTION_MAX_ROLLING_BATCH_SIZE \
   -e OPTION_DTYPE=$OPTION_DTYPE \
@@ -115,7 +115,7 @@ aws s3 cp $MODEL_REPO_DIR s3://YOUR_S3_FOLDER_NAME/ --recursive
 **Note:** After uploading model artifacts to s3, you can just update the model_id(env var or in `serving.properties`) to the newly created s3 url with compiled model artifacts and use the same rest of the environment variables or `serving.properties` when deploying on SageMaker. Here, you can check the [tutorial](https://github.com/deepjavalibrary/djl-demo/blob/master/aws/sagemaker/large-model-inference/sample-llm/trtllm_rollingbatch_deploy_llama_13b.ipynb) on how to run inference using TensorRT-LLM DLC. Below snippet shows example updated model_id.
 
 ```
-OPTION_MODEL_ID=s3://YOUR_S3_FOLDER_NAME
+HF_MODEL_ID=s3://YOUR_S3_FOLDER_NAME
 OPTION_TENSOR_PARALLEL_DEGREE=8
 OPTION_MAX_ROLLING_BATCH_SIZE=128
 OPTION_DTYPE=fp16

serving/docs/lmi/tutorials/trtllm_manual_convert_tutorial.md

Lines changed: 1 addition & 1 deletion
@@ -254,7 +254,7 @@ Finally, you can use one of the following configuration to load your model on Sa
 
 ### 1. Environment variables:
 ```
-OPTION_MODEL_ID=s3://lmi-llm/trtllm/0.5.0/baichuan-13b-tp2/
+HF_MODEL_ID=s3://lmi-llm/trtllm/0.5.0/baichuan-13b-tp2/
 OPTION_TENSOR_PARALLEL_DEGREE=2
 OPTION_MAX_ROLLING_BATCH_SIZE=64
 ```
