I am trying to run a benchmark on the GPT-J model using the command below:
mlcr run-mlperf,inference,_find-performance,_full,_r5.0-dev --model=gptj-99 --implementation=nvidia --framework=tensorrt --category=datacenter --scenario=Offline --execution_mode=test --device=cuda --docker --quiet --test_query_count=50 --rerun --env.MLC_NVIDIA_TP_SIZE=
But I get an error when pip installs nvidia-ammo inside the Docker container. Here is the full run log:
[2025-10-09 13:32:25,196 module.py:574 INFO] - * mlcr run-mlperf,inference,_find-performance,_full,_r5.0-dev
[2025-10-09 13:32:25,212 module.py:574 INFO] - * mlcr get,mlcommons,inference,src
[2025-10-09 13:32:25,213 module.py:1294 INFO] - ! load /home/admin/data/MLC/repos/local/cache/get-mlperf-inference-src_0775857b/mlc-cached-state.json
[2025-10-09 13:32:25,224 module.py:574 INFO] - * mlcr get,mlperf,inference,results,dir,_version.r5.0-dev
[2025-10-09 13:32:25,225 module.py:1294 INFO] - ! load /home/admin/data/MLC/repos/local/cache/get-mlperf-inference-results-dir_65ceae20/mlc-cached-state.json
[2025-10-09 13:32:25,237 module.py:574 INFO] - * mlcr install,pip-package,for-mlc-python,_package.tabulate
[2025-10-09 13:32:25,239 module.py:1294 INFO] - ! load /home/admin/data/MLC/repos/local/cache/install-pip-package-for-mlc-python_4f91e4e5/mlc-cached-state.json
[2025-10-09 13:32:25,251 module.py:574 INFO] - * mlcr get,mlperf,inference,utils
[2025-10-09 13:32:25,273 module.py:574 INFO] - * mlcr get,mlperf,inference,src
[2025-10-09 13:32:25,274 module.py:1294 INFO] - ! load /home/admin/data/MLC/repos/local/cache/get-mlperf-inference-src_0775857b/mlc-cached-state.json
[2025-10-09 13:32:25,278 module.py:5412 INFO] - ! call "postprocess" from /home/admin/MLC/repos/mlcommons@mlperf-automations/script/get-mlperf-inference-utils/customize.py
Using MLCommons Inference source from /home/admin/MLC/repos/local/cache/get-git-repo_inference-src_892d7667/inference
[2025-10-09 13:32:25,285 customize.py:273 INFO] -
Running loadgen scenario: Offline and mode: performance
[2025-10-09 13:32:25,440 module.py:574 INFO] - * mlcr get,mlperf,inference,submission,dir,local,_version.r5.0-dev
[2025-10-09 13:32:25,441 module.py:1294 INFO] - ! load /home/admin/data/MLC/repos/local/cache/get-mlperf-inference-submission-dir_0e47628e/mlc-cached-state.json
[2025-10-09 13:32:25,466 module.py:574 INFO] - * mlcr get,ml-model,gptj,_nvidia,_fp8
[2025-10-09 13:32:25,493 module.py:574 INFO] - * mlcr get,git,repo,_lfs,_repo.https://github.com/NVIDIA/TensorRT-LLM.git,_sha.0ab9d17a59c284d2de36889832fe9fc7c8697604
[2025-10-09 13:32:25,495 module.py:1294 INFO] - ! load /home/admin/data/MLC/repos/local/cache/get-git-repo_tensorrt-llm_2ba98202/mlc-cached-state.json
[2025-10-09 13:32:25,496 module.py:2251 INFO] - MLC cache path to the Git repo: /home/admin/MLC/repos/local/cache/get-git-repo_tensorrt-llm_2ba98202/repo
[2025-10-09 13:32:25,511 module.py:574 INFO] - * mlcr get,cuda
[2025-10-09 13:32:25,513 module.py:1294 INFO] - ! load /home/admin/data/MLC/repos/local/cache/get-cuda_db5a33ab/mlc-cached-state.json
[2025-10-09 13:32:25,514 module.py:2251 INFO] - ENV[CUDA_HOME]: /home/admin/MLC/repos/local/cache/install-cuda-prebuilt_3bf07d9c/install
[2025-10-09 13:32:25,514 module.py:2251 INFO] - ENV[MLC_CUDA_PATH_LIB_CUDNN_EXISTS]: no
[2025-10-09 13:32:25,514 module.py:2251 INFO] - ENV[MLC_CUDA_VERSION]: 11.8
[2025-10-09 13:32:25,514 module.py:2251 INFO] - ENV[MLC_CUDA_VERSION_STRING]: cu118
[2025-10-09 13:32:25,515 module.py:2251 INFO] - ENV[MLC_NVCC_BIN_WITH_PATH]: /home/admin/MLC/repos/local/cache/install-cuda-prebuilt_3bf07d9c/install/bin/nvcc
[2025-10-09 13:32:25,526 module.py:574 INFO] - * mlcr get,cuda-devices
[2025-10-09 13:32:25,527 module.py:1294 INFO] - ! load /home/admin/data/MLC/repos/local/cache/get-cuda-devices_56b0a920/mlc-cached-state.json
[2025-10-09 13:32:25,539 module.py:574 INFO] - * mlcr get,nvidia,scratch,space
[2025-10-09 13:32:25,541 module.py:1294 INFO] - ! load /home/admin/data/MLC/repos/local/cache/get-mlperf-inference-nvidia-scratch-space_1975fa48/mlc-cached-state.json
[2025-10-09 13:32:25,567 module.py:574 INFO] - * mlcr get,ml-model,gpt-j,_fp32,_pytorch
[2025-10-09 13:32:25,569 module.py:1294 INFO] - ! load /home/admin/data/MLC/repos/local/cache/get-ml-model-gptj_a060ff70/mlc-cached-state.json
[2025-10-09 13:32:25,570 module.py:2251 INFO] - Path to the ML model: None
[2025-10-09 13:32:25,582 module.py:574 INFO] - * mlcr get,nvidia,inference,common-code,_mlcommons,_v4.0
[2025-10-09 13:32:25,584 module.py:1294 INFO] - ! load /home/admin/data/MLC/repos/local/cache/get-mlperf-inference-nvidia-common-code_7324443e/mlc-cached-state.json
[2025-10-09 13:32:25,596 module.py:574 INFO] - * mlcr get,python3
[2025-10-09 13:32:25,597 module.py:1294 INFO] - ! load /home/admin/data/MLC/repos/local/cache/get-python3_bc646a4c/mlc-cached-state.json
[2025-10-09 13:32:25,655 module.py:574 INFO] - * mlcr get,generic-python-lib,_package.safetensors
[2025-10-09 13:32:25,668 module.py:574 INFO] - * mlcr get,python3
[2025-10-09 13:32:25,670 module.py:1294 INFO] - ! load /home/admin/data/MLC/repos/local/cache/get-python3_bc646a4c/mlc-cached-state.json
[2025-10-09 13:32:25,672 module.py:5266 INFO] - ! cd /home/admin/data/MLC/repos/local/cache/get-ml-model-gptj_bcc11285
[2025-10-09 13:32:25,672 module.py:5267 INFO] - ! call /home/admin/MLC/repos/mlcommons@mlperf-automations/script/get-generic-python-lib/validate_cache.sh from tmp-run.sh
/home/admin/miniconda3/envs/mlperf/bin/python3 /home/admin/MLC/repos/mlcommons@mlperf-automations/script/get-generic-python-lib/detect-version.py
[2025-10-09 13:32:25,772 module.py:5412 INFO] - ! call "detect_version" from /home/admin/MLC/repos/mlcommons@mlperf-automations/script/get-generic-python-lib/customize.py
[2025-10-09 13:32:25,783 customize.py:152 INFO] - Detected version: 0.6.2
[2025-10-09 13:32:25,797 module.py:574 INFO] - * mlcr get,python3
[2025-10-09 13:32:25,799 module.py:1294 INFO] - ! load /home/admin/data/MLC/repos/local/cache/get-python3_bc646a4c/mlc-cached-state.json
[2025-10-09 13:32:25,800 module.py:1294 INFO] - ! load /home/admin/data/MLC/repos/local/cache/get-generic-python-lib_13f96e5e/mlc-cached-state.json
[2025-10-09 13:32:25,856 module.py:574 INFO] - * mlcr get,generic-python-lib,_torch
[2025-10-09 13:32:25,868 module.py:574 INFO] - * mlcr get,python3
[2025-10-09 13:32:25,869 module.py:1294 INFO] - ! load /home/admin/data/MLC/repos/local/cache/get-python3_bc646a4c/mlc-cached-state.json
[2025-10-09 13:32:25,870 module.py:5266 INFO] - ! cd /home/admin/data/MLC/repos/local/cache/get-ml-model-gptj_bcc11285
[2025-10-09 13:32:25,870 module.py:5267 INFO] - ! call /home/admin/MLC/repos/mlcommons@mlperf-automations/script/get-generic-python-lib/validate_cache.sh from tmp-run.sh
/home/admin/miniconda3/envs/mlperf/bin/python3 /home/admin/MLC/repos/mlcommons@mlperf-automations/script/get-generic-python-lib/detect-version.py
[2025-10-09 13:32:25,978 module.py:5412 INFO] - ! call "detect_version" from /home/admin/MLC/repos/mlcommons@mlperf-automations/script/get-generic-python-lib/customize.py
[2025-10-09 13:32:25,988 customize.py:152 INFO] - Detected version: 2.8.0
[2025-10-09 13:32:26,002 module.py:574 INFO] - * mlcr get,python3
[2025-10-09 13:32:26,004 module.py:1294 INFO] - ! load /home/admin/data/MLC/repos/local/cache/get-python3_bc646a4c/mlc-cached-state.json
[2025-10-09 13:32:26,005 module.py:1294 INFO] - ! load /home/admin/data/MLC/repos/local/cache/get-generic-python-lib_ef5bc811/mlc-cached-state.json
[2025-10-09 13:32:26,011 module.py:5266 INFO] - ! cd /home/admin/data/MLC/repos/local/cache/get-ml-model-gptj_bcc11285
[2025-10-09 13:32:26,011 module.py:5267 INFO] - ! call /home/admin/MLC/repos/mlcommons@mlperf-automations/script/get-ml-model-gptj/run-nvidia.sh from tmp-run.sh
cd /home/admin/MLC/repos/local/cache/get-git-repo_tensorrt-llm_2ba98202/repo
make: Entering directory '/home/admin/data/MLC/repos/local/cache/get-git-repo_tensorrt-llm_2ba98202/repo/docker'
Building docker image: tensorrt_llm/devel:latest
DOCKER_BUILDKIT=1 docker build --pull \
--progress auto \
--build-arg BASE_IMAGE=nvcr.io/nvidia/pytorch \
--build-arg BASE_TAG=23.12-py3 \
--build-arg BUILD_WHEEL_ARGS="--clean --trt_root /usr/local/tensorrt --python_bindings --benchmarks" \
--build-arg TORCH_INSTALL_TYPE="skip" \
\
\
\
\
\
--build-arg TRT_LLM_VER="0.9.0.dev2024020600" \
\
--build-arg GIT_COMMIT="0ab9d17a59c284d2de36889832fe9fc7c8697604" \
--target devel \
--file Dockerfile.multi \
--tag tensorrt_llm/devel:latest \
..
[+] Building 3.4s (21/21) FINISHED docker:default
=> [internal] load build definition from Dockerfile.multi 0.0s
=> => transferring dockerfile: 3.35kB 0.0s
=> WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 6) 0.0s
=> WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 14) 0.0s
=> WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 53) 0.0s
=> WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 66) 0.0s
=> [internal] load metadata for nvcr.io/nvidia/pytorch:23.12-py3 3.3s
=> [auth] nvidia/pytorch:pull token for nvcr.io 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 316B 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 953B 0.0s
=> [base 1/1] FROM nvcr.io/nvidia/pytorch:23.12-py3@sha256:da3d1b690b9dca1fbf9beb3506120a63479e0cf1dc69c9256055125460eb44f7 0.0s
=> CACHED [devel 1/14] COPY docker/common/install_base.sh install_base.sh 0.0s
=> CACHED [devel 2/14] RUN bash ./install_base.sh && rm install_base.sh 0.0s
=> CACHED [devel 3/14] COPY docker/common/install_cmake.sh install_cmake.sh 0.0s
=> CACHED [devel 4/14] RUN bash ./install_cmake.sh && rm install_cmake.sh 0.0s
=> CACHED [devel 5/14] COPY docker/common/install_ccache.sh install_ccache.sh 0.0s
=> CACHED [devel 6/14] RUN bash ./install_ccache.sh && rm install_ccache.sh 0.0s
=> CACHED [devel 7/14] COPY docker/common/install_tensorrt.sh install_tensorrt.sh 0.0s
=> CACHED [devel 8/14] RUN bash ./install_tensorrt.sh --TRT_VER=${TRT_VER} --CUDA_VER=${CUDA_VER} --CUDNN_VER=${CUDNN_VER} --NCCL_VER=${NCCL_VER} --CUBLAS_VER=${CUBLAS_VER} & 0.0s
=> CACHED [devel 9/14] COPY docker/common/install_polygraphy.sh install_polygraphy.sh 0.0s
=> CACHED [devel 10/14] RUN bash ./install_polygraphy.sh && rm install_polygraphy.sh 0.0s
=> CACHED [devel 11/14] COPY docker/common/install_mpi4py.sh install_mpi4py.sh 0.0s
=> CACHED [devel 12/14] RUN bash ./install_mpi4py.sh && rm install_mpi4py.sh 0.0s
=> CACHED [devel 13/14] COPY docker/common/install_pytorch.sh install_pytorch.sh 0.0s
=> CACHED [devel 14/14] RUN bash ./install_pytorch.sh skip && rm install_pytorch.sh 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:8b533fae10316fd054895b41e62b83a3c70bddcd4c4c84f711810a7aeb276e3b 0.0s
=> => naming to docker.io/tensorrt_llm/devel:latest 0.0s
4 warnings found (use docker --debug to expand):
- FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 6)
- FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 14)
- FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 53)
- FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 66)
make: Leaving directory '/home/admin/data/MLC/repos/local/cache/get-git-repo_tensorrt-llm_2ba98202/repo/docker'
make: Entering directory '/home/admin/data/MLC/repos/local/cache/get-git-repo_tensorrt-llm_2ba98202/repo/docker'
docker build --progress --pull --progress auto --build-arg BASE_IMAGE_WITH_TAG=tensorrt_llm/devel:latest --build-arg USER_ID=1000 --build-arg USER_NAME=admin --build-arg GROUP_ID=1000 --build-arg GROUP_NAME=admin --file Dockerfile.user --tag tensorrt_llm/devel:latest-admin ..
[+] Building 0.0s (6/6) FINISHED docker:default
=> [internal] load build definition from Dockerfile.user 0.0s
=> => transferring dockerfile: 488B 0.0s
=> WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 3) 0.0s
=> WARN: InvalidDefaultArgInFrom: Default value for ARG ${BASE_IMAGE_WITH_TAG} results in empty or invalid base image name (line 3) 0.0s
=> [internal] load metadata for docker.io/tensorrt_llm/devel:latest 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 316B 0.0s
=> [1/2] FROM docker.io/tensorrt_llm/devel:latest 0.0s
=> CACHED [2/2] RUN (getent group 1000 || groupadd --gid 1000 admin) && (getent passwd 1000 || useradd --gid 1000 --uid 1000 --create-home --no-log-init --shell /bin/bash admin) 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:abea1421d2b9b8220fcd7e06ff6035eb2616b84c517263c48ce17eae9d64e031 0.0s
=> => naming to docker.io/tensorrt_llm/devel:latest-admin 0.0s
2 warnings found (use docker --debug to expand):
- FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 3)
- InvalidDefaultArgInFrom: Default value for ARG ${BASE_IMAGE_WITH_TAG} results in empty or invalid base image name (line 3)
docker run --rm -it --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -v /home/admin/MLC/repos/local/cache/get-mlperf-inference-nvidia-scratch-space_1975fa48:/mnt \
--gpus=all \
--volume /home/admin/data/MLC/repos/local/cache/get-git-repo_tensorrt-llm_2ba98202/repo:/code/tensorrt_llm \
--env "CCACHE_DIR=/code/tensorrt_llm/cpp/.ccache" \
--env "CCACHE_BASEDIR=/code/tensorrt_llm" \
--workdir /code/tensorrt_llm \
--hostname worker-g01-devel \
--name tensorrt_llm-devel-admin \
--tmpfs /tmp:exec \
tensorrt_llm/devel:latest-admin bash -c 'python3 scripts/build_wheel.py -a=90 --clean --install --trt_root /usr/local/tensorrt/ && python examples/quantization/quantize.py --dtype=float16 --output_dir=/mnt/models/GPTJ-6B/fp8-quantized-ammo/GPTJ-FP8-quantized --model_dir=/mnt/models/GPTJ-6B/checkpoint-final --qformat=fp8 --kv_cache_dtype=fp8 '
=============
== PyTorch ==
=============
NVIDIA Release 23.12 (build 76438008)
PyTorch Version 2.2.0a0+81ea7a4
Container image Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Copyright (c) 2014-2023 Facebook Inc.
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006 Idiap Research Institute (Samy Bengio)
Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
Copyright (c) 2015 Google Inc.
Copyright (c) 2015 Yangqing Jia
Copyright (c) 2013-2016 The Caffe contributors
All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com, https://pypi.ngc.nvidia.com, https://download.pytorch.org/whl/cu121, https://pypi.nvidia.com
Collecting accelerate==0.25.0 (from -r requirements.txt (line 3))
Downloading accelerate-0.25.0-py3-none-any.whl.metadata (18 kB)
Collecting build (from -r requirements.txt (line 4))
Downloading build-1.3.0-py3-none-any.whl.metadata (5.6 kB)
Collecting colored (from -r requirements.txt (line 5))
Downloading colored-2.3.1-py3-none-any.whl.metadata (3.6 kB)
Requirement already satisfied: cuda-python in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 6)) (12.3.0rc4+8.gcb4e395)
Collecting diffusers==0.15.0 (from -r requirements.txt (line 7))
Downloading diffusers-0.15.0-py3-none-any.whl.metadata (19 kB)
Collecting lark (from -r requirements.txt (line 8))
Downloading lark-1.3.0-py3-none-any.whl.metadata (1.8 kB)
Requirement already satisfied: mpi4py in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 9)) (3.1.5)
Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 10)) (1.24.4)
Requirement already satisfied: onnx>=1.12.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 11)) (1.15.0rc2)
Requirement already satisfied: polygraphy in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 12)) (0.48.1)
Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 13)) (5.9.4)
Collecting pynvml>=11.5.0 (from -r requirements.txt (line 14))
Downloading pynvml-13.0.1-py3-none-any.whl.metadata (5.6 kB)
Collecting sentencepiece>=0.1.99 (from -r requirements.txt (line 15))
Downloading sentencepiece-0.2.1-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (10 kB)
Requirement already satisfied: tensorrt==9.2.0.post12.dev5 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 16)) (9.2.0.post12.dev5)
Requirement already satisfied: torch<=2.2.0a in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 17)) (2.2.0a0+81ea7a4)
Collecting nvidia-ammo~=0.7.0 (from -r requirements.txt (line 18))
Downloading nvidia-ammo-0.7.4.tar.gz (6.9 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-hctz1hc8/nvidia-ammo_a9ae9c7f4d9440be9561a318b531fb76/setup.py", line 90, in <module>
raise RuntimeError("Bad params")
RuntimeError: Bad params
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
[notice] A new release of pip is available: 23.3.1 -> 25.2
[notice] To update, run: python3 -m pip install --upgrade pip
Traceback (most recent call last):
File "/code/tensorrt_llm/scripts/build_wheel.py", line 319, in <module>
main(**vars(args))
File "/code/tensorrt_llm/scripts/build_wheel.py", line 67, in main
build_run(
File "/usr/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '"/usr/bin/python3" -m pip install -r requirements-dev.txt --extra-index-url https://pypi.ngc.nvidia.com' returned non-zero exit status 1.
make: *** [Makefile:102: devel_run] Error 1
make: Leaving directory '/home/admin/data/MLC/repos/local/cache/get-git-repo_tensorrt-llm_2ba98202/repo/docker'
Traceback (most recent call last):
File "/home/admin/miniconda3/envs/mlperf/bin/mlcr", line 8, in <module>
sys.exit(mlcr())
~~~~^^
File "/home/admin/miniconda3/envs/mlperf/lib/python3.13/site-packages/mlc/main.py", line 87, in mlcr
main()
~~~~^^
File "/home/admin/miniconda3/envs/mlperf/lib/python3.13/site-packages/mlc/main.py", line 274, in main
res = method(run_args)
File "/home/admin/miniconda3/envs/mlperf/lib/python3.13/site-packages/mlc/script_action.py", line 316, in run
return self.call_script_module_function("run", run_args)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/home/admin/miniconda3/envs/mlperf/lib/python3.13/site-packages/mlc/script_action.py", line 230, in call_script_module_function
result = automation_instance.run(run_args) # Pass args to the run method
File "/home/admin/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 240, in run
r = self._run(i)
File "/home/admin/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 1790, in _run
r = customize_code.preprocess(ii)
File "/home/admin/MLC/repos/mlcommons@mlperf-automations/script/run-mlperf-inference-app/customize.py", line 285, in preprocess
r = mlc.access(ii)
File "/home/admin/miniconda3/envs/mlperf/lib/python3.13/site-packages/mlc/action.py", line 56, in access
result = method(options)
File "/home/admin/miniconda3/envs/mlperf/lib/python3.13/site-packages/mlc/script_action.py", line 290, in docker
return self.call_script_module_function("docker", run_args)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
File "/home/admin/miniconda3/envs/mlperf/lib/python3.13/site-packages/mlc/script_action.py", line 232, in call_script_module_function
result = automation_instance.docker(run_args) # Pass args to the run method
File "/home/admin/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 4588, in docker
return docker_run(self, i)
File "/home/admin/MLC/repos/mlcommons@mlperf-automations/automation/script/docker.py", line 341, in docker_run
r = self_module._run_deps(
deps, [], env, {}, {}, {}, add_deps_recursive, '', [], '', False, '',
show_time, ' ', run_state)
File "/home/admin/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 3567, in _run_deps
r = self.action_object.access(ii)
File "/home/admin/miniconda3/envs/mlperf/lib/python3.13/site-packages/mlc/action.py", line 56, in access
result = method(options)
File "/home/admin/miniconda3/envs/mlperf/lib/python3.13/site-packages/mlc/script_action.py", line 316, in run
return self.call_script_module_function("run", run_args)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/home/admin/miniconda3/envs/mlperf/lib/python3.13/site-packages/mlc/script_action.py", line 248, in call_script_module_function
raise ScriptExecutionError(f"Script {function_name} execution failed. Error : {error}")
mlc.script_action.ScriptExecutionError: Script run execution failed. Error : MLC script failed (name = get-ml-model-gptj, return code = 256)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Please file an issue at https://github.com/mlcommons/mlperf-automations/issues along with the full MLC command being run and the relevant
or full console log.
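If it helps with triage, the failing pip step can probably be reproduced in isolation inside the devel image built above. This is only a sketch: the image tag, the nvidia-ammo version pin, and the extra index URL are copied from the log, and I have not verified this exact invocation:

# Hypothetical repro of the metadata-generation failure, outside the MLC flow:
docker run --rm -it tensorrt_llm/devel:latest-admin bash -c 'python3 -m pip install "nvidia-ammo~=0.7.0" --extra-index-url https://pypi.ngc.nvidia.com'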
The reason I am passing --env.MLC_NVIDIA_TP_SIZE= (left empty) is that setting it to any number makes the command fail with the error below.
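For example, with a TP size of 2 (matching the _tp-size.2 tag in the log that follows), the command becomes:

mlcr run-mlperf,inference,_find-performance,_full,_r5.0-dev --model=gptj-99 --implementation=nvidia --framework=tensorrt --category=datacenter --scenario=Offline --execution_mode=test --device=cuda --docker --quiet --test_query_count=50 --rerun --env.MLC_NVIDIA_TP_SIZE=2

and it fails immediately while resolving the model script: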
[2025-10-09 13:36:38,503 module.py:574 INFO] - * mlcr run-mlperf,inference,_find-performance,_full,_r5.0-dev
[2025-10-09 13:36:38,526 module.py:574 INFO] - * mlcr get,mlcommons,inference,src
[2025-10-09 13:36:38,527 module.py:1294 INFO] - ! load /home/admin/data/MLC/repos/local/cache/get-mlperf-inference-src_0775857b/mlc-cached-state.json
[2025-10-09 13:36:38,543 module.py:574 INFO] - * mlcr get,mlperf,inference,results,dir,_version.r5.0-dev
[2025-10-09 13:36:38,544 module.py:1294 INFO] - ! load /home/admin/data/MLC/repos/local/cache/get-mlperf-inference-results-dir_65ceae20/mlc-cached-state.json
[2025-10-09 13:36:38,557 module.py:574 INFO] - * mlcr install,pip-package,for-mlc-python,_package.tabulate
[2025-10-09 13:36:38,559 module.py:1294 INFO] - ! load /home/admin/data/MLC/repos/local/cache/install-pip-package-for-mlc-python_4f91e4e5/mlc-cached-state.json
[2025-10-09 13:36:38,572 module.py:574 INFO] - * mlcr get,mlperf,inference,utils
[2025-10-09 13:36:38,598 module.py:574 INFO] - * mlcr get,mlperf,inference,src
[2025-10-09 13:36:38,599 module.py:1294 INFO] - ! load /home/admin/data/MLC/repos/local/cache/get-mlperf-inference-src_0775857b/mlc-cached-state.json
[2025-10-09 13:36:38,603 module.py:5412 INFO] - ! call "postprocess" from /home/admin/MLC/repos/mlcommons@mlperf-automations/script/get-mlperf-inference-utils/customize.py
Using MLCommons Inference source from /home/admin/MLC/repos/local/cache/get-git-repo_inference-src_892d7667/inference
[2025-10-09 13:36:38,612 customize.py:273 INFO] -
Running loadgen scenario: Offline and mode: performance
[2025-10-09 13:36:38,768 module.py:574 INFO] - * mlcr get,mlperf,inference,submission,dir,local,_version.r5.0-dev
[2025-10-09 13:36:38,769 module.py:1294 INFO] - ! load /home/admin/data/MLC/repos/local/cache/get-mlperf-inference-submission-dir_0e47628e/mlc-cached-state.json
[2025-10-09 13:36:38,793 module.py:574 INFO] - * mlcr get,ml-model,gptj,_nvidia,_fp8,_tp-size.2
Traceback (most recent call last):
File "/home/admin/miniconda3/envs/mlperf/bin/mlcr", line 8, in <module>
sys.exit(mlcr())
~~~~^^
File "/home/admin/miniconda3/envs/mlperf/lib/python3.13/site-packages/mlc/main.py", line 87, in mlcr
main()
~~~~^^
File "/home/admin/miniconda3/envs/mlperf/lib/python3.13/site-packages/mlc/main.py", line 274, in main
res = method(run_args)
File "/home/admin/miniconda3/envs/mlperf/lib/python3.13/site-packages/mlc/script_action.py", line 316, in run
return self.call_script_module_function("run", run_args)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/home/admin/miniconda3/envs/mlperf/lib/python3.13/site-packages/mlc/script_action.py", line 230, in call_script_module_function
result = automation_instance.run(run_args) # Pass args to the run method
File "/home/admin/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 240, in run
r = self._run(i)
File "/home/admin/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 1790, in _run
r = customize_code.preprocess(ii)
File "/home/admin/MLC/repos/mlcommons@mlperf-automations/script/run-mlperf-inference-app/customize.py", line 285, in preprocess
r = mlc.access(ii)
File "/home/admin/miniconda3/envs/mlperf/lib/python3.13/site-packages/mlc/action.py", line 56, in access
result = method(options)
File "/home/admin/miniconda3/envs/mlperf/lib/python3.13/site-packages/mlc/script_action.py", line 290, in docker
return self.call_script_module_function("docker", run_args)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
File "/home/admin/miniconda3/envs/mlperf/lib/python3.13/site-packages/mlc/script_action.py", line 232, in call_script_module_function
result = automation_instance.docker(run_args) # Pass args to the run method
File "/home/admin/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 4588, in docker
return docker_run(self, i)
File "/home/admin/MLC/repos/mlcommons@mlperf-automations/automation/script/docker.py", line 341, in docker_run
r = self_module._run_deps(
deps, [], env, {}, {}, {}, add_deps_recursive, '', [], '', False, '',
show_time, ' ', run_state)
File "/home/admin/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 3567, in _run_deps
r = self.action_object.access(ii)
File "/home/admin/miniconda3/envs/mlperf/lib/python3.13/site-packages/mlc/action.py", line 56, in access
result = method(options)
File "/home/admin/miniconda3/envs/mlperf/lib/python3.13/site-packages/mlc/script_action.py", line 316, in run
return self.call_script_module_function("run", run_args)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/home/admin/miniconda3/envs/mlperf/lib/python3.13/site-packages/mlc/script_action.py", line 248, in call_script_module_function
raise ScriptExecutionError(f"Script {function_name} execution failed. Error : {error}")
mlc.script_action.ScriptExecutionError: Script run execution failed. Error : no scripts were found with tags: get,ml-model,gptj,_nvidia,_fp8,_tp-size.2
variation tags ['nvidia', 'fp8', 'tp-size.2'] are not matching for the found script get-ml-model-gptj with variations dict_keys(['batch_size.#', 'fp32', 'fp8', 'int4', 'int8', 'intel', 'mlcommons', 'nvidia', 'pytorch', 'pytorch,fp32', 'pytorch,fp32,wget', 'pytorch,int4,intel', 'pytorch,int8,intel', 'pytorch,intel', 'pytorch,nvidia', 'rclone', 'saxml', 'saxml,fp32', 'saxml,int8', 'uint8', 'wget'])