
MLperf script fails to build proper container to run test, offline and Server scenario #609

@skm123

Description


I ran the script below. The Docker container builds, but dependency installation fails: requirements.txt pins nvidia-ammo, yet pip cannot install any usable nvidia-ammo package, so the build errors out.

https://docs.mlcommons.org/inference/benchmarks/language/llama2-70b/#__tabbed_1_2

Script
mlcr run-mlperf,inference,_find-performance,_full,_r5.0-dev \
   --model=llama2-70b-99 \
   --implementation=nvidia \
   --framework=tensorrt \
   --category=datacenter \
   --scenario=Offline \
   --execution_mode=test \
   --device=cuda \
   --docker --quiet \
   --test_query_count=50 \
   --tp_size=4 \
   --nvidia_llama2_dataset_file_path=/home/nvidia/open_orca/open_orca_gpt4_tokenized_llama.sampled_24576.pkl --rerun
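
The failing dependency can be reproduced in isolation, outside the full build (a sketch, using the same package spec and NVIDIA index as the generated requirements.txt; run inside the container):

# Reproduce just the failing pip step (assumes network access from the container)
python3 -m pip install "nvidia-ammo~=0.7.0" --extra-index-url https://pypi.nvidia.com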

Error
Requirement already satisfied: torch<=2.2.0a in /home/mlcuser/.local/lib/python3.8/site-packages (from -r requirements.txt (line 16)) (2.1.0a0+git32f93b1)
Collecting nvidia-ammo~=0.7.0
Downloading nvidia-ammo-0.7.4.tar.gz (6.9 kB)
ERROR: Command errored out with exit status 1:
command: /usr/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-o3jj2u4v/nvidia-ammo/setup.py'"'"'; __file__='"'"'/tmp/pip-install-o3jj2u4v/nvidia-ammo/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-install-o3jj2u4v/nvidia-ammo/pip-egg-info
cwd: /tmp/pip-install-o3jj2u4v/nvidia-ammo/
Complete output (5 lines):
Traceback (most recent call last):
File "", line 1, in
File "/tmp/pip-install-o3jj2u4v/nvidia-ammo/setup.py", line 90, in
raise RuntimeError("Bad params")
RuntimeError: Bad params
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
Traceback (most recent call last):
File "scripts/build_wheel.py", line 332, in
main(**vars(args))
File "scripts/build_wheel.py", line 68, in main
build_run(
File "/usr/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '"/usr/bin/python3" -m pip install -r requirements-dev.txt --extra-index-url https://pypi.ngc.nvidia.com' returned non-zero exit status 1.
make[1]: *** [Makefile.build:307: build_trt_llm] Error 1
make[1]: Leaving directory '/home/mlcuser/MLC/repos/local/cache/get-git-repo_mlperf-inferenc_6b3f63c5/repo/closed/NVIDIA'
make: *** [/home/mlcuser/MLC/repos/local/cache/get-git-repo_mlperf-inferenc_6b3f63c5/repo/closed/NVIDIA/Makefile.build:181: build] Error 2
Traceback (most recent call last):
File "/home/mlcuser/.local/bin/mlcr", line 8, in
sys.exit(mlcr())
File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/main.py", line 87, in mlcr
main()
File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/main.py", line 274, in main
res = method(run_args)
File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/script_action.py", line 316, in run
return self.call_script_module_function("run", run_args)
File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/script_action.py", line 230, in call_script_module_function
result = automation_instance.run(run_args) # Pass args to the run method
File "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 240, in run
r = self._run(i)
File "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 1854, in _run
r = self._call_run_deps(prehook_deps, self.local_env_keys, local_env_keys_from_meta, env, state, const, const_state, add_deps_recursive,
File "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 3362, in _call_run_deps
r = script._run_deps(deps, local_env_keys, env, state, const, const_state, add_deps_recursive, recursion_spaces,
File "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 3554, in _run_deps
r = self.action_object.access(ii)
File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/action.py", line 56, in access
result = method(options)
File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/script_action.py", line 316, in run
return self.call_script_module_function("run", run_args)
File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/script_action.py", line 230, in call_script_module_function
result = automation_instance.run(run_args) # Pass args to the run method
File "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 240, in run
r = self._run(i)
File "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 1638, in _run
r = self._call_run_deps(deps, self.local_env_keys, local_env_keys_from_meta, env, state, const, const_state, add_deps_recursive,
File "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 3362, in _call_run_deps
r = script._run_deps(deps, local_env_keys, env, state, const, const_state, add_deps_recursive, recursion_spaces,
File "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 3554, in _run_deps
r = self.action_object.access(ii)
File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/action.py", line 56, in access
result = method(options)
File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/script_action.py", line 316, in run
return self.call_script_module_function("run", run_args)
File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/script_action.py", line 248, in call_script_module_function
raise ScriptExecutionError(f"Script {function_name} execution failed. Error : {error}")
mlc.script_action.ScriptExecutionError: Script run execution failed. Error : MLC script failed (name = build-mlperf-inference-server-nvidia, return code = 256)

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Please file an issue at https://github.com/mlcommons/mlperf-automations/issues along with the full MLC command being run and the relevant
or full console log.
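
The 6.9 kB tarball in the log suggests pip is falling back to a tiny source stub whose setup.py deliberately raises "Bad params". Checking what the index actually serves may narrow this down (a diagnostic sketch, assuming network access from inside the container; pip index is available on recent pip versions but marked experimental):

# List the nvidia-ammo versions the NVIDIA index offers for this Python/platform
python3 -m pip index versions nvidia-ammo --extra-index-url https://pypi.nvidia.com
# Download only the package to see whether a wheel or just the source stub is served
python3 -m pip download "nvidia-ammo~=0.7.0" --no-deps --extra-index-url https://pypi.nvidia.com -d /tmp/ammo-check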

#######################################################
Action

I edited requirements.txt (generated inside the container) to comment out the nvidia-ammo line, but the script regenerates the file and fails again when I rerun the Offline scenario. The edit I keep reapplying is sketched below.
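
A hypothetical stopgap only, not a confirmed fix: since the build regenerates the file, this would have to run immediately before the pip install step (REQ is the path from my container, shown below):

REQ=/home/mlcuser/MLC/repos/local/cache/get-git-repo_mlperf-inferenc_6b3f63c5/repo/closed/NVIDIA/build/TRTLLM/requirements.txt
# Comment out the nvidia-ammo pin so pip skips it
sed -i 's/^nvidia-ammo/#nvidia-ammo/' "$REQ"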

mlcuser@0a8c7a0385a8:~$ cd /home/mlcuser/MLC/repos/local/cache/get-git-repo_mlperf-inferenc_6b3f63c5/repo/closed/NVIDIA/build/TRTLLM
mlcuser@0a8c7a0385a8:~/MLC/repos/local/cache/get-git-repo_mlperf-inferenc_6b3f63c5/repo/closed/NVIDIA/build/TRTLLM$ ls -al
total 108
drwxr-xr-x 14 mlcuser mlc 4096 Sep 10 18:03 .
drwxr-xr-x 5 mlcuser mlc 133 Sep 10 17:59 ..
-rw-r--r-- 1 mlcuser mlc 2356 Sep 10 18:02 .clang-format
-rw-r--r-- 1 mlcuser mlc 215 Sep 10 18:02 .dockerignore
drwxr-xr-x 10 mlcuser mlc 4096 Sep 10 18:03 .git
-rw-r--r-- 1 mlcuser mlc 40 Sep 10 18:02 .gitattributes
drwxr-xr-x 3 mlcuser mlc 36 Sep 10 18:02 .github
-rw-r--r-- 1 mlcuser mlc 313 Sep 10 18:02 .gitignore
-rw-r--r-- 1 mlcuser mlc 404 Sep 10 18:02 .gitmodules
-rw-r--r-- 1 mlcuser mlc 1494 Sep 10 18:02 .pre-commit-config.yaml
drwxr-xr-x 6 mlcuser mlc 80 Sep 10 18:02 3rdparty
-rw-r--r-- 1 mlcuser mlc 5646 Sep 10 18:02 CHANGELOG.md
-rw-r--r-- 1 mlcuser mlc 11358 Sep 10 18:00 LICENSE
-rw-r--r-- 1 mlcuser mlc 20362 Sep 10 18:02 README.md
drwxr-xr-x 4 mlcuser mlc 43 Sep 10 18:02 benchmarks
drwxr-xr-x 6 mlcuser mlc 113 Sep 10 18:02 cpp
drwxr-xr-x 3 mlcuser mlc 103 Sep 10 18:02 docker
drwxr-xr-x 3 mlcuser mlc 136 Sep 10 18:02 docs
drwxr-xr-x 33 mlcuser mlc 4096 Sep 10 18:02 examples
-rw-r--r-- 1 mlcuser mlc 261 Sep 10 18:02 requirements-dev-windows.txt
-rw-r--r-- 1 mlcuser mlc 211 Sep 10 18:02 requirements-dev.txt
-rw-r--r-- 1 mlcuser mlc 530 Sep 10 18:02 requirements-windows.txt
-rw-r--r-- 1 mlcuser mlc 447 Sep 10 18:03 requirements.txt
drwxr-xr-x 2 mlcuser mlc 59 Sep 10 18:02 scripts
-rw-r--r-- 1 mlcuser mlc 73 Sep 10 18:02 setup.cfg
-rw-r--r-- 1 mlcuser mlc 4067 Sep 10 18:02 setup.py
drwxr-xr-x 10 mlcuser mlc 4096 Sep 10 18:02 tensorrt_llm
drwxr-xr-x 12 mlcuser mlc 4096 Sep 10 18:02 tests
drwxr-xr-x 4 mlcuser mlc 125 Sep 10 18:02 windows
###############################################
mlcuser@0a8c7a0385a8:~/MLC/repos/local/cache/get-git-repo_mlperf-inferenc_6b3f63c5/repo/closed/NVIDIA/build/TRTLLM$ cat requirements.txt
--extra-index-url https://download.pytorch.org/whl/cu121
--extra-index-url https://pypi.nvidia.com
accelerate==0.25.0
build
colored
cuda-python # Do not override the custom version of cuda-python installed in the NGC PyTorch image.
diffusers==0.15.0
lark
mpi4py
numpy
onnx>=1.12.0
polygraphy
psutil
pynvml>=11.5.0
sentencepiece>=0.1.99
torch<=2.2.0a
#nvidia-ammo~=0.7.0; platform_machine=="x86_64"
transformers==4.36.1
wheel
optimum
evaluate
janus

How can I get an interactive shell inside the Docker container so I can debug this?
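
My assumption would be the standard Docker route (a sketch; <container_id> is a placeholder for whatever docker ps reports, and I don't know whether the MLC flow keeps the container running for this):

docker ps                             # list running containers and their ids
docker exec -it <container_id> bash   # attach an interactive shell inside it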
