
MLperf script fails to build proper container to run test, offline and Server scenario #609

@skm123

Description


I ran the script below. The Docker container builds, but dependency installation fails: requirements.txt pins nvidia-ammo, yet pip cannot install any usable nvidia-ammo package, so the build errors out.

https://docs.mlcommons.org/inference/benchmarks/language/llama2-70b/#__tabbed_1_2

Script
mlcr run-mlperf,inference,_find-performance,_full,_r5.0-dev \
   --model=llama2-70b-99 \
   --implementation=nvidia \
   --framework=tensorrt \
   --category=datacenter \
   --scenario=Offline \
   --execution_mode=test \
   --device=cuda \
   --docker --quiet \
   --test_query_count=50 \
   --tp_size=4 \
   --nvidia_llama2_dataset_file_path=/home/nvidia/open_orca/open_orca_gpt4_tokenized_llama.sampled_24576.pkl --rerun
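
The failing dependency can be reproduced in isolation, outside the full build (a sketch, using the same package spec and NVIDIA index as the generated requirements.txt; run inside the container):

# Reproduce just the failing pip step (assumes network access from the container)
python3 -m pip install "nvidia-ammo~=0.7.0" --extra-index-url https://pypi.nvidia.com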

Error
Requirement already satisfied: torch<=2.2.0a in /home/mlcuser/.local/lib/python3.8/site-packages (from -r requirements.txt (line 16)) (2.1.0a0+git32f93b1)
Collecting nvidia-ammo~=0.7.0
Downloading nvidia-ammo-0.7.4.tar.gz (6.9 kB)
ERROR: Command errored out with exit status 1:
command: /usr/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-o3jj2u4v/nvidia-ammo/setup.py'"'"'; __file__='"'"'/tmp/pip-install-o3jj2u4v/nvidia-ammo/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-install-o3jj2u4v/nvidia-ammo/pip-egg-info
cwd: /tmp/pip-install-o3jj2u4v/nvidia-ammo/
Complete output (5 lines):
Traceback (most recent call last):
File "", line 1, in
File "/tmp/pip-install-o3jj2u4v/nvidia-ammo/setup.py", line 90, in
raise RuntimeError("Bad params")
RuntimeError: Bad params
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
Traceback (most recent call last):
File "scripts/build_wheel.py", line 332, in
main(**vars(args))
File "scripts/build_wheel.py", line 68, in main
build_run(
File "/usr/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '"/usr/bin/python3" -m pip install -r requirements-dev.txt --extra-index-url https://pypi.ngc.nvidia.com' returned non-zero exit status 1.
make[1]: *** [Makefile.build:307: build_trt_llm] Error 1
make[1]: Leaving directory '/home/mlcuser/MLC/repos/local/cache/get-git-repo_mlperf-inferenc_6b3f63c5/repo/closed/NVIDIA'
make: *** [/home/mlcuser/MLC/repos/local/cache/get-git-repo_mlperf-inferenc_6b3f63c5/repo/closed/NVIDIA/Makefile.build:181: build] Error 2
Traceback (most recent call last):
File "/home/mlcuser/.local/bin/mlcr", line 8, in
sys.exit(mlcr())
File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/main.py", line 87, in mlcr
main()
File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/main.py", line 274, in main
res = method(run_args)
File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/script_action.py", line 316, in run
return self.call_script_module_function("run", run_args)
File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/script_action.py", line 230, in call_script_module_function
result = automation_instance.run(run_args) # Pass args to the run method
File "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 240, in run
r = self._run(i)
File "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 1854, in _run
r = self._call_run_deps(prehook_deps, self.local_env_keys, local_env_keys_from_meta, env, state, const, const_state, add_deps_recursive,
File "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 3362, in _call_run_deps
r = script._run_deps(deps, local_env_keys, env, state, const, const_state, add_deps_recursive, recursion_spaces,
File "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 3554, in _run_deps
r = self.action_object.access(ii)
File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/action.py", line 56, in access
result = method(options)
File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/script_action.py", line 316, in run
return self.call_script_module_function("run", run_args)
File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/script_action.py", line 230, in call_script_module_function
result = automation_instance.run(run_args) # Pass args to the run method
File "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 240, in run
r = self._run(i)
File "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 1638, in _run
r = self._call_run_deps(deps, self.local_env_keys, local_env_keys_from_meta, env, state, const, const_state, add_deps_recursive,
File "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 3362, in _call_run_deps
r = script._run_deps(deps, local_env_keys, env, state, const, const_state, add_deps_recursive, recursion_spaces,
File "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 3554, in _run_deps
r = self.action_object.access(ii)
File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/action.py", line 56, in access
result = method(options)
File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/script_action.py", line 316, in run
return self.call_script_module_function("run", run_args)
File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/script_action.py", line 248, in call_script_module_function
raise ScriptExecutionError(f"Script {function_name} execution failed. Error : {error}")
mlc.script_action.ScriptExecutionError: Script run execution failed. Error : MLC script failed (name = build-mlperf-inference-server-nvidia, return code = 256)

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Please file an issue at https://github.com/mlcommons/mlperf-automations/issues along with the full MLC command being run and the relevant
or full console log.
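
The 6.9 kB tarball in the log suggests pip is falling back to a tiny source stub whose setup.py deliberately raises "Bad params". Checking what the index actually serves may narrow this down (a diagnostic sketch, assuming network access from inside the container; pip index is available on recent pip versions but marked experimental):

# List the nvidia-ammo versions the NVIDIA index offers for this Python/platform
python3 -m pip index versions nvidia-ammo --extra-index-url https://pypi.nvidia.com
# Download only the package to see whether a wheel or just the source stub is served
python3 -m pip download "nvidia-ammo~=0.7.0" --no-deps --extra-index-url https://pypi.nvidia.com -d /tmp/ammo-check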

#######################################################
Action

I edited requirements.txt (generated inside the container) to comment out the nvidia-ammo line, but the script regenerates the file and fails again when I rerun the Offline scenario. The edit I keep reapplying is sketched below.
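
A hypothetical stopgap only, not a confirmed fix: since the build regenerates the file, this would have to run immediately before the pip install step (REQ is the path from my container, shown below):

REQ=/home/mlcuser/MLC/repos/local/cache/get-git-repo_mlperf-inferenc_6b3f63c5/repo/closed/NVIDIA/build/TRTLLM/requirements.txt
# Comment out the nvidia-ammo pin so pip skips it
sed -i 's/^nvidia-ammo/#nvidia-ammo/' "$REQ"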

mlcuser@0a8c7a0385a8:~$ cd /home/mlcuser/MLC/repos/local/cache/get-git-repo_mlperf-inferenc_6b3f63c5/repo/closed/NVIDIA/build/TRTLLM
mlcuser@0a8c7a0385a8:~/MLC/repos/local/cache/get-git-repo_mlperf-inferenc_6b3f63c5/repo/closed/NVIDIA/build/TRTLLM$ ls -al
total 108
drwxr-xr-x 14 mlcuser mlc 4096 Sep 10 18:03 .
drwxr-xr-x 5 mlcuser mlc 133 Sep 10 17:59 ..
-rw-r--r-- 1 mlcuser mlc 2356 Sep 10 18:02 .clang-format
-rw-r--r-- 1 mlcuser mlc 215 Sep 10 18:02 .dockerignore
drwxr-xr-x 10 mlcuser mlc 4096 Sep 10 18:03 .git
-rw-r--r-- 1 mlcuser mlc 40 Sep 10 18:02 .gitattributes
drwxr-xr-x 3 mlcuser mlc 36 Sep 10 18:02 .github
-rw-r--r-- 1 mlcuser mlc 313 Sep 10 18:02 .gitignore
-rw-r--r-- 1 mlcuser mlc 404 Sep 10 18:02 .gitmodules
-rw-r--r-- 1 mlcuser mlc 1494 Sep 10 18:02 .pre-commit-config.yaml
drwxr-xr-x 6 mlcuser mlc 80 Sep 10 18:02 3rdparty
-rw-r--r-- 1 mlcuser mlc 5646 Sep 10 18:02 CHANGELOG.md
-rw-r--r-- 1 mlcuser mlc 11358 Sep 10 18:00 LICENSE
-rw-r--r-- 1 mlcuser mlc 20362 Sep 10 18:02 README.md
drwxr-xr-x 4 mlcuser mlc 43 Sep 10 18:02 benchmarks
drwxr-xr-x 6 mlcuser mlc 113 Sep 10 18:02 cpp
drwxr-xr-x 3 mlcuser mlc 103 Sep 10 18:02 docker
drwxr-xr-x 3 mlcuser mlc 136 Sep 10 18:02 docs
drwxr-xr-x 33 mlcuser mlc 4096 Sep 10 18:02 examples
-rw-r--r-- 1 mlcuser mlc 261 Sep 10 18:02 requirements-dev-windows.txt
-rw-r--r-- 1 mlcuser mlc 211 Sep 10 18:02 requirements-dev.txt
-rw-r--r-- 1 mlcuser mlc 530 Sep 10 18:02 requirements-windows.txt
-rw-r--r-- 1 mlcuser mlc 447 Sep 10 18:03 requirements.txt
drwxr-xr-x 2 mlcuser mlc 59 Sep 10 18:02 scripts
-rw-r--r-- 1 mlcuser mlc 73 Sep 10 18:02 setup.cfg
-rw-r--r-- 1 mlcuser mlc 4067 Sep 10 18:02 setup.py
drwxr-xr-x 10 mlcuser mlc 4096 Sep 10 18:02 tensorrt_llm
drwxr-xr-x 12 mlcuser mlc 4096 Sep 10 18:02 tests
drwxr-xr-x 4 mlcuser mlc 125 Sep 10 18:02 windows
###############################################
mlcuser@0a8c7a0385a8:~/MLC/repos/local/cache/get-git-repo_mlperf-inferenc_6b3f63c5/repo/closed/NVIDIA/build/TRTLLM$ cat requirements.txt
--extra-index-url https://download.pytorch.org/whl/cu121
--extra-index-url https://pypi.nvidia.com
accelerate==0.25.0
build
colored
cuda-python # Do not override the custom version of cuda-python installed in the NGC PyTorch image.
diffusers==0.15.0
lark
mpi4py
numpy
onnx>=1.12.0
polygraphy
psutil
pynvml>=11.5.0
sentencepiece>=0.1.99
torch<=2.2.0a
#nvidia-ammo~=0.7.0; platform_machine=="x86_64"
transformers==4.36.1
wheel
optimum
evaluate
janus

How can I get an interactive shell inside the Docker container so I can debug this?
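
My assumption would be the standard Docker route (a sketch; <container_id> is a placeholder for whatever docker ps reports, and I don't know whether the MLC flow keeps the container running for this):

docker ps                             # list running containers and their ids
docker exec -it <container_id> bash   # attach an interactive shell inside it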
