Open
54 commits
46a17fa
Initial version of runner w/out Ray support.
rlratzel Oct 16, 2025
1a9f55a
Initial successful e2e run
rlratzel Oct 16, 2025
849428a
Allows user to see if benchmark script subprocess is running.
rlratzel Oct 17, 2025
c1e5d52
Makes timeout required, removed redundant code, updates docstrings.
rlratzel Oct 17, 2025
50f4582
Adds initial sinks including working version of SlackSink.
rlratzel Oct 22, 2025
27b4b8c
Fixes problem where env data was not returned from dump_env().
rlratzel Oct 22, 2025
4f7a407
Initial version of docker build helper script.
rlratzel Oct 22, 2025
c688b05
Adds files for testing, updates to support testing.
rlratzel Oct 22, 2025
f928de5
Updates from testing: make fewer assumptions about output file locati…
rlratzel Oct 22, 2025
7d8e82c
Adds copyright/license headers, adds random output to test benchmark …
rlratzel Oct 23, 2025
7d5d78e
Adds copyright/license headers.
rlratzel Oct 23, 2025
13eaca6
Adds (AI generated) README.md, updates run.sh script to use updated i…
rlratzel Oct 23, 2025
aa9b2f6
Fixes linter errors.
rlratzel Oct 24, 2025
c16bd72
Removes unneeded mamba install, adds code to check required config va…
rlratzel Oct 24, 2025
40cd660
Updates to tolerate a missing datasets config.
rlratzel Oct 24, 2025
0589bff
Changes Dockerfile to FROM production Curator image and add benchmark…
rlratzel Oct 26, 2025
93db662
Updates run.sh script to properly pass args to bash or start shell if…
rlratzel Oct 26, 2025
0b8f3db
Makes default_config_file non-private to the module.
rlratzel Oct 26, 2025
aa2eb28
Merges upstream/main
rlratzel Oct 26, 2025
f42e001
Reverts all docker/Dockerfile changes.
rlratzel Oct 26, 2025
d5a53a1
Defaults to using Curator built into image but adds convenience optio…
rlratzel Oct 26, 2025
9aba59e
Fixes typo in comment.
rlratzel Oct 26, 2025
94b93cc
Adds help text to run.sh and minor cleanup.
rlratzel Oct 26, 2025
4883074
Updates help text in run.sh for clarity.
rlratzel Oct 26, 2025
650a5c9
Removes SLACK_WEBHOOK_URL from help text since env vars used by optio…
rlratzel Oct 26, 2025
606d866
Fixes typo in help text.
rlratzel Oct 26, 2025
1282436
Adds Ray setup and teardown, moves __post_init__ to top for discovera…
rlratzel Oct 27, 2025
731ce8e
Initial version of gdrive sink, adds dummy files for testing, refacto…
rlratzel Oct 27, 2025
e77fb0b
Adds code to get various version strings and GPU info from the enviro…
rlratzel Oct 28, 2025
3929923
Fixes issue with getting git commit from repos in mounted directories…
rlratzel Oct 28, 2025
fa08054
Updates to allow user to specify path mappings between host and conta…
rlratzel Oct 30, 2025
cd5d69a
Simplifies config by requiring only host-based paths, adds script to …
rlratzel Oct 30, 2025
0da5961
Fixes problem generating options for --shell.
rlratzel Oct 30, 2025
d776d64
Fixes problem determining if in container, comments out mlflow
rlratzel Oct 30, 2025
5ffe116
Adds comment describing paths in nightly-benchmark.yaml
rlratzel Oct 30, 2025
dab4b4b
Updates embedding_generation_benchmark based on example code in Curat…
rlratzel Oct 31, 2025
2d462bb
Updates host to container volume mounts to use a common prefix for ea…
rlratzel Oct 31, 2025
e79062d
Updates to make runscript eval string function more reusable and refa…
rlratzel Oct 31, 2025
b06189d
Cleans up comments and var names.
rlratzel Oct 31, 2025
af14bc9
Updated comment to match current host:container mounts.
rlratzel Oct 31, 2025
15a05bd
Adds ability to specify additional data per entry, adds ability to Sl…
rlratzel Nov 4, 2025
974d8d6
Fixes issue with using GPUS env var in docker run command, uses host …
rlratzel Nov 4, 2025
3ba0f2a
Merge remote-tracking branch 'upstream/main' into nightly_dockerfile
rlratzel Nov 4, 2025
f143d47
Adds initial working version of requirements checking feature and sup…
rlratzel Nov 5, 2025
7302dcf
Updates README.md to match current state of code and respond to PR fe…
rlratzel Nov 6, 2025
d26b5ca
Updates help text to describe --config
rlratzel Nov 6, 2025
b5f16f3
Fixes return type annotation.
rlratzel Nov 6, 2025
9c9b50e
Fixes return type annotation for main().
rlratzel Nov 6, 2025
6d703ab
Fixes type annotation issues and inconsistent Path/str usage.
rlratzel Nov 6, 2025
549d997
Fixes requirements_not_met setting so None reasons (meaning they were…
rlratzel Nov 6, 2025
2651d89
Makes slack report row for each entry name contain bolded name then s…
rlratzel Nov 6, 2025
d1a8250
Fixes path substitution to not attempt to substitute paths that are n…
rlratzel Nov 6, 2025
9d7d34e
Updates __post_init__ for MatrixEntry to handle lists and dicts prope…
rlratzel Nov 6, 2025
f840ca6
Updates README based on review feedback, adds summary section to top …
rlratzel Nov 6, 2025
43 changes: 43 additions & 0 deletions benchmarking/Dockerfile
@@ -0,0 +1,43 @@
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

ARG NEMO_CURATOR_IMAGE=nemo_curator
FROM ${NEMO_CURATOR_IMAGE} AS nemo_curator_benchmarking

# Add system utilities useful for benchmark and debug
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        less \
        openssh-client \
        vim \
        wget \
    && apt-get autoremove -y \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# Add dependencies for benchmarking to the Curator Python environment
RUN cd /opt/Curator \
    && uv add \
        GitPython \
        oauth2client \
        pydrive2 \
        pynvml \
        rich \
    && uv cache prune

# Add the Curator repo to the safe.directory list to avoid GitPython warnings
RUN git config --global --add safe.directory /opt/Curator

# Set the entrypoint to the main benchmarking runner script
ENTRYPOINT ["python", "/opt/Curator/benchmarking/run.py"]
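
For reviewers who want to try the image locally, the following is a minimal sketch of how the `NEMO_CURATOR_IMAGE` build argument could be passed to `docker build`. It is not the helper script added in this PR. The output tag `nemo_curator_benchmarking` and the `RUN_BUILD` switch are assumptions made for illustration; only the `ARG` name, its default value, and the `benchmarking/Dockerfile` path come from the diff above. By default the sketch only prints the command it would run.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the docker build command unless RUN_BUILD=1 is set.
set -euo pipefail

# Base image to build on; the Dockerfile's ARG defaults to "nemo_curator".
NEMO_CURATOR_IMAGE="${NEMO_CURATOR_IMAGE:-nemo_curator}"
# Hypothetical tag for the resulting benchmarking image (an assumption).
BENCHMARK_IMAGE="${BENCHMARK_IMAGE:-nemo_curator_benchmarking}"

# Assemble the command as an array so arguments with spaces stay intact.
build_cmd=(
    docker build
    --build-arg "NEMO_CURATOR_IMAGE=${NEMO_CURATOR_IMAGE}"
    -f benchmarking/Dockerfile
    -t "${BENCHMARK_IMAGE}"
    .
)

# RUN_BUILD is a hypothetical switch for this sketch, not part of the PR.
if [[ "${RUN_BUILD:-0}" == "1" ]]; then
    "${build_cmd[@]}"
else
    printf '%s\n' "${build_cmd[*]}"
fi
```

Because the Dockerfile sets an `ENTRYPOINT`, any arguments given after the image name in `docker run` are passed to `/opt/Curator/benchmarking/run.py` rather than interpreted as a new command.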