@@ -133,13 +133,13 @@ Ensure Docker is configured for sudoless use before running the build script. Fo
 - **For TPUs:**
 
   ```
-  bash dependencies/scripts/docker_build_dependency_image.sh DEVICE=tpu MODE=stable
+  bash src/dependencies/scripts/docker_build_dependency_image.sh DEVICE=tpu MODE=stable
   ```
 
 - **For GPUs:**
 
   ```
-  bash dependencies/scripts/docker_build_dependency_image.sh DEVICE=gpu MODE=stable
+  bash src/dependencies/scripts/docker_build_dependency_image.sh DEVICE=gpu MODE=stable
   ```
 
 ---
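
> The hunk above only relocates the build script under `src/`; the invocation is otherwise unchanged. A quick pre-flight sketch (assuming the script still produces the local `maxtext_base_image` tag that the workload hunks below reference):

```bash
# Sudoless check: 'docker info' should succeed without sudo.
docker info > /dev/null 2>&1 || echo "Docker is not configured for sudoless use"

# Build the TPU dependency image from the new script location.
bash src/dependencies/scripts/docker_build_dependency_image.sh DEVICE=tpu MODE=stable

# Confirm the expected base image exists locally.
docker images | grep maxtext_base_image
```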
@@ -165,8 +165,8 @@ This guide focuses on submitting workloads to an existing cluster. Cluster creat
 2. **Configure gcloud CLI**
 
    ```
-   gcloud config set project ${PROJECT_ID}
-   gcloud config set compute/zone ${ZONE}
+   gcloud config set project ${PROJECT_ID?}
+   gcloud config set compute/zone ${ZONE?}
    ```
 
 ### A Note on multi-slice and multi-node runs
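
> The `${PROJECT_ID?}` and `${ZONE?}` forms introduced here are standard shell parameter expansion: `${VAR?}` makes the shell abort when `VAR` is unset, instead of silently handing `gcloud` an empty string. A minimal sketch (the project and zone values are placeholders):

```bash
# Without the exports, 'gcloud config set project ${PROJECT_ID?}' aborts
# with an error like: PROJECT_ID: parameter null or not set
export PROJECT_ID=my-gcp-project   # placeholder
export ZONE=us-central2-b          # placeholder

gcloud config set project ${PROJECT_ID?}
gcloud config set compute/zone ${ZONE?}
```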
@@ -180,24 +180,24 @@ For instance, to run a job across **four TPU slices**, you would change `--num-s
 
   ```
   xpk workload create \
-    --cluster ${CLUSTER_NAME} \
+    --cluster ${CLUSTER_NAME?} \
     --workload ${USER}-tpu-job \
     --base-docker-image maxtext_base_image \
     --tpu-type v5litepod-256 \
     --num-slices 1 \
-    --command "python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=${USER}-tpu-job base_output_directory=${BASE_OUTPUT_DIR} dataset_path=${DATASET_PATH} steps=100"
+    --command "python3 -m maxtext.trainers.pre_train.train run_name=${USER}-tpu-job base_output_directory=${BASE_OUTPUT_DIR?} dataset_path=${DATASET_PATH?} steps=100"
   ```
 
 - **On your GPU cluster:**
 
   ```
   xpk workload create \
-    --cluster ${CLUSTER_NAME} \
+    --cluster ${CLUSTER_NAME?} \
     --workload ${USER}-gpu-job \
     --base-docker-image maxtext_base_image \
     --device-type h100-80gb-8 \
     --num-nodes 2 \
-    --command "python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=${USER}-gpu-job base_output_directory=${BASE_OUTPUT_DIR} dataset_path=${DATASET_PATH} steps=100"
+    --command "python3 -m maxtext.trainers.pre_train.train run_name=${USER}-gpu-job base_output_directory=${BASE_OUTPUT_DIR?} dataset_path=${DATASET_PATH?} steps=100"
   ```
 
 ---
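
> Per the multi-slice note above, scaling the same job out is just a flag change. A sketch of the four-slice variant of the TPU command (the `-4slice` workload-name suffix is illustrative, not prescribed by the guide):

```bash
xpk workload create \
  --cluster ${CLUSTER_NAME?} \
  --workload ${USER}-tpu-job-4slice \
  --base-docker-image maxtext_base_image \
  --tpu-type v5litepod-256 \
  --num-slices 4 \
  --command "python3 -m maxtext.trainers.pre_train.train run_name=${USER}-tpu-job-4slice base_output_directory=${BASE_OUTPUT_DIR?} dataset_path=${DATASET_PATH?} steps=100"
```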
@@ -233,7 +233,7 @@ The AOT artifact must be included in your Docker image. The `docker_upload_runne
 ```bash
 export CLOUD_IMAGE_NAME="${USER}-maxtext-aot-runner"
 
-bash dependencies/scripts/docker_upload_runner.sh CLOUD_IMAGE_NAME=${CLOUD_IMAGE_NAME}
+bash src/dependencies/scripts/docker_upload_runner.sh CLOUD_IMAGE_NAME=${CLOUD_IMAGE_NAME}
 ```
 
 ### Step 3: Create the XPK workload with the AOT artifact
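
> As with the build script, only the path to `docker_upload_runner.sh` changes here. If you want to confirm the runner image actually reached the registry before creating the workload, a hedged check (this assumes the script pushes to your project's `gcr.io` registry, which may differ in your setup):

```bash
# Hypothetical verification step, not part of the upload script itself.
gcloud container images list --repository=gcr.io/${PROJECT_ID?} | grep "${CLOUD_IMAGE_NAME?}"
```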
@@ -276,13 +276,13 @@ Your job will now start faster by skipping the JAX compilation step on the clust
 - **List your jobs:**
 
   ```
-  xpk workload list --cluster ${CLUSTER_NAME}
+  xpk workload list --cluster ${CLUSTER_NAME?}
   ```
 
 - **Analyze output:** Checkpoints and other artifacts will be saved to the Google Cloud Storage bucket you specified in `BASE_OUTPUT_DIR`.
 
 - **Delete a job:**
 
   ```
-  xpk workload delete --cluster ${CLUSTER_NAME} --workload <your-workload-name>
+  xpk workload delete --cluster ${CLUSTER_NAME?} --workload <your-workload-name>
   ```
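
> Taken together, the two commands above form the usual post-run cleanup. A sketch using the TPU job name created earlier in this guide:

```bash
# See what is still running or queued on the cluster.
xpk workload list --cluster ${CLUSTER_NAME?}

# Delete the finished job by the name passed to --workload at creation time.
xpk workload delete --cluster ${CLUSTER_NAME?} --workload ${USER}-tpu-job
```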