diff --git a/docs/profiling.md b/docs/profiling.md
index 950244fdecf0..8d9a9e37190b 100644
--- a/docs/profiling.md
+++ b/docs/profiling.md
@@ -362,7 +362,13 @@ The following options are available for GPU profiling:
     [NVIDIA's CUPTI documentation](https://docs.nvidia.com/cupti/main/main.html#metrics-table).
 *   `gpu_pm_sample_interval_us`: Sets the sampling interval in microseconds
     for CUPTI PM sampling. Defaults to `500`.
-*   `gpu_pm_sample_buffer_size_per_gpu_mb`: Sets the system memory buffer size per device in MB for CUPTI PM sampling. Defaults to 64MB. The maximum supported value is 4GB.
+*   `gpu_pm_sample_buffer_size_per_gpu_mb`: Sets the system memory buffer size
+    per device in MB for CUPTI PM sampling. Defaults to 64 MB. The maximum
+    supported value is 4 GB.
+*   `gpu_num_chips_to_profile_per_task`: Sets the number of GPU devices to
+    profile per task. If unset, set to `0`, or set to an invalid value, all
+    available GPUs are profiled. Use this option to reduce the size of the
+    collected trace.
 *   `gpu_dump_graph_node_mapping`: If enabled, dumps CUDA graph node
     mapping information into the trace. Defaults to `False`.