diff --git a/docs/profiling.md b/docs/profiling.md
index 950244fdecf0..8d9a9e37190b 100644
--- a/docs/profiling.md
+++ b/docs/profiling.md
@@ -362,7 +362,13 @@ The following options are available for GPU profiling:
   [NVIDIA's CUPTI documentation](https://docs.nvidia.com/cupti/main/main.html#metrics-table).
 * `gpu_pm_sample_interval_us`: Sets the sampling interval in microseconds for
   CUPTI PM sampling. Defaults to `500`.
-* `gpu_pm_sample_buffer_size_per_gpu_mb`: Sets the system memory buffer size per device in MB for CUPTI PM sampling. Defaults to 64MB. The maximum supported value is 4GB.
+* `gpu_pm_sample_buffer_size_per_gpu_mb`: Sets the system memory buffer size
+  per device in MB for CUPTI PM sampling. Defaults to 64MB. The maximum
+  supported value is 4GB.
+* `gpu_num_chips_to_profile_per_task`: Specifies the number of GPU devices to
+  profile per task. If not specified, set to 0, or set to an invalid value,
+  all available GPUs will be profiled. This can be used to decrease the trace
+  collection size.
 * `gpu_dump_graph_node_mapping`: If enabled, dumps CUDA graph node mapping
   information into the trace. Defaults to `False`.