@hidara2000 commented Apr 15, 2025

- Fixed an error where newer versions of the `transforms` package were missing definitions (should work with both old and new versions)
- Added `supervision` for varied annotations and tracking
- Added optimised ONNX inference with the relevant operations moved to the GPU (CuPy); IO binding improves performance and a `GPUMemoryPool` better manages GPU memory
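The supervision-based annotation and tracking flow might look roughly like the sketch below. This is an illustration only: it assumes the `supervision` package is installed, and all function and variable names here are hypothetical, not the PR's actual code. The import is deferred inside the function so the file still loads without the dependency.

```python
def annotate_and_track(frame, xyxy, confidence, class_id):
    """Illustrative sketch: wrap raw boxes in sv.Detections, track, annotate.

    `frame` is a BGR numpy image; `xyxy`, `confidence`, `class_id` are the
    raw detector outputs. Names are assumptions, not the PR's real API.
    """
    import numpy as np
    import supervision as sv  # deferred: requires the supervision package

    detections = sv.Detections(
        xyxy=np.asarray(xyxy, dtype=float),
        confidence=np.asarray(confidence, dtype=float),
        class_id=np.asarray(class_id, dtype=int),
    )
    # ByteTrack assigns a stable tracker_id to each object across frames.
    tracker = sv.ByteTrack()
    detections = tracker.update_with_detections(detections)
    # Draw the (tracked) boxes onto a copy of the frame.
    annotated = sv.BoxAnnotator().annotate(scene=frame.copy(),
                                           detections=detections)
    return annotated, detections
```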

# IO Binding Benefits for Multiprocessing

- Reduces contention for CPU-GPU data-transfer pathways when multiple processes share GPU resources
- Enables more efficient process-per-GPU distribution by minimizing transfer overhead
- Improves scalability across multiple GPUs by optimizing each process's GPU communication
- Supports pipeline parallelism by keeping intermediate data on the GPU between processing stages
- Allows better load balancing across processes by reducing data-movement bottlenecks
- Enables higher GPU utilization when distributing work across multiple processes
- Minimizes inter-process communication (IPC) overhead for inference workloads
- Helps maintain consistent performance when scaling to multiple workers
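As a rough sketch of what IO binding with CuPy buffers looks like in ONNX Runtime: the input and output tensors are bound directly to device memory, so `run_with_iobinding` never stages data through host RAM. This assumes `onnxruntime-gpu` and `cupy` with a CUDA device; the tensor names and shapes are illustrative, not the PR's actual ones, and the imports are deferred so the file loads without a GPU.

```python
def run_with_io_binding(session, input_name, output_name, frames_gpu, out_shape):
    """Run one batch via ONNX Runtime IO binding, keeping all data on the GPU.

    `session` is an onnxruntime.InferenceSession created with the
    CUDAExecutionProvider; `frames_gpu` is a preprocessed float32 CuPy array.
    All names/shapes here are assumptions for illustration.
    """
    import cupy as cp  # deferred: requires a CUDA device

    out_gpu = cp.empty(out_shape, dtype=cp.float32)  # preallocated output
    binding = session.io_binding()
    # Bind input and output to raw device pointers so onnxruntime reads and
    # writes GPU memory directly instead of copying through host RAM.
    binding.bind_input(
        name=input_name, device_type="cuda", device_id=0,
        element_type=cp.float32, shape=frames_gpu.shape,
        buffer_ptr=frames_gpu.data.ptr,
    )
    binding.bind_output(
        name=output_name, device_type="cuda", device_id=0,
        element_type=cp.float32, shape=out_shape,
        buffer_ptr=out_gpu.data.ptr,
    )
    session.run_with_iobinding(binding)
    return out_gpu  # still on the GPU; postprocess with CuPy
```

Because the result stays on the device, downstream CuPy postprocessing (NMS, scaling) can run without any intermediate host transfer, which is what makes the process-per-GPU distribution in the list above cheap.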

I couldn't test on a multi-GPU setup.

Results on the same 30s video, testing one video at a time:

| Script | Time | Method |
|---|---|---|
| `torch_inf.py` (original) | 13.20s | cv2 |
| `torch_inf_super.py` | 5.76s | supervision + batch |
| `onnx_inf_super.py` | 5.36s | supervision + batch |
| `onnx_inf_super_io.py` | 4.27s | supervision + batch + io_binding + GPU manager |
| `onnx_inf_super_io.py` | 1.77s each, `mp.Pool(8)` | supervision + batch + io_binding + GPU manager |

NOTE: `onnx_inf_super_io.py` was created because the script was being used in a multiprocessing environment. Originally this caused GPU memory to be overwritten; the memory management mitigated that and sped up inference. The last row above was measured on 24 30s videos processed in a pool of 8 workers.
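The core idea behind the memory management can be sketched as a per-process buffer pool that reuses allocations keyed by size, so each worker recycles its own buffers instead of racing other processes for fresh ones. This is a minimal illustration, not the PR's `GPUMemoryPool`: in the real script the allocator would be a CuPy/CUDA allocation, while here it defaults to `bytearray` so the idea is runnable without a GPU.

```python
class GPUMemoryPool:
    """Minimal sketch of a reusing buffer pool (name and API hypothetical).

    Each worker process creates its own pool, so buffers are never shared
    (or overwritten) across processes. Released buffers are kept in
    size-keyed free lists and handed back on the next acquire of that size,
    avoiding repeated allocation during per-frame inference.
    """

    def __init__(self, allocator=bytearray):
        # In the real script `allocator` would be a CuPy/CUDA allocation.
        self._allocator = allocator
        self._free = {}  # size -> list of released buffers

    def acquire(self, nbytes):
        # Reuse a released buffer of the same size instead of reallocating.
        bucket = self._free.get(nbytes)
        if bucket:
            return bucket.pop()
        return self._allocator(nbytes)

    def release(self, buf):
        self._free.setdefault(len(buf), []).append(buf)


# Usage sketch: one pool per worker process.
pool = GPUMemoryPool()
a = pool.acquire(1024)
pool.release(a)
b = pool.acquire(1024)  # the released buffer is handed back, no new alloc
assert a is b
```

Constructing the pool (and the ONNX Runtime session) inside each worker, e.g. via `mp.Pool`'s `initializer`, is what keeps workers from stepping on each other's device memory.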
