[Feature Preview] Introduce Python bindings for NCCL with nccl4py #1902
marksantesson wants to merge 1 commit into master
"nccl.bindings" = ["*.pyi"]

[tool.setuptools.exclude-package-data]
"*" = ["__pycache__/*", "*.py[co]", "*.stamp", "*.pyx", "*.pxd", "*.cpp", "*.c"]
All *.pxd files are excluded. However, files like cynccl.pxd may still be useful for third-party code, as it provides access to the core NCCL API. Maybe this file should be promoted to the top-level package directory and exposed to third-party Cython code? That's what I do in mpi4py with libmpi.pxd, though I have no idea how much people use it.
Yes. By design, cynccl.pxd is meant to be public-facing. However, I don't think this is currently tested by nccl4py? If not, then we should document that the Cython interface is experimental.
No, only nccl.core is tested. The first-phase goal of nccl4py is to provide Pythonic interfaces to NCCL, which is the main reason I excluded the .pxd files; users who want to experiment with the Cython interface are expected to copy the .pxd files from the source tree. Exposing them and marking them as experimental is a good idea.
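To make the effect of the exclude list concrete, here is a small sketch that checks file names against the patterns from the diff. It assumes shell-style matching on bare file names via Python's `fnmatch`, which only approximates what setuptools does at build time; the candidate file names other than cynccl.pxd are hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns copied from [tool.setuptools.exclude-package-data] in the diff.
EXCLUDE_PATTERNS = [
    "__pycache__/*", "*.py[co]", "*.stamp", "*.pyx", "*.pxd", "*.cpp", "*.c",
]


def is_excluded(filename: str) -> bool:
    """Return True if any exclude pattern matches the file name."""
    return any(fnmatchcase(filename, pattern) for pattern in EXCLUDE_PATTERNS)


# Mostly hypothetical file names, for illustration only.
candidates = ["cynccl.pxd", "cynccl.pyx", "nccl.pyi", "core.py", "core.pyc"]
excluded = [name for name in candidates if is_excluded(name)]
print(excluded)
```

Under these assumptions cynccl.pxd is dropped from the built package while the .pyi stubs are kept, which is the behavior the comments above are discussing.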
For the last 11 lines, could we consider supporting a context manager to ensure .destroy() is always called at the end?

with nccl.Communicator.init(nranks=nranks, rank=rank, unique_id=unique_id) as nccl_comm:
    # other operations
It’s in our backlog, but we’re being cautious about supporting it as ncclCommDestroy is an intra-node collective call that can easily lead to hangs in many cases.
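For reference, the pattern being requested could look roughly like the sketch below. It is not nccl4py code: `FakeCommunicator` and `managed_communicator` are hypothetical names, and the stand-in class makes no NCCL calls. It only shows how `contextlib.contextmanager` guarantees that `destroy()` runs on both the normal and the error path.

```python
from contextlib import contextmanager


class FakeCommunicator:
    """Hypothetical stand-in for a communicator; it makes no NCCL calls."""

    def __init__(self, nranks: int, rank: int) -> None:
        self.nranks = nranks
        self.rank = rank
        self.destroyed = False

    def destroy(self) -> None:
        # In a real binding, this is where ncclCommDestroy would be invoked.
        self.destroyed = True


@contextmanager
def managed_communicator(nranks: int, rank: int):
    """Yield a communicator and always call destroy() on exit."""
    comm = FakeCommunicator(nranks, rank)
    try:
        yield comm
    finally:
        comm.destroy()


# Normal exit: destroy() runs after the block.
with managed_communicator(nranks=2, rank=0) as comm_ok:
    pass

# Error exit: destroy() still runs while the exception propagates.
try:
    with managed_communicator(nranks=2, rank=1) as comm_err:
        raise RuntimeError("simulated failure on one rank")
except RuntimeError:
    pass

print(comm_ok.destroyed, comm_err.destroyed)
```

The error path is presumably the risky one: a rank that fails would enter the destroy call on its own, and if that call is collective, as the reply states, it could block waiting for peers that are still running other operations.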
NCCL4Py provides Python language bindings for NCCL, offering a Pythonic interface to the NCCL library's functionality. It enables Python applications to use NCCL's GPU-accelerated multi-GPU and multi-node communication for distributed computing workloads.
Key Features
Usage Model
NCCL4Py follows a simple workflow:
Limitations
For more details, see the respective sections in this documentation.
Quick Start
Here's a minimal example demonstrating NCCL4Py with an AllReduce operation:
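For readers unfamiliar with the operation, what a sum AllReduce computes can be sketched on the CPU in plain Python. This is only an illustration of the semantics, not nccl4py code: the function name `all_reduce_sum` is hypothetical, the "ranks" are simulated as list entries, and no NCCL or GPU calls are made.

```python
from typing import List


def all_reduce_sum(per_rank_buffers: List[List[float]]) -> List[List[float]]:
    """Simulate a sum AllReduce: every rank receives the element-wise sum."""
    # zip(*buffers) groups the i-th element contributed by every rank.
    reduced = [sum(values) for values in zip(*per_rank_buffers)]
    # Each rank gets its own copy of the reduced buffer.
    return [list(reduced) for _ in per_rank_buffers]


# Three simulated ranks, each contributing a buffer of two elements.
inputs = [[1.0, 2.0], [10.0, 20.0], [100.0, 200.0]]
outputs = all_reduce_sum(inputs)
print(outputs[0])  # [111.0, 222.0]
```

After the call, every simulated rank holds the same reduced buffer, which is the defining property of AllReduce as opposed to a Reduce that delivers the result to a single rank.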