NCCL (NVIDIA Collective Communications Library): optimized primitives for collective multi-GPU communication.

See also: CUDA

Favorite site:
GitHub - nccl project site
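
To make "collective primitive" concrete, here is a minimal sketch of what an all-reduce (sum) computes: every participant contributes a buffer, and every participant ends up with the element-wise sum. This is a CPU-only simulation of the semantics, not the NCCL API. The function name all_reduce_sum and the list-of-lists layout are assumptions made for illustration, and each inner list simply stands in for one GPU's buffer.

```python
from typing import List


def all_reduce_sum(rank_buffers: List[List[float]]) -> List[List[float]]:
    """Simulate all-reduce (sum) semantics on the CPU.

    Hypothetical helper for illustration only: rank_buffers[r] stands in
    for the buffer held by GPU rank r. No GPU or NCCL call is involved.
    """
    if not rank_buffers:
        return []

    length = len(rank_buffers[0])
    if any(len(buf) != length for buf in rank_buffers):
        raise ValueError("all ranks must contribute buffers of equal length")

    # Reduce: element-wise sum across all ranks.
    reduced = [sum(values) for values in zip(*rank_buffers)]

    # Broadcast: every rank receives its own copy of the reduced result.
    return [list(reduced) for _ in rank_buffers]


if __name__ == "__main__":
    buffers = [[1.0, 2.0], [10.0, 20.0], [100.0, 200.0]]
    print(all_reduce_sum(buffers))
```

A real library performs the same reduction directly between device buffers without staging everything in one place; the sketch only shows the result each rank should observe.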