CUDA:IPC

Introduction - Interprocess Communication

CUDA C++ Programming Guide: Interprocess Communication

Any device memory pointer or event handle created by a host thread can be directly referenced by any other thread within the same process. It is not valid outside this process, however, and therefore cannot be directly referenced by threads belonging to a different process.

To share device memory pointers and events across processes, an application must use the Interprocess Communication (IPC) API, which is described in detail in the reference manual. The IPC API is only supported for 64-bit processes on Linux and for devices of compute capability 2.0 and higher. Note that the IPC API is not supported for cudaMallocManaged() allocations.

Using this API, an application can get the IPC handle for a given device memory pointer using cudaIpcGetMemHandle(), pass it to another process using standard IPC mechanisms (for example, interprocess shared memory or files), and call cudaIpcOpenMemHandle() in that other process to obtain a device pointer that is valid there. Event handles are shared the same way with cudaIpcGetEventHandle() and cudaIpcOpenEventHandle().

Note that allocations made by cudaMalloc() may be sub-allocated from a larger block of memory for performance reasons. In such a case, the CUDA IPC APIs share the entire underlying memory block, which may cause other sub-allocations to be shared as well and can potentially lead to information disclosure between processes. To prevent this, it is recommended to share only allocations whose size is aligned to 2 MiB.
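One way to follow that recommendation is to round the requested size up to a multiple of 2 MiB before calling cudaMalloc() for any buffer that will be exported. This is a sketch under that assumption; the constant and helper names are hypothetical, and the guide does not prescribe this exact helper.

```cpp
#include <cstddef>

// 2 MiB = 2 * 1024 * 1024 = 2,097,152 bytes.
constexpr std::size_t kIpcShareGranularity = std::size_t{2} * 1024 * 1024;

// Round `bytes` up to the next multiple of 2 MiB (0 stays 0).
// Assumes bytes + kIpcShareGranularity - 1 does not overflow std::size_t.
constexpr std::size_t roundUpForIpc(std::size_t bytes) {
    return (bytes + kIpcShareGranularity - 1) / kIpcShareGranularity * kIpcShareGranularity;
}

static_assert(roundUpForIpc(1) == kIpcShareGranularity, "small sizes round up to 2 MiB");
static_assert(roundUpForIpc(kIpcShareGranularity) == kIpcShareGranularity,
              "exact multiples are unchanged");
```

The exporter would then call cudaMalloc(&devPtr, roundUpForIpc(bytes)) and use only the first bytes of the buffer. The padding costs at most 2 MiB - 1 bytes per shared buffer.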

A typical use of the IPC API is a single primary process that generates a batch of input data and makes it available to multiple secondary processes without regenerating or copying it.

Applications using CUDA IPC to communicate with each other should be compiled, linked, and run with the same CUDA driver and runtime.

NOTE
Since CUDA 11.5, only the event-sharing IPC APIs are supported on L4T and embedded Linux Tegra devices with compute capability 7.x and higher. The memory-sharing IPC APIs are still not supported on Tegra platforms.

APIs

CUDA Runtime API (CUDA Toolkit Documentation)

__host__ cudaError_t cudaIpcCloseMemHandle ( void* devPtr ): Attempts to close memory mapped with cudaIpcOpenMemHandle.

__host__ cudaError_t cudaIpcGetEventHandle ( cudaIpcEventHandle_t* handle, cudaEvent_t event ): Gets an interprocess handle for a previously allocated event.

__host__ cudaError_t cudaIpcGetMemHandle ( cudaIpcMemHandle_t* handle, void* devPtr ): Gets an interprocess memory handle for an existing device memory allocation.

__host__ cudaError_t cudaIpcOpenEventHandle ( cudaEvent_t* event, cudaIpcEventHandle_t handle ): Opens an interprocess event handle for use in the current process.

__host__ cudaError_t cudaIpcOpenMemHandle ( void** devPtr, cudaIpcMemHandle_t handle, unsigned int flags ): Opens an interprocess memory handle exported from another process and returns a device pointer usable in the local process.

Example

NVIDIA/cuda-samples: Samples/0_Introduction/simpleIPC/simpleIPC.cu (master branch)