Usage of the jcuda.runtime.JCuda.cudaMallocManaged() method, with code examples

x33g5p2x · 2022-01-22 · Reposted in: Other

This article collects code examples for the Java method jcuda.runtime.JCuda.cudaMallocManaged(), showing how JCuda.cudaMallocManaged() is used in practice. The examples were extracted from selected projects on platforms such as GitHub, Stack Overflow, and Maven, so they should serve as useful references. Details of the JCuda.cudaMallocManaged() method:
Package: jcuda.runtime.JCuda
Class: JCuda
Method: cudaMallocManaged

Introduction to JCuda.cudaMallocManaged

```
__host__ cudaError_t cudaMallocManaged (
        void** devPtr,
        size_t size,
        unsigned int flags = cudaMemAttachGlobal )
```
Allocates memory that will be automatically managed by the Unified Memory system. 
##### Description

Allocates size bytes of managed memory on the device and returns in *devPtr a pointer to the allocated memory. If the device doesn't support allocating managed memory, cudaErrorNotSupported is returned. Support for managed memory can be queried using the device attribute cudaDevAttrManagedMemory. The allocated memory is suitably aligned for any kind of variable. The memory is not cleared. If size is 0, cudaMallocManaged returns cudaErrorInvalidValue. The pointer is valid on the CPU and on all GPUs in the system that support managed memory. All accesses to this pointer must obey the Unified Memory programming model.

flags specifies the default stream association for this allocation. flags must be one of cudaMemAttachGlobal or cudaMemAttachHost. The default value for flags is cudaMemAttachGlobal. If cudaMemAttachGlobal is specified, then this memory is accessible from any stream on any device. If cudaMemAttachHost is specified, then the allocation is created with initial visibility restricted to host access only; an explicit call to cudaStreamAttachMemAsync will be required to enable access on the device.
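The host-restricted attachment described above can be sketched in JCuda roughly as follows. This is a hypothetical sketch, not code from the article's source projects: it assumes the JCuda runtime bindings and a CUDA-capable device that supports managed memory, so it cannot run without that hardware.

```java
import jcuda.Pointer;
import jcuda.runtime.JCuda;
import jcuda.runtime.cudaStream_t;

public class AttachHostSketch {
    public static void main(String[] args) {
        JCuda.setExceptionsEnabled(true);

        // Allocate 1 KiB of managed memory that is initially visible to the host only
        Pointer managed = new Pointer();
        JCuda.cudaMallocManaged(managed, 1024, JCuda.cudaMemAttachHost);

        // Before work on a stream may touch the allocation, attach it to that
        // stream explicitly, as the description above requires
        cudaStream_t stream = new cudaStream_t();
        JCuda.cudaStreamCreate(stream);
        JCuda.cudaStreamAttachMemAsync(stream, managed, 0, JCuda.cudaMemAttachSingle);
        JCuda.cudaStreamSynchronize(stream);

        // ... launch kernels on `stream` that use `managed` ...

        JCuda.cudaStreamDestroy(stream);
        JCuda.cudaFree(managed);
    }
}
```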

If the association is later changed via cudaStreamAttachMemAsync to a single stream, the default association, as specified during cudaMallocManaged, is restored when that stream is destroyed. For __managed__ variables, the default association is always cudaMemAttachGlobal. Note that destroying a stream is an asynchronous operation, and as a result, the change to default association won't happen until all work in the stream has completed.

Memory allocated with cudaMallocManaged should be released with cudaFree.
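Put together, a minimal allocate/use/free cycle in JCuda might look like the sketch below. It is an illustration only (not from the article's source projects), assumes the JCuda bindings and a device that supports managed memory, and therefore cannot run without CUDA hardware.

```java
import java.nio.ByteBuffer;
import jcuda.Pointer;
import jcuda.runtime.JCuda;
import jcuda.runtime.cudaDeviceAttr;

public class ManagedMemorySketch {
    public static void main(String[] args) {
        JCuda.setExceptionsEnabled(true);

        // Check that device 0 supports managed memory before allocating
        int[] supported = {0};
        JCuda.cudaDeviceGetAttribute(supported, cudaDeviceAttr.cudaDevAttrManagedMemory, 0);
        if (supported[0] == 0) {
            System.err.println("Device 0 does not support managed memory");
            return;
        }

        // Allocate 1 KiB; cudaMemAttachGlobal is the default flags value
        long size = 1024;
        Pointer managed = new Pointer();
        JCuda.cudaMallocManaged(managed, size, JCuda.cudaMemAttachGlobal);

        // The same pointer is valid on the host: write to it directly
        ByteBuffer buffer = managed.getByteBuffer(0, size);
        buffer.putInt(0, 42);

        // Ensure no kernel is touching the memory, then read it back
        JCuda.cudaDeviceSynchronize();
        System.out.println("First int: " + buffer.getInt(0));

        // Managed allocations are released with cudaFree
        JCuda.cudaFree(managed);
    }
}
```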

On a multi-GPU system with peer-to-peer support, where multiple GPUs support managed memory, the physical storage is created on the GPU which is active at the time cudaMallocManaged is called. All other GPUs will reference the data at reduced bandwidth via peer mappings over the PCIe bus. The Unified Memory management system does not migrate memory between GPUs.

On a multi-GPU system where multiple GPUs support managed memory, but not all pairs of such GPUs have peer-to-peer support between them, the physical storage is created in 'zero-copy' or system memory. All GPUs will reference the data at reduced bandwidth over the PCIe bus. In these circumstances, use of the environment variable, CUDA_VISIBLE_DEVICES, is recommended to restrict CUDA to only use those GPUs that have peer-to-peer support. Alternatively, users can also set CUDA_MANAGED_FORCE_DEVICE_ALLOC to a non-zero value to force the driver to always use device memory for physical storage. When this environment variable is set to a non-zero value, all devices used in that process that support managed memory have to be peer-to-peer compatible with each other. The error cudaErrorInvalidDevice will be returned if a device that supports managed memory is used and it is not peer-to-peer compatible with any of the other managed memory supporting devices that were previously used in that process, even if cudaDeviceReset has been called on those devices. These environment variables are described in the CUDA programming guide under the "CUDA environment variables" section.

Code examples

Code example source: com.simiacryptus/mindseye

/**
 * Wraps JCuda.cudaMallocManaged, timing the call, logging it, and handling the result code.
 *
 * @param devPtr the pointer that receives the allocated managed memory
 * @param size   the allocation size in bytes
 * @param flags  the stream-association flags (e.g. cudaMemAttachGlobal)
 * @return the CUDA error code returned by the call
 */
public static int cudaMallocManaged(final CudaPointer devPtr, final long size, int flags) {
 long startTime = System.nanoTime();
 final int result = JCuda.cudaMallocManaged(devPtr, size, flags);
 log("cudaMallocManaged", result, new Object[]{devPtr, size, flags});
 cudaMallocManaged_execution.accept((System.nanoTime() - startTime) / 1e9);
 handle(result);
 return result;
}

Code example source: com.simiacryptus/mindseye-cudnn

(The wrapper in this artifact is byte-for-byte identical to the com.simiacryptus/mindseye example above.)
