Usage and code examples of the jcuda.runtime.JCuda.cudaHostAlloc() method


This article collects code examples of the jcuda.runtime.JCuda.cudaHostAlloc() method in Java and shows how JCuda.cudaHostAlloc() is used in practice. The examples are taken from selected projects on platforms such as GitHub, Stack Overflow, and Maven, so they should serve as a useful reference. Details of the JCuda.cudaHostAlloc() method are as follows:
Package path: jcuda.runtime.JCuda
Class name: JCuda
Method name: cudaHostAlloc

Introduction to JCuda.cudaHostAlloc

Allocates page-locked memory on the host.

cudaError_t cudaHostAlloc ( 
void** pHost, 
size_t size, 
unsigned int  flags )

Allocates page-locked memory on the host. Allocates size bytes of host memory that is page-locked and accessible to the device. The driver tracks the virtual memory ranges allocated with this function and automatically accelerates calls to functions such as cudaMemcpy(). Since the memory can be accessed directly by the device, it can be read or written with much higher bandwidth than pageable memory obtained with functions such as malloc(). Allocating excessive amounts of pinned memory may degrade system performance, since it reduces the amount of memory available to the system for paging. As a result, this function is best used sparingly to allocate staging areas for data exchange between host and device.
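
To make the call sequence concrete, the following is a minimal JCuda sketch (class and variable names are illustrative, not taken from the projects cited below) that allocates pinned host memory with cudaHostAlloc, copies it to the device, and releases it with cudaFreeHost:

import jcuda.Pointer;
import jcuda.Sizeof;
import jcuda.runtime.JCuda;
import jcuda.runtime.cudaMemcpyKind;

public class PinnedCopySketch {
    public static void main(String[] args) {
        JCuda.setExceptionsEnabled(true);               // turn error codes into exceptions

        long bytes = (1 << 20) * (long) Sizeof.FLOAT;   // 1M floats

        // Allocate page-locked (pinned) host memory
        Pointer hostPtr = new Pointer();
        JCuda.cudaHostAlloc(hostPtr, bytes, JCuda.cudaHostAllocDefault);

        // Allocate device memory; copies from pinned memory take the fast DMA path
        Pointer devPtr = new Pointer();
        JCuda.cudaMalloc(devPtr, bytes);
        JCuda.cudaMemcpy(devPtr, hostPtr, bytes, cudaMemcpyKind.cudaMemcpyHostToDevice);

        // Pinned memory must be released with cudaFreeHost, not cudaFree
        JCuda.cudaFree(devPtr);
        JCuda.cudaFreeHost(hostPtr);
    }
}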

The flags parameter enables different options to be specified that affect the allocation, as follows.

  • cudaHostAllocDefault: This flag's value is defined to be 0 and causes cudaHostAlloc() to emulate cudaMallocHost().
  • cudaHostAllocPortable: The memory returned by this call will be considered as pinned memory by all CUDA contexts, not just the one that performed the allocation.
  • cudaHostAllocMapped: Maps the allocation into the CUDA address space. The device pointer to the memory may be obtained by calling cudaHostGetDevicePointer().
  • cudaHostAllocWriteCombined: Allocates the memory as write-combined (WC). WC memory can be transferred across the PCI Express bus more quickly on some system configurations, but cannot be read efficiently by most CPUs. WC memory is a good option for buffers that will be written by the CPU and read by the device via mapped pinned memory or host->device transfers.

All of these flags are orthogonal to one another: a developer may allocate memory that is portable, mapped and/or write-combined with no restrictions.

cudaSetDeviceFlags() must have been called with the cudaDeviceMapHost flag in order for the cudaHostAllocMapped flag to have any effect.

The cudaHostAllocMapped flag may be specified on CUDA contexts for devices that do not support mapped pinned memory. The failure is deferred to cudaHostGetDevicePointer() because the memory may be mapped into other CUDA contexts via the cudaHostAllocPortable flag.
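
As a rough sketch of how the mapped and portable flags are combined (names again illustrative; assumes a device that supports mapped pinned memory), the cudaDeviceMapHost requirement and the deferred failure point look like this:

import jcuda.Pointer;
import jcuda.Sizeof;
import jcuda.runtime.JCuda;

public class MappedPinnedSketch {
    public static void main(String[] args) {
        JCuda.setExceptionsEnabled(true);

        // cudaDeviceMapHost must be set before the CUDA context is created,
        // otherwise cudaHostAllocMapped has no effect
        JCuda.cudaSetDeviceFlags(JCuda.cudaDeviceMapHost);

        long bytes = 1024L * Sizeof.FLOAT;

        // The flags are orthogonal and can be OR-ed together
        Pointer hostPtr = new Pointer();
        JCuda.cudaHostAlloc(hostPtr, bytes,
                JCuda.cudaHostAllocMapped | JCuda.cudaHostAllocPortable);

        // On a device without mapped pinned memory support, the failure is
        // reported here rather than by cudaHostAlloc itself
        Pointer devPtr = new Pointer();
        JCuda.cudaHostGetDevicePointer(devPtr, hostPtr, 0);

        // ... launch kernels that access devPtr directly ...

        JCuda.cudaFreeHost(hostPtr);
    }
}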

Memory allocated by this function must be freed with cudaFreeHost().
Note: this function may also return error codes from previous, asynchronous launches.

Code examples

Code example source: origin: com.simiacryptus/mindseye-cudnn (the identical snippet also appears in com.simiacryptus/mindseye)

/**
 * Cuda host alloc int.
 *
 * @param devPtr the dev ptr
 * @param size   the size
 * @param flags  the flags
 * @return the int
 */
public static int cudaHostAlloc(final CudaPointer devPtr, final long size, int flags) {
 long startTime = System.nanoTime();
 final int result = JCuda.cudaHostAlloc(devPtr, size, flags);
 cudaHostAlloc_execution.accept((System.nanoTime() - startTime) / 1e9);
 log("cudaHostAlloc", result, new Object[]{devPtr, size, flags});
 handle(result);
 return result;
}

Code example source: origin: org.nd4j/nd4j-jcublas-common

@Override
public Object alloc(DataBuffer buffer,int stride,int offset,int length) {
  Pointer hostPointer = new Pointer();
  BaseCudaDataBuffer.DevicePointerInfo devicePointerInfo = new BaseCudaDataBuffer.DevicePointerInfo(
      hostPointer
      , length
      ,stride
      ,offset);
  JCuda.cudaHostAlloc(
      hostPointer
      , buffer.getElementSize() * length
      , JCuda.cudaHostAllocDefault);
  return devicePointerInfo;
}
