Usage and code examples of the jcuda.runtime.JCuda.cudaMemcpyAsync() method

x33g5p2x, reposted on 2022-01-22 in Other

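This article collects code examples for the Java method jcuda.runtime.JCuda.cudaMemcpyAsync() and shows how JCuda.cudaMemcpyAsync() is used in practice. The examples come mainly from platforms such as GitHub, Stack Overflow, and Maven. They were extracted from selected projects and are intended as a practical reference. Details of the JCuda.cudaMemcpyAsync() method:
Package path: jcuda.runtime.JCuda
Class name: JCuda
Method name: cudaMemcpyAsync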

Introduction to JCuda.cudaMemcpyAsync

Copies data between host and device.

cudaError_t cudaMemcpyAsync (
    void* dst,
    const void* src,
    size_t count,
    cudaMemcpyKind kind,
    cudaStream_t stream = 0 )

Copies data between host and device. Copies count bytes from the memory area pointed to by src to the memory area pointed to by dst, where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy. The memory areas must not overlap. Calling cudaMemcpyAsync() with dst and src pointers that do not match the direction of the copy results in undefined behavior.
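
Because count is measured in bytes rather than elements, and the source and destination areas must not overlap, a caller may want to compute and check these values before issuing the copy. The following is a minimal sketch in plain Java. The class CopyArgs and its methods are hypothetical helpers written for illustration and are not part of JCuda.

```java
/** Hypothetical pre-flight helpers for a memcpy call; not part of JCuda. */
final class CopyArgs {
    private CopyArgs() {}

    /** Converts an element count into the byte count that the copy expects. */
    static long byteCount(long elements, int bytesPerElement) {
        if (elements < 0 || bytesPerElement <= 0) {
            throw new IllegalArgumentException("negative element count or non-positive element size");
        }
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(elements, (long) bytesPerElement);
    }

    /**
     * Returns true if the byte ranges [aStart, aStart + count) and
     * [bStart, bStart + count) share at least one byte. The check is only
     * meaningful when both addresses refer to the same address space.
     */
    static boolean overlaps(long aStart, long bStart, long count) {
        if (count <= 0) {
            return false;
        }
        return aStart < bStart + count && bStart < aStart + count;
    }
}
```

For a float array of 1024 elements, CopyArgs.byteCount(1024, Float.BYTES) yields 4096, which is the value to pass as count.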

cudaMemcpyAsync() is asynchronous with respect to the host, so the call may return before the copy is complete. The copy can optionally be associated with a stream by passing a non-zero stream argument. If kind is cudaMemcpyHostToDevice or cudaMemcpyDeviceToHost and the stream is non-zero, the copy may overlap with operations in other streams.
Note:

  • Note that this function may also return error codes from previous, asynchronous launches.
  • This function exhibits asynchronous behavior for most use cases.
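
The practical consequence of the asynchronous behavior described above is that the destination must not be read until the stream has been synchronized. The sketch below models this in plain Java and does not call JCuda: a single-threaded executor stands in for a stream, and the class StreamModel with its methods is hypothetical. With JCuda itself, the corresponding calls would be JCuda.cudaMemcpyAsync followed by JCuda.cudaStreamSynchronize on the same stream.

```java
import java.util.Arrays;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Plain-Java model of a single stream; hypothetical, not part of JCuda. */
final class StreamModel implements AutoCloseable {
    // One worker thread, so enqueued operations run in submission order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private Future<?> last;

    /**
     * Enqueues a copy and returns without waiting for it to finish.
     * Note: count is in elements here, whereas the real call takes bytes.
     */
    synchronized void memcpyAsync(float[] dst, float[] src, int count) {
        last = worker.submit(() -> System.arraycopy(src, 0, dst, 0, count));
    }

    /** Blocks until every operation enqueued so far has completed. */
    synchronized void synchronize() {
        if (last == null) {
            return;
        }
        try {
            last.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the stream", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("an enqueued operation failed", e.getCause());
        }
    }

    @Override
    public void close() {
        worker.shutdown();
    }

    public static void main(String[] args) {
        float[] src = {1f, 2f, 3f, 4f};
        float[] dst = new float[src.length];
        try (StreamModel stream = new StreamModel()) {
            stream.memcpyAsync(dst, src, src.length);
            // dst is not guaranteed to be filled here: the copy may still be pending.
            stream.synchronize();
            // After synchronize() returns, the contents of dst are safe to read.
            System.out.println(Arrays.toString(dst));
        }
    }
}
```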

Code examples

Code example source: origin: com.simiacryptus/mindseye-cudnn

/**
 * Wraps JCuda.cudaMemcpyAsync, recording how long the call takes and
 * passing its result code to the project's handle helper (not shown).
 *
 * @param dst                 the destination memory area
 * @param src                 the source memory area
 * @param count               the number of bytes to copy
 * @param cudaMemcpyKind_kind the direction of the copy (one of the cudaMemcpyKind constants)
 * @param stream              the stream to associate the copy with
 */
public static void cudaMemcpyAsync(final CudaPointer dst, final CudaPointer src, final long count, final int cudaMemcpyKind_kind, cudaStream_t stream) {
 long startTime = System.nanoTime();
 final int result = JCuda.cudaMemcpyAsync(dst, src, count, cudaMemcpyKind_kind, stream);
 // Report the elapsed time of the call in seconds.
 cudaMemcpyAsync_execution.accept((System.nanoTime() - startTime) / 1e9);
 log("cudaMemcpyAsync", result, new Object[]{dst, src, count, cudaMemcpyKind_kind, stream});
 handle(result);
}
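
The wrapper above depends on project helpers (cudaMemcpyAsync_execution, log, handle) whose definitions are not shown. As a rough illustration of the same time-then-check pattern, the sketch below uses only the Java standard library. The class TimedCall is hypothetical, the IntSupplier stands in for the JCuda call, and the only CUDA fact assumed is that the success status cudaSuccess is 0.

```java
import java.util.function.DoubleConsumer;
import java.util.function.IntSupplier;

/** Hypothetical sketch of a time-then-check wrapper; not the library's code. */
final class TimedCall {
    /** Status code for success; cudaSuccess is 0 in the CUDA runtime. */
    static final int SUCCESS = 0;

    private TimedCall() {}

    /**
     * Runs the call, reports its duration in seconds, and throws if the
     * returned status is not SUCCESS. In real use the supplier would wrap
     * a call such as JCuda.cudaMemcpyAsync(...).
     */
    static void run(String name, IntSupplier call, DoubleConsumer durationSeconds) {
        long start = System.nanoTime();
        int result = call.getAsInt();
        durationSeconds.accept((System.nanoTime() - start) / 1e9);
        if (result != SUCCESS) {
            throw new IllegalStateException(name + " failed with status " + result);
        }
    }
}
```

Since the copy is asynchronous with respect to the host, a timer placed around the call in this way measures how long the call took to return, which is not necessarily how long the copy itself took.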

Code example source: origin: com.simiacryptus/mindseye

The cudaMemcpyAsync wrapper in this project is identical to the com.simiacryptus/mindseye-cudnn example above.

Code example source: origin: org.nd4j/nd4j-jcublas-common

@Override
public Object copyToHost(DataBuffer copy,int offset) {
  JCudaBuffer buf2 = (JCudaBuffer) copy;
  Table<String, Integer, BaseCudaDataBuffer.DevicePointerInfo> pointersToContexts = buf2.getPointersToContexts();
  BaseCudaDataBuffer.DevicePointerInfo devicePointerInfo = pointersToContexts.get(Thread.currentThread().getName(),offset);
  JCuda.cudaMemcpyAsync(
      buf2.getHostPointer()
      , devicePointerInfo.getPointer()
      , devicePointerInfo.getLength()
      , cudaMemcpyKind.cudaMemcpyDeviceToHost
      , ContextHolder.getInstance().getCudaStream());
  return buf2.getHostPointer();
}

Code example source: origin: org.nd4j/nd4j-jcublas-common

@Override
public Object copyToHost(DataBuffer copy,int offset) {
  JCudaBuffer buf2 = (JCudaBuffer) copy;
  Table<String, Integer, BaseCudaDataBuffer.DevicePointerInfo> pointersToContexts = buf2.getPointersToContexts();
  BaseCudaDataBuffer.DevicePointerInfo devicePointerInfo = pointersToContexts.get(Thread.currentThread().getName(),offset);
  if(devicePointerInfo != null) {
    JCuda.cudaMemcpyAsync(
        buf2.getHostPointer()
        , devicePointerInfo.getPointer()
        , devicePointerInfo.getLength()
        , cudaMemcpyKind.cudaMemcpyDeviceToHost
        , ContextHolder.getInstance().getCudaStream());
  }
  return buf2.getHostPointer();
}
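
The two copyToHost variants differ only in the null check: a lookup keyed by thread name and offset returns null when the current thread has no device pointer registered for that offset, and the second variant skips the copy in that case. A plain-Java sketch of that guard follows. A nested HashMap stands in for the Guava Table, and the class PointerRegistry and its methods are hypothetical names, not nd4j code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical stand-in for a (thread name, offset) to pointer table; not nd4j code. */
final class PointerRegistry<P> {
    private final Map<String, Map<Integer, P>> byThread = new HashMap<>();

    /** Registers a pointer for the given thread name and offset. */
    synchronized void put(String threadName, int offset, P pointer) {
        byThread.computeIfAbsent(threadName, k -> new HashMap<>()).put(offset, pointer);
    }

    /** Looks up the pointer for the calling thread; empty when nothing is registered. */
    synchronized Optional<P> forCurrentThread(int offset) {
        Map<Integer, P> row = byThread.get(Thread.currentThread().getName());
        return row == null ? Optional.empty() : Optional.ofNullable(row.get(offset));
    }
}
```

Returning an Optional makes the missing-entry case explicit at the call site, so a copy is only issued when a pointer is actually present.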

Code example source: origin: org.nd4j/nd4j-jcublas-common

ContextHolder.syncStream();
JCuda.cudaMemcpyAsync(
    get
    , devicePointer.getDevicePointer()
