Usage and code examples of the jcuda.runtime.JCuda.cudaSetDeviceFlags() method

x33g5p2x, reposted on 2022-01-22 under "Other"

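This article collects code examples of the jcuda.runtime.JCuda.cudaSetDeviceFlags() method in Java and shows how JCuda.cudaSetDeviceFlags() is used. The examples come mainly from platforms such as GitHub, Stack Overflow and Maven, and were extracted from selected projects, so they should serve as a useful reference. Details of the JCuda.cudaSetDeviceFlags() method are as follows:
Package path: jcuda.runtime.JCuda
Class name: JCuda
Method name: cudaSetDeviceFlags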

Introduction to JCuda.cudaSetDeviceFlags

Sets flags to be used for device executions.

cudaError_t cudaSetDeviceFlags ( unsigned int flags )

Sets flags to be used for device executions. Records flags as the flags to use when initializing the current device. If no device has been made current to the calling thread then flags will be applied to the initialization of any device initialized by the calling host thread, unless that device has had its initialization flags set explicitly by this or any host thread.

If the current device has been set and that device has already been initialized then this call will fail with the error cudaErrorSetOnActiveProcess. In this case it is necessary to reset device using cudaDeviceReset() before the device's initialization flags may be set.
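The failure mode described above suggests a reset-and-retry pattern. The following is only a minimal sketch of that pattern, not code from JCuda: the DeviceRuntime interface, the FakeRuntime class and the two status constants are hypothetical stand-ins so that the example runs without a GPU. In a real program the two interface methods would presumably delegate to JCuda.cudaSetDeviceFlags(flags) and JCuda.cudaDeviceReset(), and the status values would come from jcuda.runtime.cudaError.

```java
public class SetFlagsRetrySketch {
    // Hypothetical stand-ins for the status codes; real code would use the
    // constants in jcuda.runtime.cudaError instead of these literals.
    static final int SUCCESS = 0;
    static final int SET_ON_ACTIVE_PROCESS = -1;

    /** Minimal view of the two runtime calls the pattern needs (assumed shape). */
    interface DeviceRuntime {
        int setDeviceFlags(int flags);
        int deviceReset();
    }

    /** Fake runtime: rejects flag changes while the device counts as initialized. */
    static class FakeRuntime implements DeviceRuntime {
        boolean initialized;
        int recordedFlags;

        FakeRuntime(boolean initialized) {
            this.initialized = initialized;
        }

        public int setDeviceFlags(int flags) {
            if (initialized) {
                return SET_ON_ACTIVE_PROCESS;
            }
            recordedFlags = flags;
            return SUCCESS;
        }

        public int deviceReset() {
            initialized = false;
            return SUCCESS;
        }
    }

    /** Tries to set the flags; on the "already active" error, resets once and retries. */
    static int setFlagsWithReset(DeviceRuntime rt, int flags) {
        int status = rt.setDeviceFlags(flags);
        if (status == SET_ON_ACTIVE_PROCESS) {
            int resetStatus = rt.deviceReset();
            if (resetStatus != SUCCESS) {
                return resetStatus;
            }
            status = rt.setDeviceFlags(flags);
        }
        return status;
    }

    public static void main(String[] args) {
        FakeRuntime active = new FakeRuntime(true);
        int status = setFlagsWithReset(active, 4);
        System.out.println(status + " " + active.recordedFlags); // prints "0 4"
    }
}
```

Resetting a device destroys its allocations and state in the current process, so a retry like this is only reasonable early in startup, before any device memory is in use.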

The two LSBs of the flags parameter can be used to control how the CPU thread interacts with the OS scheduler when waiting for results from the device.

  • cudaDeviceScheduleAuto: The default value if the flags parameter is zero, uses a heuristic based on the number of active CUDA contexts in the process C and the number of logical processors in the system P. If C > P, then CUDA will yield to other OS threads when waiting for the device, otherwise CUDA will not yield while waiting for results and actively spin on the processor.
  • cudaDeviceScheduleSpin: Instruct CUDA to actively spin when waiting for results from the device. This can decrease latency when waiting for the device, but may lower the performance of CPU threads if they are performing work in parallel with the CUDA thread.
  • cudaDeviceScheduleYield: Instruct CUDA to yield its thread when waiting for results from the device. This can increase latency when waiting for the device, but can increase the performance of CPU threads performing work in parallel with the device.
  • cudaDeviceScheduleBlockingSync: Instruct CUDA to block the CPU thread on a synchronization primitive when waiting for the device to finish work.
  • cudaDeviceBlockingSync: Instruct CUDA to block the CPU thread on a synchronization primitive when waiting for the device to finish work. Deprecated: This flag was deprecated as of CUDA 4.0 and replaced with cudaDeviceScheduleBlockingSync.
  • cudaDeviceMapHost: This flag must be set in order to allocate pinned host memory that is accessible to the device. If this flag is not set, cudaHostGetDevicePointer() will always return a failure code.
  • cudaDeviceLmemResizeToMax: Instruct CUDA to not reduce local memory after resizing local memory for a kernel. This can prevent thrashing by local memory allocations when launching many kernels with high local memory usage at the cost of potentially increased memory usage.
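The flags listed above are bit values that can be combined with a bitwise OR, and the automatic scheduling rule is a simple comparison of C and P. The sketch below illustrates both ideas in plain Java so that it runs without a GPU. The constant names are hypothetical, and the numeric values are copied by hand from the CUDA runtime headers as an assumption of this sketch; real code should use the constants defined on jcuda.runtime.JCuda instead of literals. The resolveAuto helper only restates the documented heuristic and is not how the CUDA runtime is implemented.

```java
public class DeviceFlagsSketch {
    // Values mirrored by hand from the CUDA runtime headers (an assumption of
    // this sketch); real code should use the constants on jcuda.runtime.JCuda.
    static final int SCHEDULE_AUTO = 0x00;
    static final int SCHEDULE_SPIN = 0x01;
    static final int SCHEDULE_YIELD = 0x02;
    static final int SCHEDULE_BLOCKING_SYNC = 0x04;
    static final int MAP_HOST = 0x08;
    static final int LMEM_RESIZE_TO_MAX = 0x10;

    /** The automatic rule as documented: yield if C > P, otherwise spin. */
    static int resolveAuto(int activeContexts, int logicalProcessors) {
        return activeContexts > logicalProcessors ? SCHEDULE_YIELD : SCHEDULE_SPIN;
    }

    /** Returns true if every bit of the given non-zero flag is set in flags. */
    static boolean has(int flags, int flag) {
        return (flags & flag) == flag;
    }

    public static void main(String[] args) {
        // Combine one scheduling mode with an independent option bit.
        int flags = SCHEDULE_BLOCKING_SYNC | MAP_HOST;
        System.out.println(flags);                          // prints 12
        System.out.println(has(flags, MAP_HOST));           // prints true
        System.out.println(has(flags, LMEM_RESIZE_TO_MAX)); // prints false

        // With more contexts than logical processors, the rule picks "yield".
        int p = Runtime.getRuntime().availableProcessors();
        System.out.println(resolveAuto(p + 1, p) == SCHEDULE_YIELD); // prints true
    }
}
```

In a JCuda program the equivalent combined value would be passed as JCuda.cudaSetDeviceFlags(JCuda.cudaDeviceScheduleBlockingSync | JCuda.cudaDeviceMapHost), before the device is initialized.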

Code examples

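Code example source: origin: com.simiacryptus/mindseye-cudnn

/**
 * Wraps JCuda.cudaSetDeviceFlags with logging, timing and error handling.
 *
 * @param flags the device flags to record (a bitwise OR of the flag values described above)
 * @return the CUDA status code returned by the runtime
 */
public static int cudaSetDeviceFlags(int flags) {
 long startTime = System.nanoTime();
 final int result = JCuda.cudaSetDeviceFlags(flags);
 // log(...), handle(...) and cudaDeviceSynchronize_execution belong to the
 // enclosing project class and are not shown in this excerpt.
 log("cudaSetDeviceFlags", result, new Object[]{flags});
 // Report the elapsed time of the call in seconds.
 cudaDeviceSynchronize_execution.accept((System.nanoTime() - startTime) / 1e9);
 // Presumably the project's check of the status code.
 handle(result);
 return result;
}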

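Code example source: origin: com.simiacryptus/mindseye

/**
 * Wraps JCuda.cudaSetDeviceFlags with logging, timing and error handling.
 *
 * @param flags the device flags to record (a bitwise OR of the flag values described above)
 * @return the CUDA status code returned by the runtime
 */
public static int cudaSetDeviceFlags(int flags) {
 long startTime = System.nanoTime();
 final int result = JCuda.cudaSetDeviceFlags(flags);
 // log(...), handle(...) and cudaDeviceSynchronize_execution belong to the
 // enclosing project class and are not shown in this excerpt.
 log("cudaSetDeviceFlags", result, new Object[]{flags});
 // Report the elapsed time of the call in seconds.
 cudaDeviceSynchronize_execution.accept((System.nanoTime() - startTime) / 1e9);
 // Presumably the project's check of the status code.
 handle(result);
 return result;
}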
