This article collects code examples of the Java method jcuda.runtime.JCuda.cudaDeviceSetLimit() and shows how JCuda.cudaDeviceSetLimit() is used in practice. The examples were extracted from selected projects found on platforms such as GitHub, Stack Overflow, and Maven, so they should serve as useful references. The details of the JCuda.cudaDeviceSetLimit() method are as follows:
Package: jcuda.runtime.JCuda
Class: JCuda
Method: cudaDeviceSetLimit
Set resource limits.
cudaError_t cudaDeviceSetLimit (
cudaLimit limit,
size_t value )
Set resource limits. Setting limit to value is a request by the application to update the current limit maintained by the device. The driver is free to modify the requested value to meet h/w requirements (this could be clamping to minimum or maximum values, rounding up to nearest element size, etc). The application can use cudaDeviceGetLimit() to find out exactly what the limit has been set to.
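To illustrate this request-and-verify pattern, here is a minimal JCuda sketch (not taken from the extracted examples below): it requests a per-thread stack size and then reads back the value the driver actually applied. The class name, the 8 KB request, and the assumption that JCuda's cudaDeviceGetLimit takes a long[] output array followed by the limit constant are illustrative, not from the original article.

import jcuda.runtime.JCuda;
import jcuda.runtime.cudaError;
import jcuda.runtime.cudaLimit;

public class SetLimitExample {
    public static void main(String[] args) {
        // Request an 8 KB stack per GPU thread; the driver may clamp or
        // round this, so the call is a request rather than a guarantee.
        int result = JCuda.cudaDeviceSetLimit(cudaLimit.cudaLimitStackSize, 8 * 1024);
        if (result != cudaError.cudaSuccess) {
            System.err.println("cudaDeviceSetLimit failed: " + cudaError.stringFor(result));
            return;
        }
        // Read back the limit that was actually applied.
        long[] effective = new long[1];
        JCuda.cudaDeviceGetLimit(effective, cudaLimit.cudaLimitStackSize);
        System.out.println("Effective stack size: " + effective[0] + " bytes per thread");
    }
}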
Setting each cudaLimit has its own specific restrictions, so each is discussed here.
cudaLimitStackSize controls the stack size in bytes of each GPU thread. This limit is only applicable to devices of compute capability 2.0 and higher. Attempting to set this limit on devices of compute capability less than 2.0 will result in the error cudaErrorUnsupportedLimit being returned.
cudaLimitPrintfFifoSize controls the size in bytes of the shared FIFO used by the printf() and fprintf() device system calls. Setting cudaLimitPrintfFifoSize must be performed before launching any kernel that uses the printf() or fprintf() device system calls, otherwise cudaErrorInvalidValue will be returned. This limit is only applicable to devices of compute capability 2.0 and higher. Attempting to set this limit on devices of compute capability less than 2.0 will result in the error cudaErrorUnsupportedLimit being returned.
cudaLimitMallocHeapSize controls the size in bytes of the heap used by the malloc() and free() device system calls. Setting cudaLimitMallocHeapSize must be performed before launching any kernel that uses the malloc() or free() device system calls, otherwise cudaErrorInvalidValue will be returned. This limit is only applicable to devices of compute capability 2.0 and higher. Attempting to set this limit on devices of compute capability less than 2.0 will result in the error cudaErrorUnsupportedLimit being returned.
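As a sketch of the two limits above that must be configured before the first relevant kernel launch, the following hypothetical snippet sets the printf FIFO and the device malloc heap right after selecting a device; the class name and the chosen sizes (4 MB and 64 MB) are arbitrary examples, not values from the original text.

import jcuda.runtime.JCuda;
import jcuda.runtime.cudaLimit;

public class PreLaunchLimits {
    public static void main(String[] args) {
        // Throw a CudaException instead of returning raw error codes.
        JCuda.setExceptionsEnabled(true);
        JCuda.cudaSetDevice(0);

        // Both limits must be set before the first kernel that uses the
        // corresponding device-side facility is launched; afterwards the
        // calls fail with cudaErrorInvalidValue.
        JCuda.cudaDeviceSetLimit(cudaLimit.cudaLimitPrintfFifoSize, 4L * 1024 * 1024);  // 4 MB printf FIFO
        JCuda.cudaDeviceSetLimit(cudaLimit.cudaLimitMallocHeapSize, 64L * 1024 * 1024); // 64 MB device heap

        // ... launch kernels that use printf()/fprintf() or malloc()/free() here ...
    }
}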
cudaLimitDevRuntimeSyncDepth controls the maximum nesting depth of a grid at which a thread can safely call cudaDeviceSynchronize(). Setting this limit must be performed before any launch of a kernel that uses the device runtime and calls cudaDeviceSynchronize() above the default sync depth, two levels of grids. Calls to cudaDeviceSynchronize() will fail with error code cudaErrorSyncDepthExceeded if the limitation is violated. This limit can be set smaller than the default or up to the maximum launch depth of 24. When setting this limit, keep in mind that additional levels of sync depth require the runtime to reserve large amounts of device memory which can no longer be used for user allocations. If these reservations of device memory fail, cudaDeviceSetLimit will return cudaErrorMemoryAllocation, and the limit can be reset to a lower value. This limit is only applicable to devices of compute capability 3.5 and higher. Attempting to set this limit on devices of compute capability less than 3.5 will result in the error cudaErrorUnsupportedLimit being returned.
cudaLimitDevRuntimePendingLaunchCount controls the maximum number of outstanding device runtime launches that can be made from the current device. A grid is outstanding from the point of launch up until the grid is known to have been completed. Device runtime launches which violate this limitation fail and return cudaErrorLaunchPendingCountExceeded when cudaGetLastError() is called after launch. If more pending launches than the default (2048 launches) are needed for a module using the device runtime, this limit can be increased. Keep in mind that being able to sustain additional pending launches will require the runtime to reserve larger amounts of device memory upfront which can no longer be used for allocations. If these reservations fail, cudaDeviceSetLimit will return cudaErrorMemoryAllocation, and the limit can be reset to a lower value. This limit is only applicable to devices of compute capability 3.5 and higher. Attempting to set this limit on devices of compute capability less than 3.5 will result in the error cudaErrorUnsupportedLimit being returned.
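For the two device-runtime (dynamic parallelism) limits just described, a rough sketch might look as follows; the class name and the values 4 and 4096 are illustrative, and on devices below compute capability 3.5 the calls are expected to return cudaErrorUnsupportedLimit as noted above.

import jcuda.runtime.JCuda;
import jcuda.runtime.cudaError;
import jcuda.runtime.cudaLimit;

public class DeviceRuntimeLimits {
    public static void main(String[] args) {
        // Allow device-side cudaDeviceSynchronize() up to 4 grid levels deep
        // (the default sync depth is 2). Deeper sync levels force the runtime
        // to reserve extra device memory.
        int result = JCuda.cudaDeviceSetLimit(cudaLimit.cudaLimitDevRuntimeSyncDepth, 4);
        if (result == cudaError.cudaErrorUnsupportedLimit) {
            System.err.println("Compute capability < 3.5: limit not supported");
        } else if (result == cudaError.cudaErrorMemoryAllocation) {
            System.err.println("Not enough device memory for the requested sync depth");
        }

        // Allow up to 4096 outstanding device-side kernel launches (default 2048).
        JCuda.cudaDeviceSetLimit(cudaLimit.cudaLimitDevRuntimePendingLaunchCount, 4096);
    }
}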
Note:
Note that this function may also return error codes from previous, asynchronous launches.
Code example source: com.simiacryptus/mindseye
/**
 * Sets a CUDA device resource limit via JCuda, recording how long the
 * native call took and delegating error handling to the project's
 * handle() helper.
 *
 * @param limit the cudaLimit constant to set
 * @param value the requested value for that limit
 */
public static void cudaDeviceSetLimit(final int limit, long value) {
  long startTime = System.nanoTime();
  final int result = JCuda.cudaDeviceSetLimit(limit, value);
  // cudaDeviceSetLimit_execution, log and handle are helpers defined
  // elsewhere in the mindseye project (timing metrics, call logging,
  // and error-code handling respectively).
  cudaDeviceSetLimit_execution.accept((System.nanoTime() - startTime) / 1e9);
  log("cudaDeviceSetLimit(", result, new Object[]{limit, value});
  handle(result);
}
Code example source: com.simiacryptus/mindseye-cudnn (the extracted code is identical to the example above)
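The helpers cudaDeviceSetLimit_execution, log and handle used in the example above belong to the mindseye project and are not shown on this page. As a self-contained sketch of the same pattern (time the native call, log it, and turn a non-zero status into an exception), one could write something like the following; the TimedCudaCalls class and its formatting choices are made up for illustration, and only the JCuda and CudaException calls are real JCuda API.

import jcuda.CudaException;
import jcuda.runtime.JCuda;
import jcuda.runtime.cudaError;
import jcuda.runtime.cudaLimit;

public class TimedCudaCalls {
    /**
     * Sets a device limit, reports how long the native call took, and
     * throws a CudaException if the runtime returns an error code.
     */
    public static void cudaDeviceSetLimit(int limit, long value) {
        long startTime = System.nanoTime();
        int result = JCuda.cudaDeviceSetLimit(limit, value);
        double seconds = (System.nanoTime() - startTime) / 1e9;
        System.out.printf("cudaDeviceSetLimit(%d, %d) -> %s in %.6f s%n",
            limit, value, cudaError.stringFor(result), seconds);
        if (result != cudaError.cudaSuccess) {
            throw new CudaException(cudaError.stringFor(result));
        }
    }

    public static void main(String[] args) {
        // Example call: request a 128 MB device-side malloc heap.
        cudaDeviceSetLimit(cudaLimit.cudaLimitMallocHeapSize, 128L * 1024 * 1024);
    }
}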