Usage of the jcuda.runtime.JCuda.cudaGetDeviceProperties() method, with code examples

Reposted by x33g5p2x on 2022-01-22 in the category Other

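This article collects code examples for the Java method jcuda.runtime.JCuda.cudaGetDeviceProperties() that show how JCuda.cudaGetDeviceProperties() is used in practice. The examples were extracted from selected projects published mainly on GitHub, Stack Overflow and Maven, and may serve as a useful reference. Details of JCuda.cudaGetDeviceProperties() are as follows: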
Package: jcuda.runtime
Class: JCuda
Method: cudaGetDeviceProperties

Introduction to JCuda.cudaGetDeviceProperties

Returns information about the compute device.

cudaError_t cudaGetDeviceProperties(cudaDeviceProp* prop, int device)

JCuda binding: public static int cudaGetDeviceProperties(cudaDeviceProp prop, int device)

Returns in *prop the properties of device number device. The cudaDeviceProp structure is defined as follows (abridged: several fields described below, such as streamPrioritiesSupported and managedMemory, do not appear in this listing):

struct cudaDeviceProp {
    char name[256];
    size_t totalGlobalMem;
    size_t sharedMemPerBlock;
    int regsPerBlock;
    int warpSize;
    size_t memPitch;
    int maxThreadsPerBlock;
    int maxThreadsDim[3];
    int maxGridSize[3];
    int clockRate;
    size_t totalConstMem;
    int major;
    int minor;
    size_t textureAlignment;
    size_t texturePitchAlignment;
    int deviceOverlap;
    int multiProcessorCount;
    int kernelExecTimeoutEnabled;
    int integrated;
    int canMapHostMemory;
    int computeMode;
    int maxTexture1D;
    int maxTexture1DMipmap;
    int maxTexture1DLinear;
    int maxTexture2D[2];
    int maxTexture2DMipmap[2];
    int maxTexture2DLinear[3];
    int maxTexture2DGather[2];
    int maxTexture3D[3];
    int maxTextureCubemap;
    int maxTexture1DLayered[2];
    int maxTexture2DLayered[3];
    int maxTextureCubemapLayered[2];
    int maxSurface1D;
    int maxSurface2D[2];
    int maxSurface3D[3];
    int maxSurface1DLayered[2];
    int maxSurface2DLayered[3];
    int maxSurfaceCubemap;
    int maxSurfaceCubemapLayered[2];
    size_t surfaceAlignment;
    int concurrentKernels;
    int ECCEnabled;
    int pciBusID;
    int pciDeviceID;
    int pciDomainID;
    int tccDriver;
    int asyncEngineCount;
    int unifiedAddressing;
    int memoryClockRate;
    int memoryBusWidth;
    int l2CacheSize;
    int maxThreadsPerMultiProcessor;
};
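In C, name is a fixed 256-byte buffer holding a NUL-terminated ASCII string. Assuming the Java binding exposes it as a raw byte[] of the same length (JCuda's cudaDeviceProp mirrors the C fields, but check the version you use), the array has to be cut at the first zero byte before it is usable as a String. A minimal sketch; DeviceNames and decodeCString are hypothetical names, not part of JCuda:

```java
import java.nio.charset.StandardCharsets;

/**
 * Minimal sketch (hypothetical helper): decodes a NUL-terminated C char
 * array, such as cudaDeviceProp.name[256], into a Java String.
 */
final class DeviceNames {
    private DeviceNames() {}

    /** Returns the bytes before the first NUL (or the whole array), decoded as ASCII. */
    static String decodeCString(byte[] raw) {
        int length = 0;
        // Stop at the terminating zero byte, or at the end of the buffer if there is none.
        while (length < raw.length && raw[length] != 0) {
            length++;
        }
        return new String(raw, 0, length, StandardCharsets.US_ASCII);
    }
}
```

If name is indeed a byte[] in your binding, decodeCString(deviceProp.name) yields the device name without the trailing padding.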

where:

  • name[256] is an ASCII string identifying the device;

  • totalGlobalMem is the total amount of global memory available on the device in bytes;

  • sharedMemPerBlock is the maximum amount of shared memory available to a thread block in bytes; this amount is shared by all thread blocks simultaneously resident on a multiprocessor;

  • regsPerBlock is the maximum number of 32-bit registers available to a thread block; this number is shared by all thread blocks simultaneously resident on a multiprocessor;

  • warpSize is the warp size in threads;

  • memPitch is the maximum pitch in bytes allowed by the memory copy functions that involve memory regions allocated through cudaMallocPitch();

  • maxThreadsPerBlock is the maximum number of threads per block;

  • maxThreadsDim[3] contains the maximum size of each dimension of a block;

  • maxGridSize[3] contains the maximum size of each dimension of a grid;

  • clockRate is the clock frequency in kilohertz;

  • totalConstMem is the total amount of constant memory available on the device in bytes;

  • major, minor are the major and minor revision numbers defining the device's compute capability;

  • textureAlignment is the alignment requirement; texture base addresses that are aligned to textureAlignment bytes do not need an offset applied to texture fetches;

  • texturePitchAlignment is the pitch alignment requirement for 2D texture references that are bound to pitched memory;

  • deviceOverlap is 1 if the device can concurrently copy memory between host and device while executing a kernel, or 0 if not. Deprecated; use asyncEngineCount instead.

  • multiProcessorCount is the number of multiprocessors on the device;

  • kernelExecTimeoutEnabled is 1 if there is a run time limit for kernels executed on the device, or 0 if not.

  • integrated is 1 if the device is an integrated (motherboard) GPU and 0 if it is a discrete (card) component.

  • canMapHostMemory is 1 if the device can map host memory into the CUDA address space for use with cudaHostAlloc()/cudaHostGetDevicePointer(), or 0 if not;

  • computeMode is the compute mode that the device is currently in. Available modes are as follows:

    • cudaComputeModeDefault: Default mode - Device is not restricted and multiple threads can use cudaSetDevice() with this device.
    • cudaComputeModeExclusive: Compute-exclusive mode - Only one thread will be able to use cudaSetDevice() with this device.
    • cudaComputeModeProhibited: Compute-prohibited mode - No threads can use cudaSetDevice() with this device.
    • cudaComputeModeExclusiveProcess: Compute-exclusive-process mode - Many threads in one process will be able to use cudaSetDevice() with this device.

If cudaSetDevice() is called on an already occupied device with computeMode cudaComputeModeExclusive, cudaErrorDeviceAlreadyInUse is returned immediately, indicating that the device cannot be used. When an occupied exclusive-mode device is chosen with cudaSetDevice(), all subsequent non-device-management runtime functions return cudaErrorDevicesUnavailable.

  • maxTexture1D is the maximum 1D texture size.
  • maxTexture1DMipmap is the maximum 1D mipmapped texture size.
  • maxTexture1DLinear is the maximum 1D texture size for textures bound to linear memory.
  • maxTexture2D[2] contains the maximum 2D texture dimensions.
  • maxTexture2DMipmap[2] contains the maximum 2D mipmapped texture dimensions.
  • maxTexture2DLinear[3] contains the maximum 2D texture dimensions for 2D textures bound to pitch linear memory.
  • maxTexture2DGather[2] contains the maximum 2D texture dimensions if texture gather operations have to be performed.
  • maxTexture3D[3] contains the maximum 3D texture dimensions.
  • maxTextureCubemap is the maximum cubemap texture width or height.
  • maxTexture1DLayered[2] contains the maximum 1D layered texture dimensions.
  • maxTexture2DLayered[3] contains the maximum 2D layered texture dimensions.
  • maxTextureCubemapLayered[2] contains the maximum cubemap layered texture dimensions.
  • maxSurface1D is the maximum 1D surface size.
  • maxSurface2D[2] contains the maximum 2D surface dimensions.
  • maxSurface3D[3] contains the maximum 3D surface dimensions.
  • maxSurface1DLayered[2] contains the maximum 1D layered surface dimensions.
  • maxSurface2DLayered[3] contains the maximum 2D layered surface dimensions.
  • maxSurfaceCubemap is the maximum cubemap surface width or height.
  • maxSurfaceCubemapLayered[2] contains the maximum cubemap layered surface dimensions.
  • surfaceAlignment specifies the alignment requirements for surfaces.
  • concurrentKernels is 1 if the device supports executing multiple kernels within the same context simultaneously, or 0 if not. It is not guaranteed that multiple kernels will be resident on the device concurrently so this feature should not be relied upon for correctness;
  • ECCEnabled is 1 if the device has ECC support turned on, or 0 if not.
  • pciBusID is the PCI bus identifier of the device.
  • pciDeviceID is the PCI device (sometimes called slot) identifier of the device.
  • pciDomainID is the PCI domain identifier of the device.
  • tccDriver is 1 if the device is using a TCC driver or 0 if not.
  • asyncEngineCount is 1 when the device can concurrently copy memory between host and device while executing a kernel. It is 2 when the device can concurrently copy memory between host and device in both directions and execute a kernel at the same time. It is 0 if neither of these is supported.
  • unifiedAddressing is 1 if the device shares a unified address space with the host and 0 otherwise.
  • memoryClockRate is the peak memory clock frequency in kilohertz.
  • memoryBusWidth is the memory bus width in bits.
  • l2CacheSize is the L2 cache size in bytes.
  • maxThreadsPerMultiProcessor is the maximum number of resident threads per multiprocessor.
  • streamPrioritiesSupported is 1 if the device supports stream priorities, or 0 if it is not supported.
  • globalL1CacheSupported is 1 if the device supports caching of globals in L1 cache, or 0 if it is not supported.
  • localL1CacheSupported is 1 if the device supports caching of locals in L1 cache, or 0 if it is not supported.
  • sharedMemPerMultiprocessor is the maximum amount of shared memory available to a multiprocessor in bytes; this amount is shared by all thread blocks simultaneously resident on a multiprocessor;
  • regsPerMultiprocessor is the maximum number of 32-bit registers available to a multiprocessor; this number is shared by all thread blocks simultaneously resident on a multiprocessor;
  • managedMemory is 1 if the device supports allocating managed memory on this system, or 0 if it is not supported.
  • isMultiGpuBoard is 1 if the device is on a multi-GPU board (e.g. Gemini cards), and 0 if not;
  • multiGpuBoardGroupID is a unique identifier for a group of devices associated with the same board. Devices on the same multi-GPU board will share the same identifier;
  • singleToDoublePrecisionPerfRatio is the ratio of single precision performance (in floating-point operations per second) to double precision performance.
  • pageableMemoryAccess is 1 if the device supports coherently accessing pageable memory without calling cudaHostRegister on it, and 0 otherwise.
  • concurrentManagedAccess is 1 if the device can coherently access managed memory concurrently with the CPU, and 0 otherwise.
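Several of these fields are more informative in combination than on their own. The sketch below works on plain numbers rather than a cudaDeviceProp instance: it compares major/minor against a required compute capability, estimates theoretical peak memory bandwidth from memoryClockRate (kHz) and memoryBusWidth (bits), and spells out the asyncEngineCount values listed above. The factor of 2 in the bandwidth estimate is an assumption that the memory is double-data-rate (two transfers per clock); the class and method names are hypothetical.

```java
/**
 * Minimal sketch (hypothetical helper class): derived quantities computed from
 * plain cudaDeviceProp field values, so it runs without a GPU or JCuda.
 */
final class DevicePropMath {
    private DevicePropMath() {}

    /** True if compute capability major.minor is at least requiredMajor.requiredMinor. */
    static boolean hasComputeCapability(int major, int minor, int requiredMajor, int requiredMinor) {
        return major > requiredMajor || (major == requiredMajor && minor >= requiredMinor);
    }

    /**
     * Theoretical peak memory bandwidth in GB/s (1 GB = 10^9 bytes).
     * Assumption: double-data-rate memory, i.e. two transfers per clock cycle.
     */
    static double peakBandwidthGBps(int memoryClockRateKHz, int memoryBusWidthBits) {
        double transfersPerSecond = 2.0 * memoryClockRateKHz * 1_000.0; // kHz -> Hz, x2 for DDR
        double bytesPerTransfer = memoryBusWidthBits / 8.0;             // bits -> bytes
        return transfersPerSecond * bytesPerTransfer / 1e9;
    }

    /** Meaning of asyncEngineCount, following the values documented above. */
    static String describeAsyncEngines(int asyncEngineCount) {
        switch (asyncEngineCount) {
            case 0:
                return "no overlap of copies with kernel execution";
            case 1:
                return "one copy direction can overlap kernel execution";
            case 2:
                return "copies in both directions can overlap kernel execution";
            default:
                return "unrecognized value " + asyncEngineCount;
        }
    }
}
```

For example, inputs of 2505000 kHz and a 384-bit bus give peakBandwidthGBps(2505000, 384) of about 240.48 GB/s; measured bandwidth will be lower.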

Code examples

Code example source: com.simiacryptus/mindseye (identical code also appears in com.simiacryptus/mindseye-cudnn)

/**
 * Gets device properties.
 *
 * @param device the device ordinal
 * @return the device properties
 */
public static cudaDeviceProp getDeviceProperties(final int device) {
    // Query each device only once; later calls are served from propertyCache.
    return propertyCache.computeIfAbsent(device, deviceId -> {
        long startTime = System.nanoTime();
        @Nonnull final cudaDeviceProp deviceProp = new cudaDeviceProp();
        // Fills deviceProp and returns a cudaError_t code (0 means cudaSuccess).
        final int result = JCuda.cudaGetDeviceProperties(deviceProp, device);
        // Record how long the call took, in seconds.
        getDeviceProperties_execution.accept((System.nanoTime() - startTime) / 1e9);
        log("cudaGetDeviceProperties", result, new Object[]{deviceProp, device});
        return deviceProp;
    });
}
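As written, the snippet does not itself check result before the property object is cached (whatever log() does with it is not visible here); with JCuda exceptions disabled, a failed call is reported only through that return code. A minimal sketch of a per-device cache that stores only successful results; the Query interface stands in for JCuda.cudaGetDeviceProperties so the sketch runs without a GPU, and all names are hypothetical:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/**
 * Minimal sketch (hypothetical class): caches device properties per device
 * ordinal, but only when the query reports success. P stands in for the
 * property type, e.g. jcuda.runtime.cudaDeviceProp.
 */
final class DevicePropertyCache<P> {

    /** Hypothetical stand-in for a call such as JCuda.cudaGetDeviceProperties(prop, device). */
    interface Query<T> {
        /** Fills target for the given device and returns a CUDA error code. */
        int fill(T target, int device);
    }

    private static final int CUDA_SUCCESS = 0; // cudaSuccess in the CUDA runtime API

    private final Map<Integer, P> cache = new ConcurrentHashMap<>();
    private final Supplier<P> factory;
    private final Query<P> query;

    DevicePropertyCache(Supplier<P> factory, Query<P> query) {
        this.factory = factory;
        this.query = query;
    }

    /** Returns cached properties, querying the device on first use. */
    P get(int device) {
        return cache.computeIfAbsent(device, d -> {
            P prop = factory.get();
            int result = query.fill(prop, d);
            if (result != CUDA_SUCCESS) {
                // An exception thrown here leaves no mapping, so a later get() retries.
                throw new IllegalStateException(
                        "cudaGetDeviceProperties failed for device " + d + " with error code " + result);
            }
            return prop;
        });
    }
}
```

With JCuda on the classpath it could be constructed as new DevicePropertyCache<>(cudaDeviceProp::new, JCuda::cudaGetDeviceProperties), since that method takes (cudaDeviceProp, int) and returns an int code.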

Code example source: com.simiacryptus/mindseye-cudnn (identical code also appears in com.simiacryptus/mindseye)

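// deviceCount[0] holds the number of CUDA devices; deviceCount and out are defined outside this snippet.
IntStream.range(0, deviceCount[0]).forEach(device -> {
    @Nonnull final cudaDeviceProp deviceProp = new cudaDeviceProp();
    JCuda.cudaGetDeviceProperties(deviceProp, device);
    // %s prints the full property dump produced by cudaDeviceProp.toString().
    out.printf("Device %d = %s%n", device, deviceProp);
});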
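Printing deviceProp through %s relies on its toString(), which dumps every field. When only an overview is needed, one compact line per device is easier to scan. A minimal sketch with hypothetical names; the parameters correspond to the cudaDeviceProp fields of the same names, and name is assumed to be already decoded to a String:

```java
import java.util.Locale;

/** Minimal sketch (hypothetical class): a one-line summary of a device. */
final class DeviceSummary {
    private DeviceSummary() {}

    /**
     * Parameters mirror the cudaDeviceProp fields of the same names;
     * name is assumed to be already decoded, totalGlobalMem is in bytes.
     */
    static String format(int device, String name, int major, int minor,
                         long totalGlobalMem, int multiProcessorCount) {
        double gib = totalGlobalMem / (1024.0 * 1024.0 * 1024.0); // bytes -> GiB
        // Locale.ROOT keeps '.' as the decimal separator regardless of the default locale.
        return String.format(Locale.ROOT,
                "Device %d = %s (compute capability %d.%d, %.1f GiB, %d multiprocessors)",
                device, name, major, minor, gib, multiProcessorCount);
    }
}
```

Inside the loop above, the call would pass device, the decoded name, deviceProp.major, deviceProp.minor, deviceProp.totalGlobalMem and deviceProp.multiProcessorCount.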

Code example source: org.nd4j/nd4j-jcublas-common

/**
 * Initializes JCublas2. Once initialization has succeeded, later calls return immediately.
 */
public static void init() {
  if (init)
    return;
  // Make JCublas2, the driver API and the runtime API throw exceptions on errors
  // instead of only returning error codes.
  JCublas2.setExceptionsEnabled(true);
  JCudaDriver.setExceptionsEnabled(true);
  JCuda.setExceptionsEnabled(true);
  try {
    KernelFunctionLoader.getInstance().load();
  } catch (Exception e) {
    throw new RuntimeException(e);
  }
  // Check if device 0 supports mapped host memory
  cudaDeviceProp deviceProperties = new cudaDeviceProp();
  JCuda.cudaGetDeviceProperties(deviceProperties, 0);
  if (deviceProperties.canMapHostMemory == 0) {
    System.err.println("This device can not map host memory");
    System.err.println(deviceProperties.toFormattedString());
    // init stays false here, so the next call to init() repeats these checks.
    return;
  }
  init = true;
}
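The nd4j code above only inspects device 0. On a multi-GPU machine one could instead look for the first device that meets the requirement. A minimal sketch, with the capability check passed in as a predicate so it runs without a GPU; DeviceSelector and firstSupporting are hypothetical names:

```java
import java.util.OptionalInt;
import java.util.function.IntPredicate;
import java.util.stream.IntStream;

/** Minimal sketch (hypothetical class): choose the first device passing a capability check. */
final class DeviceSelector {
    private DeviceSelector() {}

    /**
     * Returns the lowest device ordinal in [0, deviceCount) for which
     * supports.test(device) is true, or an empty OptionalInt if none qualifies.
     */
    static OptionalInt firstSupporting(int deviceCount, IntPredicate supports) {
        return IntStream.range(0, deviceCount).filter(supports).findFirst();
    }
}
```

With JCuda, deviceCount could come from JCuda.cudaGetDeviceCount(int[]) and the predicate could fill a new cudaDeviceProp for each ordinal and test canMapHostMemory != 0.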
