CUDA: how to calculate the number of registers allocated to each thread

Several blocks can be processed by the same multiprocessor concurrently by allocating the multiprocessor’s registers and shared memory among the blocks. More precisely, the number of registers available per thread is equal to:

N_registersPerMultiprocessor / CEIL(N_concurrentBlocks*N_threadsPerBlock, 64)

where N_registersPerMultiprocessor is the total number of registers per multiprocessor, N_concurrentBlocks is the number of concurrent blocks, N_threadsPerBlock is the number of threads per block, and CEIL(X, 64) denotes X rounded up to the nearest multiple of 64.
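
To make the arithmetic concrete, here is a minimal sketch of the formula as a small Python helper. The function names, the example figure of 8192 registers per multiprocessor, and the block configurations are assumptions chosen for illustration, not values from the quoted text. The sketch also assumes the result is truncated to a whole number, since a thread cannot hold a fraction of a register.

```python
def round_up(value: int, multiple: int) -> int:
    """Round value up to the nearest multiple of `multiple` (the CEIL(X, 64) step)."""
    return ((value + multiple - 1) // multiple) * multiple


def registers_per_thread(registers_per_multiprocessor: int,
                         concurrent_blocks: int,
                         threads_per_block: int,
                         granularity: int = 64) -> int:
    """Evaluate N_registersPerMultiprocessor / CEIL(N_concurrentBlocks * N_threadsPerBlock, 64).

    Assumption: the quotient is floored to a whole number of registers.
    """
    total_threads = concurrent_blocks * threads_per_block
    if total_threads <= 0:
        raise ValueError("need at least one thread")
    return registers_per_multiprocessor // round_up(total_threads, granularity)


if __name__ == "__main__":
    # Hypothetical multiprocessor with 8192 registers.
    # 3 blocks x 128 threads = 384 threads, already a multiple of 64.
    print(registers_per_thread(8192, 3, 128))  # 8192 // 384 = 21
    # 2 blocks x 100 threads = 200 threads, rounded up to 256.
    print(registers_per_thread(8192, 2, 100))  # 8192 // 256 = 32
```

The second call shows the effect of the rounding: 200 threads are charged as if they were 256, so thread counts that are not a multiple of 64 leave some registers unused.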

Original link:

Registers per thread limit and occupancy - CUDA / CUDA Programming and Performance - NVIDIA Developer Forums
