Several blocks can be processed by the same multiprocessor concurrently by allocating the multiprocessor’s registers and shared memory among the blocks. More precisely, the number of registers available per thread is equal to:
N_registersPerMultiprocessor / CEIL(N_concurrentBlocks*N_threadsPerBlock, 64)
where N_registersPerMultiprocessor is the total number of registers per multiprocessor, N_concurrentBlocks is the number of concurrent blocks, N_threadsPerBlock is the number of threads per block, and CEIL(X, 64) means X rounded up to the nearest multiple of 64.
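
As a worked illustration, the formula can be evaluated with a few lines of Python. This is only a sketch: the function names and the sample figures (8192 registers per multiprocessor, 3 concurrent blocks of 100 threads) are hypothetical, and the result is floored to a whole number on the assumption that a thread cannot be given a fraction of a register.

```python
def ceil_to_multiple(x: int, multiple: int) -> int:
    """Round x up to the nearest multiple of `multiple` (CEIL(X, 64) in the text)."""
    return -(-x // multiple) * multiple


def registers_per_thread(regs_per_multiprocessor: int,
                         concurrent_blocks: int,
                         threads_per_block: int) -> int:
    """Evaluate N_registersPerMultiprocessor / CEIL(N_concurrentBlocks*N_threadsPerBlock, 64).

    Flooring to an integer is an assumption of this sketch, not part of the formula.
    """
    total_threads = concurrent_blocks * threads_per_block
    if total_threads <= 0:
        raise ValueError("need at least one thread")
    return regs_per_multiprocessor // ceil_to_multiple(total_threads, 64)


# Hypothetical figures: 3 blocks of 100 threads give 300 threads,
# which round up to 320, so each thread gets 8192 // 320 = 25 registers.
example = registers_per_thread(8192, 3, 100)
```

Because of the rounding to a multiple of 64, 300 threads cost as many registers as 320 would, so thread counts that are already multiples of 64 leave nothing unused by this formula.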