How PyTorch calls the cuSOLVER API

0, Environment

Ubuntu 22.04
PyTorch 2.3.1
x86
RTX 3080
CUDA 12.2

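1, Example code

We use potrs as the example. On CUDA, torch.cholesky_inverse ends up calling it, as the trace in section 2 shows.

hello_cholesky.py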


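"""
hello_cholesky.py
step 1: Cholesky-decompose a symmetric positive-definite matrix;
step 2: invert it from its Cholesky factor;
step 3: Cholesky-decompose the inverse.
python3 hello_cholesky.py --N 256
"""
import argparse
import time

import torch


def cholesky_measure(A, cuda_dev=0):
    dev = torch.device(f"cuda:{cuda_dev}")
    A = A.to(dev)
    n = A.shape[0]  # take the size from A instead of relying on the global N

    print(f'Which device to compute : {dev}')

    # A @ A.T is positive semi-definite; adding a multiple of the identity
    # makes it positive definite, so the Cholesky factorization exists.
    SY = 100 * torch.mm(A, A.t()) + 200 * torch.eye(n, device=dev)

    # CUDA kernels are launched asynchronously, so synchronize before and
    # after the timed region; otherwise only the launch overhead is measured.
    torch.cuda.synchronize(dev)
    to_start = time.time()
    SY = torch.linalg.cholesky(SY)
    SY = torch.cholesky_inverse(SY)
    SY = torch.linalg.cholesky(SY, upper=True)
    torch.cuda.synchronize(dev)
    run_time = time.time() - to_start

    print(f'The device: {dev}, run: {run_time:.3f} second')
    print(f'SY : {SY}')
    print('****' * 20)

    return run_time


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='dim of A.')
    # A default only makes sense when the argument is optional.
    parser.add_argument('--N', type=int, default=512, help='dim of A')
    args = parser.parse_args()
    N = args.N

    print(f'A N : {N}')
    A = torch.randn(N, N)

    cuda_dev = 0
    time_dev0 = cholesky_measure(A, cuda_dev)
    # The comparison needs a second GPU; skip it when only one is visible.
    if torch.cuda.device_count() > cuda_dev + 1:
        time_dev1 = cholesky_measure(A, cuda_dev + 1)
        print(f'time_dev0 /time_dev1 = {time_dev0/time_dev1:.2f} ')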

Output:

2, Call stack trace

The call chain from torch.cholesky_inverse down to cuSOLVER is traced below:


Tensor cholesky_inverse(const Tensor &input, bool upper)    aten/src/ATen/native/BatchLinearAlgebra.cpp
	static Tensor& cholesky_inverse_out_info(Tensor& result, Tensor& infos, const Tensor& input, bool upper)
	DECLARE_DISPATCH(cholesky_inverse_fn, cholesky_inverse_stub);
	REGISTER_ARCH_DISPATCH(cholesky_inverse_stub, DEFAULT, &cholesky_inverse_kernel_impl);
	Tensor& cholesky_inverse_kernel_impl(Tensor &result, Tensor& infos, bool upper)
	Tensor& cholesky_inverse_kernel_impl_cusolver(Tensor &result, Tensor& infos, bool upper)
	void _cholesky_inverse_cusolver_potrs_based(Tensor& result, Tensor& infos, bool upper)
	template<typename scalar_t>
	inline static void apply_cholesky_cusolver_potrs(Tensor& self_working_copy, const Tensor& A_column_major_copy, bool upper, Tensor& infos)
	at::cuda::solver::potrs<scalar_t>(
      handle, uplo, n_32, nrhs_32,
      A_ptr + i * A_matrix_stride,
      lda_32,
      self_working_copy_ptr + i * self_matrix_stride,
      ldb_32,
      infos_ptr
    );
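
The function name _cholesky_inverse_cusolver_potrs_based suggests that the inverse is obtained by a potrs-style solve rather than by a dedicated inversion routine: given the Cholesky factor L of A, solving A X = I yields X = A^-1. The following is a minimal pure-Python sketch of that idea on a 2x2 matrix. It is not PyTorch or cuSOLVER code, and the helper names (cholesky_lower, potrs_like, cholesky_inverse_like) are hypothetical.

```python
import math


def cholesky_lower(a):
    """Return the lower-triangular factor L such that a = L L^T."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(a[i][i] - s)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def potrs_like(l, b):
    """Solve (L L^T) X = B column by column, given the lower factor L."""
    n = len(l)
    ncols = len(b[0])
    x = [[0.0] * ncols for _ in range(n)]
    for c in range(ncols):
        # Forward substitution: L y = B[:, c]
        y = [0.0] * n
        for i in range(n):
            s = sum(l[i][k] * y[k] for k in range(i))
            y[i] = (b[i][c] - s) / l[i][i]
        # Back substitution: L^T x = y
        for i in reversed(range(n)):
            s = sum(l[k][i] * x[k][c] for k in range(i + 1, n))
            x[i][c] = (y[i] - s) / l[i][i]
    return x


def cholesky_inverse_like(l):
    """Invert A = L L^T by solving A X = I with the right-hand side set to I."""
    n = len(l)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return potrs_like(l, identity)


a = [[4.0, 2.0], [2.0, 3.0]]
l = cholesky_lower(a)
a_inv = cholesky_inverse_like(l)
print(a_inv)
```

In the real call, the loop over i in the trace advances A_ptr and self_working_copy_ptr by one matrix stride per iteration, so each matrix in a batch gets its own potrs call.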

Some details:
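
The DECLARE_DISPATCH and REGISTER_ARCH_DISPATCH lines in the trace belong to ATen's dispatch-stub mechanism: a stub is declared once, each backend registers its own kernel under it, and the kernel is picked by device type at call time. The toy Python analog below only illustrates that lookup pattern. It assumes nothing about ATen's actual implementation, and the class, the kernel functions, and their return strings are all hypothetical.

```python
class DispatchStub:
    """Toy per-device kernel table (hypothetical, not ATen's DispatchStub)."""

    def __init__(self, name):
        self.name = name
        self._kernels = {}

    def register(self, device_type, fn):
        # One kernel per device type; a later registration replaces an earlier one.
        self._kernels[device_type] = fn

    def __call__(self, device_type, *args):
        try:
            kernel = self._kernels[device_type]
        except KeyError:
            raise RuntimeError(
                f"{self.name}: no kernel registered for {device_type}"
            ) from None
        return kernel(*args)


# "Declare" the stub once, then let each backend register its kernel.
cholesky_inverse_stub = DispatchStub("cholesky_inverse_stub")


def cpu_kernel(result):
    return f"cpu kernel on {result}"


def cuda_kernel(result):
    return f"cusolver potrs path on {result}"


cholesky_inverse_stub.register("cpu", cpu_kernel)
cholesky_inverse_stub.register("cuda", cuda_kernel)

# The caller passes only the device type; the stub picks the kernel.
print(cholesky_inverse_stub("cuda", "result"))
```

With this pattern, device-independent code such as cholesky_inverse_out_info can call the stub without knowing which backend library sits behind it.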
