Deep Learning Model Deployment (6): TensorRT Workflow and a Getting-Started Demo

TensorRT Workflow

The official documentation lays out the workflow as a series of steps, which boil down to two parts:

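  • Engine build: run the ONNX model through a series of optimizations to produce a TensorRT engine
    • Choose the batch size, choose the precision, convert the model
  • Inference: run the engine from Python or C++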

Getting-Started Demo

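Build the TensorRT engine:

trtexec --onnx=yolov5s.onnx --saveEngine=yolov5s.trt
# trtexec is a tool that ships with TensorRT. If the shell reports "command not found", add the bin folder under the TensorRT install path to PATH and re-source your shell profile.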
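
If you would rather drive the conversion from a script, the same command can be assembled in Python. This is only a minimal sketch: `build_trtexec_cmd` and `convert` are hypothetical helpers, not part of TensorRT. `--onnx`, `--saveEngine`, and `--fp16` are trtexec flags, and `shutil.which` simply checks whether trtexec is reachable on PATH.

```python
import shutil
import subprocess


def build_trtexec_cmd(onnx_path, engine_path, fp16=False):
    """Assemble the trtexec argument list (hypothetical helper, not a TensorRT API)."""
    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        # --fp16 allows FP16 kernels; without it the build stays in FP32.
        cmd.append("--fp16")
    return cmd


def convert(onnx_path, engine_path, fp16=False):
    """Run trtexec if it is on PATH; otherwise say what needs fixing."""
    if shutil.which("trtexec") is None:
        raise FileNotFoundError(
            "trtexec not found; add the TensorRT bin folder to PATH"
        )
    subprocess.run(build_trtexec_cmd(onnx_path, engine_path, fp16), check=True)


# Only prints the command; call convert(...) to actually run it.
print(" ".join(build_trtexec_cmd("yolov5s.onnx", "yolov5s.trt")))
```

Passing `fp16=True` is one way to exercise the "choose the precision" step of the workflow.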

Then wait for the engine to be written. The log shows what TensorRT does along the way:

 === Model Options ===
 === Build Options ===
 Precision: FP32
 === System Options ===
 === Inference Options ===
 === Reporting Options ===
 # These sections list option settings and can be skipped; for now only the precision entry matters.
 === Device Information ===
 # Device information
 [TRT] CUDA lazy loading is not enabled.
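 # This refers to CUDA lazy loading, a feature introduced in CUDA 11.7 and enabled by setting CUDA_MODULE_LOADING=LAZY.
 # With lazy loading, kernels are not loaded at initialization; each one is loaded only when it is first used. It is a CUDA-level feature.
 # When it is enabled, the first inference is slower, because it has to load the kernels it uses.
 # A few side posts with a crash course in CUDA will come first, since CUDA shows up a lot from here on.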
 Start parsing network model.
[03/11/2024-22:37:43] [I] [TRT] ----------------------------------------------------------------
[03/11/2024-22:37:43] [I] [TRT] Input filename:   yolov5s.onnx
[03/11/2024-22:37:43] [I] [TRT] ONNX IR version:  0.0.8
[03/11/2024-22:37:43] [I] [TRT] Opset version:    17
[03/11/2024-22:37:43] [I] [TRT] Producer name:    pytorch
[03/11/2024-22:37:43] [I] [TRT] Producer version: 2.2.1
[03/11/2024-22:37:43] [I] [TRT] Domain:
[03/11/2024-22:37:43] [I] [TRT] Model version:    0
[03/11/2024-22:37:43] [I] [TRT] Doc string:
[03/11/2024-22:37:43] [I] [TRT] ----------------------------------------------------------------
# Parsing the ONNX model
[TRT] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
# Warns that the model contains INT64 weights, which TensorRT will cast down to INT32
[TRT] Graph optimization time: 0.021841 seconds.
# Graph optimization
[TRT] [GraphReduction] The approximate region cut reduction algorithm is called.
# Graph simplification (graph reduction)
Using random values for input images
[03/11/2024-22:39:14] [I] Input binding for images with dimensions 1x3x640x640 is created.
[03/11/2024-22:39:14] [I] Output binding for output0 with dimensions 1x25200x85 is created.
[03/11/2024-22:39:14] [I] Starting inference
# trtexec then runs inference on random input to measure how long the engine takes
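
The output binding `1x25200x85` in the log is worth decoding, because any buffer that receives the output has to hold exactly that many elements. Below is a sketch of the arithmetic, assuming the standard YOLOv5 detection head (strides 8, 16, and 32 with 3 anchors per grid cell; these values are assumptions and do not appear in the log). `yolov5_output_shape` is a hypothetical helper name.

```python
def yolov5_output_shape(img_size=640, n_classes=80, batch=1,
                        strides=(8, 16, 32), anchors_per_cell=3):
    """Expected output shape for a YOLOv5 head (assumed strides and anchor count)."""
    # One grid per stride: 80x80, 40x40, and 20x20 cells for a 640x640 input.
    cells = sum((img_size // s) ** 2 for s in strides)
    # Each prediction holds 4 box coordinates, 1 objectness score, and the class scores.
    return (batch, anchors_per_cell * cells, 5 + n_classes)


shape = yolov5_output_shape()
n_bytes = 4 * shape[0] * shape[1] * shape[2]  # FP32 uses 4 bytes per element
print(shape)    # (1, 25200, 85)
print(n_bytes)  # 8568000
```

So an FP32 output buffer for this engine needs about 8.6 MB.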

With the engine built, we can deploy it:

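import numpy as np
import pycuda.autoinit  # creates and activates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt

N_CLASSES = 80          # number of YOLOv5 class labels
N_PREDICTIONS = 25200   # candidate boxes, from the output binding in the trtexec log
BATCH_SIZE = 1
PRECISION = np.float32  # must match the engine's input/output dtype (FP32 here)

dummy_input_batch = np.zeros((BATCH_SIZE, 3, 640, 640), dtype=PRECISION)

# Deserialize the engine written by trtexec
runtime = trt.Runtime(trt.Logger(trt.Logger.WARNING))
with open("yolov5s.trt", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Host output buffer. Its shape must match the output binding (1x25200x85);
# 85 = 4 box coordinates + 1 objectness score + 80 class scores.
# For an FP16 engine, both input and output buffers would need to be FP16.
output = np.empty((BATCH_SIZE, N_PREDICTIONS, N_CLASSES + 5), dtype=PRECISION)

# Device buffers sized from the host arrays
d_input = cuda.mem_alloc(dummy_input_batch.nbytes)
d_output = cuda.mem_alloc(output.nbytes)

bindings = [int(d_input), int(d_output)]
stream = cuda.Stream()

def predict(batch):  # result gets copied into output
    # Transfer input data to device
    cuda.memcpy_htod_async(d_input, batch, stream)
    # Execute model (execute_async_v2 is the TensorRT 8.x bindings-style call;
    # later releases favor execute_async_v3)
    context.execute_async_v2(bindings, stream.handle, None)
    # Transfer predictions back
    cuda.memcpy_dtoh_async(output, d_output, stream)
    # Wait for all work queued on the stream to finish
    stream.synchronize()
    return output

pred = predict(dummy_input_batch)
print(pred.shape)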
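
Because the first call can be noticeably slower than later ones (kernels may still be loading), latency is usually measured after a few warm-up runs. Here is a minimal sketch using only the standard library. `benchmark` is a hypothetical helper, and the stand-in workload below takes the place of `predict(dummy_input_batch)`, which needs a GPU and the engine file.

```python
import time


def benchmark(fn, warmup=3, iters=10):
    """Return the average wall-clock time of fn() in milliseconds."""
    # Warm-up calls are executed but not timed.
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / iters * 1000.0


# Stand-in workload; replace with a call to the real model.
avg_ms = benchmark(lambda: sum(range(10_000)))
print(f"average latency: {avg_ms:.3f} ms")
```

With the engine loaded, `benchmark(lambda: predict(dummy_input_batch))` would time the real model. Since `predict` synchronizes the stream before returning, the wall-clock time covers the copies and the execution.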

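The goal of this post is to get the whole TensorRT pipeline running end to end. YOLOv5 post-processing is fairly involved and outside that scope, so it is not covered here; I'll add it later when I have time.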

If you found this helpful, please like, bookmark, and follow. Thanks!
