rk3399使用阿里推理引擎MNN使用cpu和gpu进行benchmark,OpenCL效果不佳?

在这里插入图片描述

视频讲解

rk3399使用阿里推理引擎MNN使用cpu和gpu进行benchmark,OpenCL效果不佳?

背景

MNN是阿里开源的推理引擎,今天测试一下在rk3399平台上的benchmark怎么样?
alibaba/MNN: MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba (github.com)

首先git clone

git clone git@github.com:alibaba/MNN.git

创建build目录

cd MNN
mkdir build
cd build

cmake配置

注意交叉编译器以及opencl库的使用方式,是使用系统opencl库还是使用wrap进行dlopen加载

cmake .. \
-DCMAKE_BUILD_TYPE=Release \
-DMNN_BUILD_DEMO=ON \
-DMNN_BUILD_BENCHMARK=true \
-DCMAKE_SYSTEM_NAME=Linux \
-DCMAKE_SYSTEM_VERSION=1 \
-DCMAKE_SYSTEM_PROCESSOR=aarch64 \
-DMNN_OPENCL=ON \
-DMNN_USE_SYSTEM_LIB=ON \
-DCMAKE_C_COMPILER=${cross_compile_toolchain}/bin/aarch64-linux-gnu-gcc \
-DCMAKE_CXX_COMPILER=${cross_compile_toolchain}/bin/aarch64-linux-gnu-g++

make -j32

部署

然后将build目录下的libMNN.so以及benchmark.out和上级目录下的benchmark的model放到一起,同时libMNN.so需要放到rk3399的lib目录下

sudo cp libMNN.so /lib
sudo cp -rf ../benchmark/model .

然后运行benchmark测试,第二个参数:loop测试次数,第4个参数:0代表使用cpu,3代表使用opencl

cpu测试

firefly@firefly:~/MNN$ sudo ./benchmark.out models/ 1 0 0clear
MNN benchmark
Forward type: CPU thread=4 precision=2 sparsity=0 sparseBlockOC=1 testQuantizedModel=0
--------> Benchmarking... loop = 1, warmup = 0
[-INFO-]: precision=2, use fp16 inference if your device supports and open MNN_ARM82=ON.
The device support i8sdot:0, support fp16:0, support i8mm: 0
[ - ] SqueezeNetV1.0.mnn          max =   86.128 ms  min =   86.128 ms  avg =   86.128 ms
[ - ] MobileNetV2_224.mnn         max =   42.041 ms  min =   42.041 ms  avg =   42.041 ms
[ - ] inception-v3.mnn            max =  505.111 ms  min =  505.111 ms  avg =  505.111 ms
[ - ] mobilenetV3.mnn             max =   13.533 ms  min =   13.533 ms  avg =   13.533 ms
[ - ] nasnet.mnn                  max =  145.489 ms  min =  145.489 ms  avg =  145.489 ms
[ - ] mobilenet-v1-1.0.mnn        max =   66.624 ms  min =   66.624 ms  avg =   66.624 ms
[ - ] squeezenetv1.1.mnn          max =   40.437 ms  min =   40.437 ms  avg =   40.437 ms
[ - ] resnet-v2-50.mnn            max =  308.836 ms  min =  308.836 ms  avg =  308.836 ms

gpu测试

firefly@firefly:~/MNN$ sudo ./benchmark.out models/ 1 0 3
MNN benchmark
Forward type: OpenCL thread=4 precision=2 sparsity=0 sparseBlockOC=1 testQuantizedModel=0
--------> Benchmarking... loop = 1, warmup = 0
[-INFO-]: precision=2, use fp16 inference if your device supports and open MNN_ARM82=ON.
The device support i8sdot:0, support fp16:0, support i8mm: 0
arm_release_ver of this libmali is 'r18p0-01rel0', rk_so_ver is '4'.[ - ] SqueezeNetV1.0.mnn          max =  159.619 ms  min =  159.619 ms  avg =  159.619 ms
[ - ] MobileNetV2_224.mnn         max =  126.671 ms  min =  126.671 ms  avg =  126.671 ms
[ - ] inception-v3.mnn            max =  800.436 ms  min =  800.436 ms  avg =  800.436 ms
[ - ] mobilenetV3.mnn             max =   61.661 ms  min =   61.661 ms  avg =   61.661 ms
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
[ - ] nasnet.mnn                  max =  140.189 ms  min =  140.189 ms  avg =  140.189 ms
[ - ] mobilenet-v1-1.0.mnn        max =   98.918 ms  min =   98.918 ms  avg =   98.918 ms
[ - ] squeezenetv1.1.mnn          max =  121.158 ms  min =  121.158 ms  avg =  121.158 ms
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
[ - ] resnet-v2-50.mnn            max =  428.075 ms  min =  428.075 ms  avg =  428.075 ms

结论

可以看到,gpu使用上很慢且存在算子的问题,实际上在rk3568上测试opencl很流畅且没有问题,这里留下问题,之后探究

最近更新

  1. docker php8.1+nginx base 镜像 dockerfile 配置

    2024-03-10 15:20:01       98 阅读
  2. Could not load dynamic library ‘cudart64_100.dll‘

    2024-03-10 15:20:01       106 阅读
  3. 在Django里面运行非项目文件

    2024-03-10 15:20:01       87 阅读
  4. Python语言-面向对象

    2024-03-10 15:20:01       96 阅读

热门阅读

  1. 【学习心得】webpack技术在爬虫逆向中的应用

    2024-03-10 15:20:01       43 阅读
  2. 如何学习ChatGPT?从入门到精通(附资料下载)

    2024-03-10 15:20:01       126 阅读
  3. EF框架常见异常处理汇总

    2024-03-10 15:20:01       46 阅读
  4. L1-095 分寝室(PTA)

    2024-03-10 15:20:01       32 阅读
  5. 计网|谢希仁版|第一章课后习题

    2024-03-10 15:20:01       38 阅读
  6. 人工智能、深度学习、机器学习书目推荐

    2024-03-10 15:20:01       46 阅读
  7. C++ 疑难点

    2024-03-10 15:20:01       37 阅读
  8. STM32 (1)

    STM32 (1)

    2024-03-10 15:20:01      52 阅读
  9. 一次压测经验过程的经验记录

    2024-03-10 15:20:01       46 阅读
  10. Hudi小文件压缩

    2024-03-10 15:20:01       43 阅读
  11. php 面试题目

    2024-03-10 15:20:01       44 阅读
  12. git fetch 合并远程仓库到本地

    2024-03-10 15:20:01       52 阅读
  13. 大数据开发(Hadoop面试真题-卷五)

    2024-03-10 15:20:01       41 阅读