Benchmarking Alibaba's MNN inference engine on the RK3399 with CPU and GPU: why does OpenCL underperform?

2024-03-10 15:20:01


Background

MNN is Alibaba's open-source inference engine. This post measures how its bundled benchmark performs on the RK3399 platform.
alibaba/MNN: MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba (github.com)

First, clone the repository

git clone git@github.com:alibaba/MNN.git

Create the build directory

cd MNN
mkdir build
cd build

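CMake configuration

Pay attention to the cross compiler and to how the OpenCL library is loaded: either link against the system OpenCL library, or use MNN's wrapper, which loads the library with dlopen at runtime. The configuration here sets -DMNN_USE_SYSTEM_LIB=ON, that is, the system library. ${cross_compile_toolchain} must point to the root of your aarch64 cross toolchain.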

cmake .. \
-DCMAKE_BUILD_TYPE=Release \
-DMNN_BUILD_DEMO=ON \
-DMNN_BUILD_BENCHMARK=true \
-DCMAKE_SYSTEM_NAME=Linux \
-DCMAKE_SYSTEM_VERSION=1 \
-DCMAKE_SYSTEM_PROCESSOR=aarch64 \
-DMNN_OPENCL=ON \
-DMNN_USE_SYSTEM_LIB=ON \
-DCMAKE_C_COMPILER=${cross_compile_toolchain}/bin/aarch64-linux-gnu-gcc \
-DCMAKE_CXX_COMPILER=${cross_compile_toolchain}/bin/aarch64-linux-gnu-g++

make -j32

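Deployment

Copy libMNN.so and benchmark.out from the build directory, together with the benchmark models from the parent directory, into one place on the board. libMNN.so also has to be placed in the RK3399's /lib directory.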

sudo cp libMNN.so /lib
sudo cp -rf ../benchmark/models .

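Then run the benchmark. Counting after the program name, the second argument is the number of test loops, the third is the warm-up count (reported as warmup = 0 in the logs), and the fourth selects the backend: 0 for CPU, 3 for OpenCL.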
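
The argument order can be made explicit with a small helper. This is only a sketch: the function name `benchmark_cmd` and the `FORWARD_TYPES` table are illustrative and not part of MNN, and it covers just the four positional arguments used in this post (models directory, loop count, warm-up count, forward type).

```python
# Hypothetical helper (not part of MNN): builds the argument list for
# benchmark.out from the four positional arguments used in this post.
FORWARD_TYPES = {"cpu": 0, "opencl": 3}  # values as stated in the post


def benchmark_cmd(models_dir, loop, warmup, backend, binary="./benchmark.out"):
    """Return argv for: benchmark.out <models_dir> <loop> <warmup> <forward_type>."""
    if backend not in FORWARD_TYPES:
        raise ValueError(f"unknown backend: {backend!r}")
    if loop < 1 or warmup < 0:
        raise ValueError("loop must be >= 1 and warmup must be >= 0")
    return [binary, models_dir, str(loop), str(warmup), str(FORWARD_TYPES[backend])]


if __name__ == "__main__":
    # The CPU and OpenCL invocations used in this post (loop = 1, warmup = 0).
    print(" ".join(benchmark_cmd("models/", 1, 0, "cpu")))
    print(" ".join(benchmark_cmd("models/", 1, 0, "opencl")))
```

With loop = 1 and warmup = 0, every figure is a single first run, so any one-time start-up cost is included in it. A larger loop count and a non-zero warm-up count would probably give steadier numbers, though that was not tried here.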

CPU test

firefly@firefly:~/MNN$ sudo ./benchmark.out models/ 1 0 0
MNN benchmark
Forward type: CPU thread=4 precision=2 sparsity=0 sparseBlockOC=1 testQuantizedModel=0
--------> Benchmarking... loop = 1, warmup = 0
[-INFO-]: precision=2, use fp16 inference if your device supports and open MNN_ARM82=ON.
The device support i8sdot:0, support fp16:0, support i8mm: 0
[ - ] SqueezeNetV1.0.mnn          max =   86.128 ms  min =   86.128 ms  avg =   86.128 ms
[ - ] MobileNetV2_224.mnn         max =   42.041 ms  min =   42.041 ms  avg =   42.041 ms
[ - ] inception-v3.mnn            max =  505.111 ms  min =  505.111 ms  avg =  505.111 ms
[ - ] mobilenetV3.mnn             max =   13.533 ms  min =   13.533 ms  avg =   13.533 ms
[ - ] nasnet.mnn                  max =  145.489 ms  min =  145.489 ms  avg =  145.489 ms
[ - ] mobilenet-v1-1.0.mnn        max =   66.624 ms  min =   66.624 ms  avg =   66.624 ms
[ - ] squeezenetv1.1.mnn          max =   40.437 ms  min =   40.437 ms  avg =   40.437 ms
[ - ] resnet-v2-50.mnn            max =  308.836 ms  min =  308.836 ms  avg =  308.836 ms
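
To work with these numbers instead of reading them by eye, the result lines can be parsed with a regular expression. The sketch below assumes the line format shown in the log above; `RESULT_RE` and `parse_results` are illustrative names, not part of MNN.

```python
import re

# Matches result lines such as:
# [ - ] MobileNetV2_224.mnn         max =   42.041 ms  min =   42.041 ms  avg =   42.041 ms
RESULT_RE = re.compile(
    r"\[ - \]\s+(?P<model>\S+)\s+"
    r"max =\s*(?P<max>[\d.]+) ms\s+"
    r"min =\s*(?P<min>[\d.]+) ms\s+"
    r"avg =\s*(?P<avg>[\d.]+) ms"
)


def parse_results(log_text):
    """Return {model_name: avg_ms} for every result line found in log_text."""
    results = {}
    for line in log_text.splitlines():
        # search() rather than match(), so other output fused onto the
        # start of a result line does not hide it.
        found = RESULT_RE.search(line)
        if found:
            results[found.group("model")] = float(found.group("avg"))
    return results


if __name__ == "__main__":
    sample = (
        "[ - ] SqueezeNetV1.0.mnn          max =   86.128 ms  min =   86.128 ms  avg =   86.128 ms\n"
        "[ - ] MobileNetV2_224.mnn         max =   42.041 ms  min =   42.041 ms  avg =   42.041 ms\n"
    )
    for model, avg_ms in parse_results(sample).items():
        print(model, avg_ms)
```

Since loop = 1, max, min and avg are identical on every line, so only the average is kept.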

GPU test

firefly@firefly:~/MNN$ sudo ./benchmark.out models/ 1 0 3
MNN benchmark
Forward type: OpenCL thread=4 precision=2 sparsity=0 sparseBlockOC=1 testQuantizedModel=0
--------> Benchmarking... loop = 1, warmup = 0
[-INFO-]: precision=2, use fp16 inference if your device supports and open MNN_ARM82=ON.
The device support i8sdot:0, support fp16:0, support i8mm: 0
arm_release_ver of this libmali is 'r18p0-01rel0', rk_so_ver is '4'.[ - ] SqueezeNetV1.0.mnn          max =  159.619 ms  min =  159.619 ms  avg =  159.619 ms
[ - ] MobileNetV2_224.mnn         max =  126.671 ms  min =  126.671 ms  avg =  126.671 ms
[ - ] inception-v3.mnn            max =  800.436 ms  min =  800.436 ms  avg =  800.436 ms
[ - ] mobilenetV3.mnn             max =   61.661 ms  min =   61.661 ms  avg =   61.661 ms
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
[ - ] nasnet.mnn                  max =  140.189 ms  min =  140.189 ms  avg =  140.189 ms
[ - ] mobilenet-v1-1.0.mnn        max =   98.918 ms  min =   98.918 ms  avg =   98.918 ms
[ - ] squeezenetv1.1.mnn          max =  121.158 ms  min =  121.158 ms  avg =  121.158 ms
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
[ - ] resnet-v2-50.mnn            max =  428.075 ms  min =  428.075 ms  avg =  428.075 ms
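
The two runs are easier to compare as per-model ratios. In the sketch below the latencies are copied by hand from the CPU and OpenCL logs above; the names `CPU_MS`, `GPU_MS` and `slowdown` are illustrative.

```python
# Average latencies in ms, copied from the CPU and OpenCL logs in this post.
CPU_MS = {
    "SqueezeNetV1.0.mnn": 86.128,
    "MobileNetV2_224.mnn": 42.041,
    "inception-v3.mnn": 505.111,
    "mobilenetV3.mnn": 13.533,
    "nasnet.mnn": 145.489,
    "mobilenet-v1-1.0.mnn": 66.624,
    "squeezenetv1.1.mnn": 40.437,
    "resnet-v2-50.mnn": 308.836,
}
GPU_MS = {
    "SqueezeNetV1.0.mnn": 159.619,
    "MobileNetV2_224.mnn": 126.671,
    "inception-v3.mnn": 800.436,
    "mobilenetV3.mnn": 61.661,
    "nasnet.mnn": 140.189,
    "mobilenet-v1-1.0.mnn": 98.918,
    "squeezenetv1.1.mnn": 121.158,
    "resnet-v2-50.mnn": 428.075,
}


def slowdown(cpu_ms, gpu_ms):
    """Return {model: gpu/cpu ratio}; a ratio above 1 means OpenCL was slower."""
    return {name: gpu_ms[name] / cpu_ms[name] for name in cpu_ms if name in gpu_ms}


if __name__ == "__main__":
    ratios = slowdown(CPU_MS, GPU_MS)
    for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
        verdict = "slower" if ratio > 1 else "faster"
        print(f"{name:24s} OpenCL/CPU = {ratio:5.2f}x ({verdict})")
```

On these single-run numbers, only nasnet comes out marginally faster under OpenCL. The other seven models are slower, and the smallest networks lose the most.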

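Conclusion

As the results show, the GPU (OpenCL) backend is slower than the CPU on seven of the eight models, with nasnet roughly on par, and it also reports operator problems: the Map error messages printed before the nasnet and resnet-v2-50 results. On an RK3568, by contrast, the same OpenCL test runs smoothly and without errors. The cause is left as an open question to investigate later.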
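
As a starting point for that investigation, the Map error messages can be attributed to models. The sketch below assumes that messages printed while a model runs appear before that model's result line; this is an assumption about output ordering, not something the log confirms. `count_map_errors` is an illustrative name.

```python
def count_map_errors(log_text):
    """Return {model: number of 'Map error' lines seen before its result line}."""
    counts = {}
    pending = 0  # Map error lines seen since the previous result line
    for line in log_text.splitlines():
        if line.startswith("Map error"):
            pending += 1
        elif "[ - ]" in line and ".mnn" in line:
            # The model name is the first token after the "[ - ]" marker.
            model = line.split("[ - ]", 1)[1].split()[0]
            counts[model] = pending
            pending = 0
    return counts


if __name__ == "__main__":
    # Excerpt copied from the OpenCL log in this post.
    excerpt = "\n".join([
        "[ - ] mobilenetV3.mnn             max =   61.661 ms  min =   61.661 ms  avg =   61.661 ms",
        "Map error scalePtrCL == nullptr",
        "Map error biasPtrCL == nullptr",
        "Map error scalePtrCL == nullptr",
        "Map error biasPtrCL == nullptr",
        "Map error scalePtrCL == nullptr",
        "Map error biasPtrCL == nullptr",
        "[ - ] nasnet.mnn                  max =  140.189 ms  min =  140.189 ms  avg =  140.189 ms",
    ])
    print(count_map_errors(excerpt))
```

Applied to the full OpenCL log, this attributes 6 messages to nasnet and 34 to resnet-v2-50, and none to the other six models.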

Original post: https://blog.csdn.net/weixin_38428827/article/details/136598668