
eIQ Machine Learning Framework Performance Comparison: iMX8M Plus vs. iMX8QM

Published by toradex, 2021-07-16

        By Toradex 胡珊逢

Machine learning algorithms are computationally demanding and are usually accelerated with a GPU or a dedicated processor such as an NPU. NXP's two processors, the i.MX8QuadMax and the i.MX8M Plus, can use a GPU and an NPU respectively to accelerate common machine learning frameworks such as TensorFlow Lite. This article uses the NXP eIQ framework to benchmark several inference engines on both processors.

         

Here we use two Toradex modules: the Apalis iMX8QM 4GB WB IT V1.1C and the Verdin iMX8M Plus Quad 4GB WB IT V1.0B. The BSP is Linux BSP V5.3, and eIQ comes from the zeus-5.4.70-2.3.3 release. The default Toradex Yocto Project build environment does not include the eIQ software, so the meta-ml layer has to be added to the build as described in the referenced instructions. Then change the Python version to 3.8 in meta-ml/recipes-devtools/python/python3-pybind11_2.5.0.bb, as shown below, and finally build the multimedia image.

        -------------------------------------

EXTRA_OECMAKE = "-DPYBIND11_TEST=OFF \
    -DPYTHON_EXECUTABLE=${RECIPE_SYSROOT_NATIVE}/usr/bin/python3-native/python3.8 \
"

        -------------------------------------

         

Use the Toradex Easy Installer to install the resulting image onto the two modules, the Apalis iMX8QM 4GB WB IT V1.1C and the Verdin iMX8M Plus Quad 4GB WB IT V1.0B.

         

The tests follow NXP's i.MX_Machine_Learning_User's_Guide and cover TensorFlow Lite, Arm NN, ONNX, and PyTorch. OpenCV currently runs only on the CPU of the i.MX8QuadMax and the i.MX8M Plus and cannot use GPU or NPU acceleration, so it is not tested this time. In addition, there are two limitations when testing Caffe models with Arm NN. First, the batch size must be 1. For example, the deploy.prototxt file is modified as follows:

        -------------------------------------

        name: "AlexNet"

        layer {

          name: "data"

          type: "Input"

          top: "data"

          input_param { shape: { dim: 1 dim: 3 dim: 227 dim: 227 } }

        }

        -------------------------------------

         

Second, Arm NN does not support all Caffe syntax, so some older neural network model files need to be updated to the latest Caffe format. The following Python3 script, run on a PC with pycaffe installed, performs the conversion.

        -------------------------------------

import caffe

# Load the old model definition and weights in TEST (inference) mode.
net = caffe.Net('lenet.prototxt', 'lenet_iter_9000-orignal.caffemodel', caffe.TEST)

# Saving the network writes the weights back out using the current Caffe format.
net.save('lenet_iter_9000.caffemodel')

        -------------------------------------

         

The test results on the two modules are shown below.

         

        TensorFlow Lite

        Apalis iMX8QM

        label_image

        -------------------------------------

        root@apalis-imx8:/usr/bin/tensorflow-lite-2.4.0/examples# USE_GPU_INFERENCE=1 ./label_image -m mobilenet_v1_1.0_224_quant.tflite -i grace_hopper.bmp -l labels.txt -a 1

        INFO: Loaded model mobilenet_v1_1.0_224_quant.tflite
        INFO: resolved reporter
        INFO: Created TensorFlow Lite delegate for NNAPI.
        INFO: Applied NNAPI delegate.
        INFO: invoked
        INFO: average time: 12.407 ms
        INFO: 0.784314: 653 military uniform
        INFO: 0.105882: 907 Windsor tie
        INFO: 0.0156863: 458 bow tie
        INFO: 0.0117647: 466 bulletproof vest
        INFO: 0.00784314: 668 mortarboard

        -------------------------------------

         

        benchmark_model

        -------------------------------------

        root@apalis-imx8:/usr/bin/tensorflow-lite-2.4.0/examples# ./benchmark_model --graph=mobilenet_v1_1.0_224_quant.tflite --use_nnapi=true

        STARTING!

        Log parameter values verbosely: [0]
        Graph: [mobilenet_v1_1.0_224_quant.tflite]
        Use NNAPI: [1]
        NNAPI accelerators available: [vsi-npu]
        Loaded model mobilenet_v1_1.0_224_quant.tflite
        INFO: Created TensorFlow Lite delegate for NNAPI.
Explicitly applied NNAPI delegate, and the model graph will be completely executed by the delegate.
The input model file size (MB): 4.27635
Initialized session in 16.746ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
        count=17 first=305296 curr=12471 min=12299 max=305296 avg=29650 std=68911
        Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
        count=81 first=12417 curr=12430 min=12294 max=12511 avg=12405.6 std=39
        Inference timings in us: Init: 16746, First inference: 305296, Warmup (avg): 29650, Inference (avg): 12405.6
        Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
        Peak memory footprint (MB): init=1.85938 overall=55.1406

        -------------------------------------

         

        Verdin iMX8M Plus

        label_image

        -------------------------------------

        root@verdin-imx8mp:/usr/bin/tensorflow-lite-2.4.0/examples# USE_GPU_INFERENCE=0 ./label_image -m mobilenet_v1_1.0_224_quant.tflite -i grace_hopper.bmp -l labels.txt -a 1
        INFO: Loaded model mobilenet_v1_1.0_224_quant.tflite
        INFO: resolved reporter
        INFO: Created TensorFlow Lite delegate for NNAPI.
        INFO: Applied NNAPI delegate.
        INFO: invoked
        INFO: average time: 2.835 ms
        INFO: 0.768627: 653 military uniform
        INFO: 0.105882: 907 Windsor tie
        INFO: 0.0196078: 458 bow tie
INFO: 0.0117647: 466 bulletproof vest
INFO: 0.00784314: 835 suit

        -------------------------------------

         

        benchmark_model

        -------------------------------------

        root@verdin-imx8mp:/usr/bin/tensorflow-lite-2.4.0/examples# ./benchmark_model --graph=mobilenet_v1_1.0_224_quant.tflite --use_nnapi=true
        STARTING!
        Log parameter values verbosely: [0]
        Graph: [mobilenet_v1_1.0_224_quant.tflite]
        Use NNAPI: [1]
        NNAPI accelerators available: [vsi-npu]
        Loaded model mobilenet_v1_1.0_224_quant.tflite
        INFO: Created TensorFlow Lite delegate for NNAPI.
        Explicitly applied NNAPI delegate, and the model graph will be completely executed by the delegate.
        The input model file size (MB): 4.27635
        Initialized session in 16.79ms.
        Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
        count=1 curr=6664535
        Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
        count=367 first=2734 curr=2646 min=2624 max=2734 avg=2650.05 std=16
        Inference timings in us: Init: 16790, First inference: 6664535, Warmup (avg): 6.66454e+06, Inference (avg): 2650.05
        Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
        Peak memory footprint (MB): init=1.79297 overall=28.5117 

        -------------------------------------
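
Besides the prebuilt label_image and benchmark_model binaries, the same quantized MobileNet model can also be driven from Python. The sketch below is only a minimal illustration of the API: it assumes the image ships the TensorFlow Lite Python bindings (tflite_runtime) together with NumPy and Pillow, and it runs on the CPU unless a hardware delegate is loaded, so it does not reproduce the accelerated numbers above.

-------------------------------------

# Minimal TensorFlow Lite Python sketch (assumes tflite_runtime, NumPy and
# Pillow are available in the image; on some builds the same Interpreter class
# is exposed as tensorflow.lite.Interpreter instead).
import numpy as np
from PIL import Image
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path="mobilenet_v1_1.0_224_quant.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# The quantized MobileNet v1 model expects a 1x224x224x3 uint8 input.
img = Image.open("grace_hopper.bmp").resize((224, 224))
input_data = np.expand_dims(np.array(img, dtype=np.uint8), axis=0)

interpreter.set_tensor(input_details[0]["index"], input_data)
interpreter.invoke()

# Print the five highest-scoring class indices.
scores = interpreter.get_tensor(output_details[0]["index"])[0]
for idx in scores.argsort()[-5:][::-1]:
    print(idx, scores[idx])

-------------------------------------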

         

         

        Arm NN

        Apalis iMX8QM

        CaffeAlexNet-Armnn

        -------------------------------------

        root@apalis-imx8:/usr/bin/armnn-20.08/ArmnnTests# ../CaffeAlexNet-Armnn --data-dir=data --model-dir=models
        Info: ArmNN v22.0.0
        Info: Initialization time: 0.14 ms
        Info: Network parsing time: 1397.76 ms
        Info: Optimization time: 195.13 ms
        Info: = Prediction values for test #0
        Info: Top(1) prediction is 2 with value: 0.706226
        Info: Top(2) prediction is 0 with value: 1.26573e-05
        Info: Total time for 1 test cases: 0.264 seconds
        Info: Average time per test case: 263.701 ms
        Info: Overall accuracy: 1.000
        Info: Shutdown time: 56.83 ms

        -------------------------------------

         

        CaffeMnist-Armnn

        -------------------------------------

        root@apalis-imx8:/usr/bin/armnn-20.08/ArmnnTests# ../CaffeMnist-Armnn --data-dir=data --model-dir=models

        Info: ArmNN v22.0.0
        Info: Initialization time: 0.09 ms
        Info: Network parsing time: 8.70 ms
        Info: Optimization time: 2.67 ms
        Info: = Prediction values for test #0
        Info: Top(1) prediction is 7 with value: 1
        Info: Top(2) prediction is 0 with value: 0
        Info: = Prediction values for test #1
        Info: Top(1) prediction is 2 with value: 1
        Info: Top(2) prediction is 0 with value: 0
        Info: = Prediction values for test #5
        Info: Top(1) prediction is 1 with value: 1
        Info: Top(2) prediction is 0 with value: 0
        Info: = Prediction values for test #8
        Info: Top(1) prediction is 5 with value: 1
        Info: Top(2) prediction is 0 with value: 0
        Info: = Prediction values for test #9
        Info: Top(1) prediction is 9 with value: 1
        Info: Top(2) prediction is 0 with value: 0
        Info: Total time for 5 test cases: 0.015 seconds
        Info: Average time per test case: 2.927 ms
        Info: Overall accuracy: 1.000
        Info: Shutdown time: 1.56 ms 

        -------------------------------------

         

        CaffeVGG-Armnn

        -------------------------------------

        root@apalis-imx8:/usr/bin/armnn-20.08/ArmnnTests# ../CaffeVGG-Armnn --data-dir=data --model-dir=models

        Info: ArmNN v22.0.0
        Info: Initialization time: 0.08 ms
        Info: Network parsing time: 1452.35 ms
        Info: Optimization time: 491.98 ms
        Info: = Prediction values for test #0
        Info: Top(1) prediction is 2 with value: 0.692014
        Info: Top(2) prediction is 0 with value: 9.80887e-07
        Info: Total time for 1 test cases: 2.723 seconds
        Info: Average time per test case: 2722.846 ms
        Info: Overall accuracy: 1.000
        Info: Shutdown time: 115.74 ms

        -------------------------------------

         

        Verdin iMX8M Plus

        CaffeAlexNet-Armnn

        -------------------------------------

        root@verdin-imx8mp:/usr/bin/armnn-20.08/ArmnnTests# ../CaffeAlexNet-Armnn --data-dir=data --model-dir=models

        Info: ArmNN v22.0.0
        Info: Initialization time: 0.12 ms
        Info: Network parsing time: 1250.55 ms
        Info: Optimization time: 141.40 ms
        Info: = Prediction values for test #0
        Info: Top(1) prediction is 2 with value: 0.706225
        Info: Top(2) prediction is 0 with value: 1.26573e-05
        Info: Total time for 1 test cases: 0.110 seconds
        Info: Average time per test case: 110.124 ms
        Info: Overall accuracy: 1.000
        Info: Shutdown time: 15.04 ms

        -------------------------------------

         

        CaffeMnist-Armnn

        -------------------------------------

        root@verdin-imx8mp:/usr/bin/armnn-20.08/ArmnnTests# ../CaffeMnist-Armnn --data-dir=data --model-dir=models

        Info: ArmNN v22.0.0
        Info: Initialization time: 0.11 ms
        Info: Network parsing time: 8.96 ms
        Info: Optimization time: 3.01 ms
        Info: = Prediction values for test #0
        Info: Top(1) prediction is 7 with value: 1
        Info: Top(2) prediction is 0 with value: 0
        Info: = Prediction values for test #1
        Info: Top(1) prediction is 2 with value: 1
        Info: Top(2) prediction is 0 with value: 0
        Info: = Prediction values for test #5
        Info: Top(1) prediction is 1 with value: 1
        Info: Top(2) prediction is 0 with value: 0
        Info: = Prediction values for test #8
        Info: Top(1) prediction is 5 with value: 1
        Info: Top(2) prediction is 0 with value: 0
        Info: = Prediction values for test #9
        Info: Top(1) prediction is 9 with value: 1
        Info: Top(2) prediction is 0 with value: 0
        Info: Total time for 5 test cases: 0.008 seconds
        Info: Average time per test case: 1.608 ms
        Info: Overall accuracy: 1.000
        Info: Shutdown time: 1.69 ms 

        -------------------------------------

         

        CaffeVGG-Armnn

        -------------------------------------

root@verdin-imx8mp:/usr/bin/armnn-20.08/ArmnnTests# ../CaffeVGG-Armnn --data-dir=data --model-dir=models

Info: ArmNN v22.0.0

        Info: Initialization time: 0.15 ms
        Info: Network parsing time: 2842.95 ms
        Info: Optimization time: 316.74 ms
        Info: = Prediction values for test #0
        Info: Top(1) prediction is 2 with value: 0.692015
        Info: Top(2) prediction is 0 with value: 9.8088e-07
        Info: Total time for 1 test cases: 1.098 seconds
        Info: Average time per test case: 1097.593 ms
        Info: Overall accuracy: 1.000
        Info: Shutdown time: 130.65 ms 

        -------------------------------------

         

         

        ONNX

        Apalis iMX8QM

        onnx_test_runner

        -------------------------------------

        root@apalis-imx8:~# time onnx_test_runner -j 1 -c 1 -r 1 -e vsi_npu ./mobilenetv2-7/

        result:  
        Models: 1
        Total test cases: 3
         Succeeded: 3
         Not implemented: 0
         Failed: 0
        Stats by Operator type:
         Not implemented(0):  
         Failed:
        Failed Test Cases:
         
        real 0m0.643s
        user 0m1.513s
        sys 0m0.111s

        -------------------------------------

         

        Verdin iMX8M Plus

        onnx_test_runner

        -------------------------------------

        root@verdin-imx8mp:~# time onnx_test_runner -j 1 -c 1 -r 1 -e vsi_npu ./mobilenetv2-7/

        result:  
        Models: 1
        Total test cases: 3
         Succeeded: 3
         Not implemented: 0
         Failed: 0
        Stats by Operator type:
         Not implemented(0):  
         Failed:
        Failed Test Cases:
         
        real 0m0.663s
        user 0m1.195s
        sys 0m0.073s 

        -------------------------------------
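
onnx_test_runner only replays the test cases bundled with the model archive. For inference from application code, the onnxruntime Python API can be used instead. The sketch below is a minimal example under several assumptions: the image ships the Python onnxruntime bindings, the model path is illustrative, and only the CPU execution provider is requested because the exact Python name of the VSI NPU provider (selected above with -e vsi_npu) is not given here.

-------------------------------------

# Minimal ONNX Runtime inference sketch; assumptions are noted in the comments.
import numpy as np
import onnxruntime as ort

# Illustrative path: the directory used with onnx_test_runner above bundles the
# model file together with its test data sets.
session = ort.InferenceSession("./mobilenetv2-7/mobilenetv2-7.onnx",
                               providers=["CPUExecutionProvider"])

# MobileNetV2 expects a 1x3x224x224 float32 tensor; random data is enough for a
# quick smoke test of the runtime.
input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)

outputs = session.run(None, {input_name: dummy})
print(outputs[0].shape)  # typically (1, 1000) class scores

-------------------------------------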

         

         

        PyTorch

        Apalis iMX8QM

        pytorch_mobilenetv2.py

        -------------------------------------

        root@apalis-imx8:/usr/bin/pytorch/examples# time python3 pytorch_mobilenetv2.py

        ('tabby, tabby cat', 46.348018646240234)
        ('tiger cat', 35.17843246459961)
        ('Egyptian cat', 15.802857398986816)
        ('lynx, catamount', 1.161122441291809)
        ('tiger, Panthera tigris', 0.20774582028388977)
         
        real 0m8.806s
        user 0m7.440s
        sys 0m0.593s 

        -------------------------------------

         

        Verdin iMX8M Plus

        pytorch_mobilenetv2.py

        -------------------------------------

        root@verdin-imx8mp:/usr/bin/pytorch/examples# time python3 pytorch_mobilenetv2.py

        ('tabby, tabby cat', 46.348018646240234)
        ('tiger cat', 35.17843246459961)
        ('Egyptian cat', 15.802857398986816)
        ('lynx, catamount', 1.161122441291809)
        ('tiger, Panthera tigris', 0.20774582028388977)
         
        real 0m6.313s
        user 0m5.933s
        sys 0m0.295s 

        -------------------------------------
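
The pytorch_mobilenetv2.py example itself is not reproduced here. The sketch below is a hypothetical approximation of what such a script does (load a pretrained torchvision MobileNetV2, preprocess an image, print the top five classes); it assumes torchvision, Pillow and a local test image are available, and it is not the actual eIQ example.

-------------------------------------

# Hypothetical reconstruction of a MobileNetV2 top-5 classification script;
# not the actual eIQ pytorch_mobilenetv2.py example.
import torch
from torchvision import models, transforms
from PIL import Image

# Standard ImageNet preprocessing for MobileNetV2.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Downloads the pretrained ImageNet weights on first use.
model = models.mobilenet_v2(pretrained=True)
model.eval()

# "cat.jpg" is a placeholder path for a local test image.
img = preprocess(Image.open("cat.jpg")).unsqueeze(0)

with torch.no_grad():
    probs = torch.nn.functional.softmax(model(img)[0], dim=0) * 100.0

# Print the five most likely ImageNet class indices with their percentages.
top5 = torch.topk(probs, 5)
for value, index in zip(top5.values, top5.indices):
    print(int(index), float(value))

-------------------------------------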

         

         

Summary Comparison

[Figure: eIQ performance comparison table for the Apalis iMX8QM and Verdin iMX8M Plus]

The performance gap between the two varies with the specific test application. Overall, common machine learning workloads run faster on the NPU of the Verdin iMX8M Plus than on the GPU of the Apalis iMX8QM; in the TensorFlow Lite benchmark above, for example, the average inference time drops from about 12.4 ms to about 2.7 ms.

         

         

Conclusion

Machine learning is a fairly complex application area. Besides the hardware processor, algorithm performance also depends on how well the model itself is optimized; given the limited processing power of embedded systems in particular, reusing a model straight from a PC usually performs poorly. The computer on module should also be chosen to match the project requirements, since the Verdin iMX8M Plus and the Apalis iMX8QM target different use cases.





