Can TFLite's CoreMLDelegate use both the GPU and CPU on iOS?

Problem description

I have successfully used TFLite's MetalDelegate in my app. When I switch to the CoreMLDelegate, it runs my (float) TFLite model (MobileNet) entirely on the CPU, showing 0% GPU usage. I am running this on a compatible iPhone 11 Pro Max. During initialization I noticed the following line: "CoreML delegate: 29 nodes delegated out of 31 nodes, with 2 partitions". Any ideas? How can I make the CoreMLDelegate use both the GPU and CPU on iOS? I downloaded the mobilenet_v1_1.0_224.tflite model file from here.
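For reference, delegate setup with the TensorFlow Lite Swift API generally looks like the sketch below. This is an illustrative example, not the asker's actual code; `modelPath` and `makeInterpreter` are placeholder names.

```swift
import TensorFlowLite

// Sketch: create the Core ML delegate and fall back to the interpreter's
// default CPU path when the delegate cannot be created (CoreMLDelegate()
// is a failable initializer). `modelPath` is assumed to point at
// mobilenet_v1_1.0_224.tflite.
func makeInterpreter(modelPath: String) throws -> Interpreter {
  var delegates: [Delegate] = []
  if let coreMLDelegate = CoreMLDelegate() {
    delegates.append(coreMLDelegate)
  }
  return try Interpreter(modelPath: modelPath, delegates: delegates)
}
```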


import AVFoundation
import UIKit
import SpriteKit
import Metal

var device: MTLDevice!
var commandQueue: MTLCommandQueue!

private var total_latency:Double = 0
private var total_count:Double = 0
private var sstart = TimeInterval(NSDate().timeIntervalSince1970)

class ViewController: UIViewController {
...
}

// MARK: CameraFeedManagerDelegate Methods
extension ViewController: CameraFeedManagerDelegate {

  func didOutput(pixelBuffer: CVPixelBuffer) {
    let currentTimeMs = Date().timeIntervalSince1970 * 1000
    guard (currentTimeMs - previousInferenceTimeMs) >= delayBetweenInferencesMs else { return }
    previousInferenceTimeMs = currentTimeMs



    // 1. Create the Metal device and command queue.
    //    (Ideally this happens once, e.g. in viewDidLoad(), rather than on every frame.)
    device = MTLCreateSystemDefaultDevice()
    commandQueue = device.makeCommandQueue()

    var timestamp = NSDate().timeIntervalSince1970
    let start = TimeInterval(timestamp)

    // 2. Access the shared MTLCaptureManager and start capturing
    let capManager = MTLCaptureManager.shared()
    let myCaptureScope = capManager.makeCaptureScope(device: device)
    myCaptureScope.begin()
    let commandBuffer = commandQueue.makeCommandBuffer()!
    // Do Metal work


    // Pass the pixel buffer to TensorFlow Lite to perform inference.
    result = modelDataHandler?.runModel(onFrame: pixelBuffer)


    // 3.
    // encode your kernel
    commandBuffer.commit()
    myCaptureScope.end()

    timestamp = NSDate().timeIntervalSince1970
    let end = TimeInterval(timestamp)

    total_latency += (end - start)
    total_count += 1
    let rfps = total_count / (end - sstart)  // running fps over the whole session
    let fps = 1 / (end - start)              // instantaneous fps for this frame
    let stri = "Time: \(end - start) avg: \(total_latency / total_count) count: \(total_count) rfps: \(rfps) fps: \(fps)"
    print(stri)


    // Display results by handing off to the InferenceViewController.
    DispatchQueue.main.async {
      guard let finalInferences = self.result?.inferences else {
        self.resultLabel.text = ""
        return
      }
      let resultStrings = finalInferences.map { inference in
        String(format: "%@ %.2f", inference.label, inference.confidence)
      }
      self.resultLabel.text = resultStrings.joined(separator: "\n")
    }

  }
}

2020-08-22 07:09:39.783215-0400 ImageClassification[3039:645963] coreml_version must be 2 or 3. Setting to 3.
2020-08-22 07:09:39.785103-0400 ImageClassification[3039:645963] Created TensorFlow Lite delegate for Metal.
2020-08-22 07:09:39.785505-0400 ImageClassification[3039:645963] Metal GPU Frame Capture Enabled
2020-08-22 07:09:39.786110-0400 ImageClassification[3039:645963] Metal API Validation Enabled
2020-08-22 07:09:39.927854-0400 ImageClassification[3039:645963] Initialized TensorFlow Lite runtime.
2020-08-22 07:09:39.928928-0400 ImageClassification[3039:645963] CoreML delegate: 29 nodes delegated out of 31 nodes, with 2 partitions

Answer

Thanks for trying out the Core ML delegate. Could you share the TFLite version you are using, along with the code you use to initialize the Core ML delegate? Also, can you confirm that you are running a float model rather than a quantized one?

Latency can vary depending on what you measure, but when measuring inference time only, my iPhone 11 Pro shows 11 ms on CPU and 5.5 ms with the Core ML delegate.
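To compare numbers like these, it helps to time only the interpreter invocation, excluding preprocessing and UI work. A minimal sketch, assuming `interpreter` is an already-configured TensorFlow Lite `Interpreter` with its input tensor filled:

```swift
import TensorFlowLite

// Sketch: measure inference time only, not the whole didOutput() path.
let start = CFAbsoluteTimeGetCurrent()
try interpreter.invoke()
let inferenceMs = (CFAbsoluteTimeGetCurrent() - start) * 1000
print("Inference time: \(inferenceMs) ms")
```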

The profiler cannot capture Neural Engine utilization, so 0% GPU usage does not by itself mean the model is on the CPU. However, if you see high latency together with high CPU utilization, that may indicate your model is running only on the CPU. You can also try the Time Profiler instrument to find out which part consumes the most resources.
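One related knob worth checking: `CoreMLDelegate.Options` exposes `enabledDevices`, which by default restricts the delegate to devices with a Neural Engine. A sketch of relaxing that, so Core ML may schedule the model across CPU, GPU, and Neural Engine (`modelPath` is a placeholder):

```swift
import TensorFlowLite

// Sketch: allow the Core ML delegate on all devices, not only those with
// a Neural Engine. Core ML then decides where each partition runs.
var options = CoreMLDelegate.Options()
options.enabledDevices = .all
if let coreMLDelegate = CoreMLDelegate(options: options) {
  let interpreter = try Interpreter(modelPath: modelPath,
                                    delegates: [coreMLDelegate])
}
```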