How do I correctly implement MediaPipe side packets in an Android application, and how would you infer iris direction using the MediaPipe Iris solution?

Problem description

I'm hoping someone can offer some ideas, or point me to further reading, on creating a custom Android application with MediaPipe using the Iris .aar. I have pored over the official MediaPipe documentation but found it somewhat limited, and I'm now struggling to make progress. I have been trying to add the side packet the Iris model expects and to extract specific landmark coordinates in real time.

My goal is to create an open-source, gaze-direction-driven text-to-speech keyboard for accessibility, using a modified MediaPipe Iris solution to infer the user's gaze direction and drive the app. I would greatly appreciate any help with this.

Here is my current development plan and progress:

  1. Set up MediaPipe from the command line and build the examples (DONE)
  2. Generate the .aars for face detection and iris tracking (DONE)
  3. Set up Android Studio to build MediaPipe apps (DONE)
  4. Build and test the face detection example app using the .aar (DONE)
  5. Modify the face detection example to use the Iris .aar (IN PROGRESS)
  6. Output the coordinates of the iris and the eye edges, and the distance between them, to estimate gaze direction in real time (see the sketch after this list). Alternatively, modify the graph and calculators to do the inference for me, if that is possible, and rebuild the .aar.
  7. Integrate gaze direction into the app's control scheme.
  8. Expand the app's functionality once the initial controls are in place.
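
For step 6, this is the kind of extraction I have in mind, as a rough sketch to add to the MainActivity shown further down. It assumes the face_landmarks_with_iris output stream delivers a NormalizedLandmarkList, and the landmark indices (33/133 for one eye's corners, 468 for that eye's iris centre) are taken from commonly published face-mesh landmark maps rather than anything I have verified against this graph, so treat them as placeholders:

    // Additional imports needed at the top of MainActivity.java:
    // import com.google.mediapipe.formats.proto.LandmarkProto.NormalizedLandmark;
    // import com.google.mediapipe.formats.proto.LandmarkProto.NormalizedLandmarkList;
    // import com.google.mediapipe.framework.PacketGetter;

    // Landmark indices are assumptions taken from public face-mesh index maps
    // (iris points appended after the 468 face points); verify them against the
    // landmark list this graph actually emits.
    private static final int EYE_OUTER_CORNER = 33;
    private static final int EYE_INNER_CORNER = 133;
    private static final int IRIS_CENTER = 468;

    // Call this from onCreate() after the FrameProcessor has been constructed.
    private void addLandmarkCallback() {
        processor.addPacketCallback(
                OUTPUT_LANDMARKS_STREAM_NAME,
                (packet) -> {
                    try {
                        NormalizedLandmarkList landmarks =
                                NormalizedLandmarkList.parseFrom(PacketGetter.getProtoBytes(packet));
                        if (landmarks.getLandmarkCount() <= IRIS_CENTER) {
                            return; // Iris landmarks not present in this frame.
                        }
                        NormalizedLandmark outer = landmarks.getLandmark(EYE_OUTER_CORNER);
                        NormalizedLandmark inner = landmarks.getLandmark(EYE_INNER_CORNER);
                        NormalizedLandmark iris = landmarks.getLandmark(IRIS_CENTER);
                        // Where the iris centre sits between the two eye corners:
                        // ~0.5 means looking roughly straight ahead, values towards
                        // 0 or 1 mean the iris is close to one corner or the other.
                        float horizontalRatio =
                                (iris.getX() - outer.getX()) / (inner.getX() - outer.getX());
                        Log.v(TAG, "Horizontal gaze ratio: " + horizontalRatio);
                    } catch (com.google.protobuf.InvalidProtocolBufferException e) {
                        Log.e(TAG, "Failed to parse landmark packet.", e);
                    }
                });
    }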

So far I have generated the Iris .aar using the build file below. Does the .aar I built include the calculators for the main graph and its subgraphs, or do I need to add something else to my AAR build file?

.aar build file

load("//mediapipe/java/com/google/mediapipe:mediapipe_aar.bzl", "mediapipe_aar")

mediapipe_aar(
    name = "mp_iris_tracking_aar",
    calculators = ["//mediapipe/graphs/iris_tracking:iris_tracking_gpu_deps"],
)

I currently have an Android Studio project containing the following assets and the Iris .aar mentioned above.

Android Studio Assets:
iris_tracking_gpu.binarypb
face_landmark.tflite
iris_landmark.tflite
face_detection_front.tflite

Right now I am simply trying to build it as-is, so that I understand the process better and can verify that my build environment is set up correctly. I have successfully built and tested the face detection examples listed in the documentation, and they run correctly, but when I modify the project to use the Iris .aar it builds correctly and then crashes at runtime with the exception: Side packet "focal_length_pixel" is required but was not provided.

I tried adding the focal-length code to onCreate based on the Iris example in the MediaPipe repo, but I don't know how to modify it to work with the Iris .aar. Is there any other documentation I could read that would point me in the right direction?

I think I need to integrate the following snippet into the modified face detection example code, but I'm not sure how. Thanks for your help :)

    float focalLength = cameraHelper.getFocalLengthPixels();
    if (focalLength != Float.MIN_VALUE) {
      Packet focalLengthSidePacket = processor.getPacketCreator().createFloat32(focalLength);
      Map<String, Packet> inputSidePackets = new HashMap<>();
      inputSidePackets.put(FOCAL_LENGTH_STREAM_NAME, focalLengthSidePacket);
      processor.setInputSidePackets(inputSidePackets);
    }
    haveAddedSidePackets = true;
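
For context, my best guess at where this belongs, adapted from the iris tracking example in the MediaPipe repo rather than anything I have confirmed against the .aar, is inside the camera-started listener in startCamera() of the activity below, since the focal length is only known once the camera has started and side packets have to be set before the graph starts consuming frames:

    private void startCamera() {
        cameraHelper = new CameraXPreviewHelper();
        cameraHelper.setOnCameraStartedListener(
                surfaceTexture -> {
                    previewFrameTexture = surfaceTexture;
                    previewDisplayView.setVisibility(View.VISIBLE);

                    // The focal length is only valid once the camera has started, so the
                    // side packet is created here rather than in onCreate(). The guard
                    // ensures setInputSidePackets() is only called once, before the graph
                    // begins processing frames.
                    if (!haveAddedSidePackets) {
                        float focalLength = cameraHelper.getFocalLengthPixels();
                        if (focalLength != Float.MIN_VALUE) {
                            Packet focalLengthSidePacket =
                                    processor.getPacketCreator().createFloat32(focalLength);
                            Map<String, Packet> inputSidePackets = new HashMap<>();
                            inputSidePackets.put(FOCAL_LENGTH_STREAM_NAME, focalLengthSidePacket);
                            processor.setInputSidePackets(inputSidePackets);
                        }
                        haveAddedSidePackets = true;
                    }
                });
        cameraHelper.startCamera(this, CAMERA_FACING, /*surfaceTexture=*/ null);
    }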
Modified Face Tracking AAR example:
package com.example.iristracking;

// Copyright 2019 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//      http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

import android.graphics.SurfaceTexture;
import android.os.Bundle;
import android.util.Log;
import java.util.HashMap;
import java.util.Map;
import androidx.appcompat.app.AppCompatActivity;
import android.util.Size;
import android.view.SurfaceHolder;
import android.view.SurfaceView;
import android.view.View;
import android.view.ViewGroup;
import com.google.mediapipe.components.CameraHelper;
import com.google.mediapipe.components.CameraXPreviewHelper;
import com.google.mediapipe.components.ExternalTextureConverter;
import com.google.mediapipe.components.FrameProcessor;
import com.google.mediapipe.components.PermissionHelper;
import com.google.mediapipe.framework.AndroidAssetUtil;
import com.google.mediapipe.framework.Packet;
import com.google.mediapipe.glutil.EglManager;

/** Main activity of MediaPipe example apps. */
public class MainActivity extends AppCompatActivity {
private static final String TAG = "MainActivity";
private boolean haveAddedSidePackets = false;

private static final String FOCAL_LENGTH_STREAM_NAME = "focal_length_pixel";
private static final String OUTPUT_LANDMARKS_STREAM_NAME = "face_landmarks_with_iris";

private static final String BINARY_GRAPH_NAME = "iris_tracking_gpu.binarypb";
private static final String INPUT_VIDEO_STREAM_NAME = "input_video";
private static final String OUTPUT_VIDEO_STREAM_NAME = "output_video";
private static final CameraHelper.CameraFacing CAMERA_FACING = CameraHelper.CameraFacing.FRONT;

// Flips the camera-preview frames vertically before sending them into FrameProcessor to be
// processed in a MediaPipe graph, and flips the processed frames back when they are displayed.
// This is needed because OpenGL represents images assuming the image origin is at the bottom-left
// corner, whereas MediaPipe in general assumes the image origin is at top-left.
private static final boolean FLIP_FRAMES_VERTICALLY = true;

static {
    // Load all native libraries needed by the app.
    System.loadLibrary("mediapipe_jni");
    System.loadLibrary("opencv_java3");
}

// {@link SurfaceTexture} where the camera-preview frames can be accessed.
private SurfaceTexture previewFrameTexture;
// {@link SurfaceView} that displays the camera-preview frames processed by a MediaPipe graph.
private SurfaceView previewDisplayView;

// Creates and manages an {@link EGLContext}.
private EglManager eglManager;
// Sends camera-preview frames into a MediaPipe graph for processing, and displays the processed
// frames onto a {@link Surface}.
private FrameProcessor processor;
// Converts the GL_TEXTURE_EXTERNAL_OES texture from Android camera into a regular texture to be
// consumed by {@link FrameProcessor} and the underlying MediaPipe graph.
private ExternalTextureConverter converter;

// Handles camera access via the {@link CameraX} Jetpack support library.
private CameraXPreviewHelper cameraHelper;


@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);

    previewDisplayView = new SurfaceView(this);
    setupPreviewDisplayView();

    // Initialize asset manager so that MediaPipe native libraries can access the app assets,
    // e.g., binary graphs.
    AndroidAssetUtil.initializeNativeAssetManager(this);

    eglManager = new EglManager(null);
    processor =
            new FrameProcessor(
                    this,
                    eglManager.getNativeContext(),
                    BINARY_GRAPH_NAME,
                    INPUT_VIDEO_STREAM_NAME,
                    OUTPUT_VIDEO_STREAM_NAME);
    processor.getVideoSurfaceOutput().setFlipY(FLIP_FRAMES_VERTICALLY);

    PermissionHelper.checkAndRequestCameraPermissions(this);


}

@Override
protected void onResume() {
    super.onResume();
    converter = new ExternalTextureConverter(eglManager.getContext());
    converter.setFlipY(FLIP_FRAMES_VERTICALLY);
    converter.setConsumer(processor);
    if (PermissionHelper.cameraPermissionsGranted(this)) {
        startCamera();
    }
}

@Override
protected void onPause() {
    super.onPause();
    converter.close();
}

@Override
public void onRequestPermissionsResult(
        int requestCode, String[] permissions, int[] grantResults) {
    super.onRequestPermissionsResult(requestCode, permissions, grantResults);
    PermissionHelper.onRequestPermissionsResult(requestCode, permissions, grantResults);
}

private void setupPreviewDisplayView() {
    previewDisplayView.setVisibility(View.GONE);
    ViewGroup viewGroup = findViewById(R.id.preview_display_layout);
    viewGroup.addView(previewDisplayView);

    previewDisplayView
            .getHolder()
            .addCallback(
                    new SurfaceHolder.Callback() {
                        @Override
                        public void surfaceCreated(SurfaceHolder holder) {
                            processor.getVideoSurfaceOutput().setSurface(holder.getSurface());
                        }

                        @Override
                        public void surfaceChanged(SurfaceHolder holder, int format, int width, int height) {
                            // (Re-)Compute the ideal size of the camera-preview display (the area that the
                            // camera-preview frames get rendered onto, potentially with scaling and rotation)
                            // based on the size of the SurfaceView that contains the display.
                            Size viewSize = new Size(width, height);
                            Size displaySize = cameraHelper.computeDisplaySizeFromViewSize(viewSize);

                            // Connect the converter to the camera-preview frames as its input (via
                            // previewFrameTexture), and configure the output width and height as the computed
                            // display size.
                            converter.setSurfaceTextureAndAttachToGLContext(
                                    previewFrameTexture, displaySize.getWidth(), displaySize.getHeight());
                        }

                        @Override
                        public void surfaceDestroyed(SurfaceHolder holder) {
                            processor.getVideoSurfaceOutput().setSurface(null);
                        }
                    });
}

private void startCamera() {
    cameraHelper = new CameraXPreviewHelper();
    cameraHelper.setOnCameraStartedListener(
            surfaceTexture -> {
                previewFrameTexture = surfaceTexture;
                // Make the display view visible to start showing the preview. This triggers the
                // SurfaceHolder.Callback added to (the holder of) previewDisplayView.
                previewDisplayView.setVisibility(View.VISIBLE);
            });
    cameraHelper.startCamera(this, CAMERA_FACING, /*surfaceTexture=*/ null);
}
}

Solution

override fun onResume() {
        super.onResume()
        converter = ExternalTextureConverter(eglManager?.context, NUM_BUFFERS)

        if (PermissionHelper.cameraPermissionsGranted(this)) {
            var rotation: Int = 0
            if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.R) {
                rotation = this.display!!.rotation
            } else {
                rotation = this.windowManager.defaultDisplay.rotation
            }

            converter!!.setRotation(rotation)
            converter!!.setFlipY(FLIP_FRAMES_VERTICALLY)

            startCamera(rotation)

            if (!haveAddedSidePackets) {
                val packetCreator = mediapipeFrameProcessor!!.packetCreator
                val inputSidePackets = mutableMapOf<String, Packet>()

                focalLength = cameraHelper?.focalLengthPixels!!
                Log.i(TAG_MAIN, "OnStarted focalLength: ${cameraHelper?.focalLengthPixels!!}")
                inputSidePackets.put(
                    FOCAL_LENGTH_STREAM_NAME,
                    packetCreator.createFloat32(focalLength.width.toFloat())
                )
                mediapipeFrameProcessor!!.setInputSidePackets(inputSidePackets)
                haveAddedSidePackets = true

                val imageSize = cameraHelper!!.imageSize
                val calibrateMatrix = Matrix()
                // 3x3 camera intrinsics matrix: [fx 0 cx; 0 fy cy; 0 0 1]
                // (Matrix.setValues expects exactly 9 values.)
                calibrateMatrix.setValues(
                    floatArrayOf(
                        focalLength.width * 1.0f, 0.0f, imageSize.width / 2.0f,
                        0.0f, focalLength.height * 1.0f, imageSize.height / 2.0f,
                        0.0f, 0.0f, 1.0f
                    )
                )
                val isInvert = calibrateMatrix.invert(matrixPixels2World)
                if (!isInvert) {
                    matrixPixels2World = Matrix()
                }
            }
            converter!!.setConsumer(mediapipeFrameProcessor)
        }
    }
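
Note, as far as I can tell from the code above, the points that matter are: the side packet is added only once (guarded by haveAddedSidePackets), it is created from the focal length CameraXPreviewHelper reports after startCamera(), and setInputSidePackets() is called before the converter is attached to the FrameProcessor with setConsumer(), so the focal_length_pixel packet is in place before the graph receives its first frame.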