How to implement post-processing for a YOLO v3 or v4 ONNX model in ML.Net

Problem description

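I followed this Microsoft tutorial without any problems. But I want to change the model to YOLO v3 or v4. I got the YOLOv4 ONNX model from onnx/models and was able to get all three float output arrays of the model, but the problem is with the post-processing: I can't get proper bounding boxes from these outputs.

I changed everything in the Microsoft tutorial source code, such as the anchors, strides, output grid sizes, some functions and so on, to make it compatible with YOLOv4, but I couldn't get proper results. I checked all of my code against the Python implementation, but I don't know where the problem is. Does anyone have a link or know how to implement a YOLO v3 or v4 ONNX model in C# with ML.Net?

Any help would be appreciated.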

Solution

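I don't think it's possible to port Microsoft's tutorial directly from YOLO v2 to v3, because it depends on each model's inputs and outputs.

As a side note, I've ported another YOLO v3 model to ML.Net in this GitHub repo: 'YOLOv3MLNet'. It contains a fully functional ML.Net pipeline.

I've also made the code for this answer available here: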

YOLO v3 with ML.Net
YOLO v4 with ML.Net

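Going back to your model, I'll take YOLO v3 (available in the onnx/models repo) as an example. A good explanation of the model can be found here.

My first piece of advice is to look at the model with Netron. Doing so, you will see the input and output layers. These layers are also described in the onnx/models documentation.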

[Image: Netron screenshot of the yolov3-10 model]

(I see in Netron that this particular YOLO v3 model also does some post-processing itself by performing the non-maximum suppression step.)
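
For readers unfamiliar with the non-maximum suppression step mentioned above, here is a minimal sketch of greedy NMS in C#. It is only an illustration and not the code embedded in the ONNX graph: the class name NmsSketch, the [x1, y1, x2, y2] box layout and the threshold are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the model or of ML.Net.
public static class NmsSketch
{
    // Intersection over union of two boxes.
    // Assumed layout: [x1, y1, x2, y2] with x1 <= x2 and y1 <= y2.
    public static float IoU(float[] a, float[] b)
    {
        float ix1 = Math.Max(a[0], b[0]);
        float iy1 = Math.Max(a[1], b[1]);
        float ix2 = Math.Min(a[2], b[2]);
        float iy2 = Math.Min(a[3], b[3]);

        float inter = Math.Max(0f, ix2 - ix1) * Math.Max(0f, iy2 - iy1);
        float areaA = (a[2] - a[0]) * (a[3] - a[1]);
        float areaB = (b[2] - b[0]) * (b[3] - b[1]);
        float union = areaA + areaB - inter;

        return union <= 0f ? 0f : inter / union;
    }

    // Greedy NMS: visit boxes from highest to lowest score and keep a box
    // only if it does not overlap an already kept box by more than the threshold.
    public static List<int> Suppress(IReadOnlyList<float[]> boxes, IReadOnlyList<float> scores, float iouThreshold)
    {
        var kept = new List<int>();
        var order = Enumerable.Range(0, boxes.Count).OrderByDescending(idx => scores[idx]);
        foreach (int candidate in order)
        {
            if (kept.All(k => IoU(boxes[k], boxes[candidate]) <= iouThreshold))
            {
                kept.Add(candidate);
            }
        }
        return kept;
    }
}
```

With this particular model you should not need to run such a step yourself, since the graph already applies it; the sketch only shows the kind of filtering that produces the selected indices.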

Input layer names: input_1, image_shape
Output layer names: yolonms_layer_1/ExpandDims_1:0, yolonms_layer_1/ExpandDims_3:0, yolonms_layer_1/concat_2:0

As per the model documentation, the input shapes are:

Resized image (1x3x416x416). Original image size (1x2), which is [image.size[1], image.size[0]].

We first need to define the ML.Net input and output classes as follows:

using System.Drawing;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms.Image;

public class YoloV3BitmapData
{
    [ColumnName("bitmap")]
    [ImageType(416, 416)]
    public Bitmap Image { get; set; }

    [ColumnName("width")]
    public float ImageWidth => Image.Width;

    [ColumnName("height")]
    public float ImageHeight => Image.Height;
}

public class YoloV3Prediction
{
    /// <summary>
    /// ((52 x 52) + (26 x 26) + (13 x 13)) x 3 = 10,647.
    /// </summary>
    public const int YoloV3BboxPredictionCount = 10_647;

    /// <summary>
    /// Boxes
    /// </summary>
    [ColumnName("yolonms_layer_1/ExpandDims_1:0")]
    public float[] Boxes { get; set; }

    /// <summary>
    /// Scores
    /// </summary>
    [ColumnName("yolonms_layer_1/ExpandDims_3:0")]
    public float[] Scores { get; set; }

    /// <summary>
    /// Concat
    /// </summary>
    [ColumnName("yolonms_layer_1/concat_2:0")]
    public int[] Concat { get; set; }
}

We then create the ML.Net pipeline and load the prediction engine:

// Requires: using System.Collections.Generic; using Microsoft.ML;
// ResizingKind is nested in ImageResizingEstimator (Microsoft.ML.Transforms.Image)
var mlContext = new MLContext();

// Define scoring pipeline
var pipeline = mlContext.Transforms.ResizeImages(inputColumnName: "bitmap", outputColumnName: "input_1", imageWidth: 416, imageHeight: 416, resizing: ResizingKind.IsoPad)
    .Append(mlContext.Transforms.ExtractPixels(outputColumnName: "input_1", outputAsFloatArray: true, scaleImage: 1f / 255f))
    .Append(mlContext.Transforms.Concatenate("image_shape", "height", "width"))
    .Append(mlContext.Transforms.ApplyOnnxModel(
        shapeDictionary: new Dictionary<string, int[]>() { { "input_1", new[] { 1, 3, 416, 416 } } },
        inputColumnNames: new[] { "input_1", "image_shape" },
        outputColumnNames: new[] { "yolonms_layer_1/ExpandDims_1:0", "yolonms_layer_1/ExpandDims_3:0", "yolonms_layer_1/concat_2:0" },
        modelFile: @"D:\yolov3-10.onnx"));

// Fit on empty list to obtain input data schema
var model = pipeline.Fit(mlContext.Data.LoadFromEnumerable(new List<YoloV3BitmapData>()));

// Create prediction engine
var predictionEngine = mlContext.Model.CreatePredictionEngine<YoloV3BitmapData, YoloV3Prediction>(model);

NB: we need to define the shapeDictionary parameter because the shapes are not completely defined in the model.

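As per the model documentation, the output shapes are:

The model has 3 outputs. Boxes: (1 x 'n_candidates' x 4), the coordinates of all anchor boxes. Scores: (1 x 80 x 'n_candidates'), the scores of all anchor boxes per class. Indices: ('nbox' x 3), the indices selected from the boxes tensor. The selected index format is (batch_index, class_index, box_index).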
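
Because ML.Net hands these tensors back as flat arrays, it can help to spell out how the shapes above map to array offsets. The following is a minimal sketch under stated assumptions: the class YoloV3OutputLayout and its members are hypothetical names, the arrays are assumed to be flattened in row-major order, and the strides 8, 16 and 32 are assumed because they give the 52, 26 and 13 grids for a 416x416 input.

```csharp
using System;

// Hypothetical helper describing the flattened output layout.
public static class YoloV3OutputLayout
{
    public const int InputSize = 416;
    public const int AnchorsPerCell = 3;

    // Candidate count for a 416x416 input: three grids with 3 anchors per cell.
    public static int CandidateCount()
    {
        int total = 0;
        foreach (int stride in new[] { 8, 16, 32 }) // assumed strides
        {
            int grid = InputSize / stride; // 52, 26, 13
            total += grid * grid * AnchorsPerCell;
        }
        return total;
    }

    // Boxes (1 x n x 4): four consecutive floats per candidate box.
    public static int BoxOffset(int boxIndex) => boxIndex * 4;

    // Scores (1 x classes x n): one row of n scores per class.
    public static int ScoreOffset(int classIndex, int boxIndex, int candidateCount)
        => classIndex * candidateCount + boxIndex;
}
```

Under these assumptions, CandidateCount() gives 10,647, and the score of class 1 for box 5 sits at offset 1 * 10,647 + 5 in the scores array.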

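In this version of the model there are 80 classes (see the model's GitHub documentation for the link).

The function below will help you process the results; I leave it to you to fine-tune it. To use it, pass it the prediction returned by the prediction engine together with the array of class names: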

// Requires: using System; using System.Collections.Generic;
public IReadOnlyList<YoloV3Result> GetResults(YoloV3Prediction prediction, string[] categories)
{
    // Nothing was selected by the model's built-in NMS step
    if (prediction.Concat == null || prediction.Concat.Length == 0)
    {
        return new List<YoloV3Result>();
    }

    if (prediction.Boxes.Length != YoloV3Prediction.YoloV3BboxPredictionCount * 4)
    {
        throw new ArgumentException("Unexpected size for the boxes output.", nameof(prediction));
    }

    if (prediction.Scores.Length != YoloV3Prediction.YoloV3BboxPredictionCount * categories.Length)
    {
        throw new ArgumentException("Unexpected size for the scores output.", nameof(prediction));
    }

    List<YoloV3Result> results = new List<YoloV3Result>();

    // Concat size is 'nbox' x 3 (batch_index, class_index, box_index)
    int resultsCount = prediction.Concat.Length / 3;
    for (int c = 0; c < resultsCount; c++)
    {
        // batch_index (prediction.Concat[c * 3]) is not needed for a single image
        var class_index = prediction.Concat[c * 3 + 1];
        var box_index = prediction.Concat[c * 3 + 2];

        var label = categories[class_index];
        var bbox = new float[]
        {
            prediction.Boxes[box_index * 4],
            prediction.Boxes[box_index * 4 + 1],
            prediction.Boxes[box_index * 4 + 2],
            prediction.Boxes[box_index * 4 + 3],
        };

        // Scores are laid out as one row of box scores per class
        var score = prediction.Scores[box_index + class_index * YoloV3Prediction.YoloV3BboxPredictionCount];

        results.Add(new YoloV3Result(bbox, label, score));
    }

    return results;
}
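
The YoloV3Result type used by GetResults is not shown in this answer. A minimal sketch of what it could look like follows; the property names are assumptions inferred from the constructor call new YoloV3Result(bbox, label, score), and the file names in the commented usage are placeholders.

```csharp
using System;

// Hypothetical result container matching the constructor call in GetResults.
public class YoloV3Result
{
    public YoloV3Result(float[] bbox, string label, float confidence)
    {
        BBox = bbox ?? throw new ArgumentNullException(nameof(bbox));
        Label = label;
        Confidence = confidence;
    }

    // Four box coordinates, in the order the model returns them.
    public float[] BBox { get; }

    // Class name looked up from the categories array.
    public string Label { get; }

    // Score of the selected class for this box.
    public float Confidence { get; }
}

// Possible usage, assuming the pipeline and GetResults shown in this answer
// and a labels file with one class name per line (placeholder paths):
//
//   string[] categories = System.IO.File.ReadAllLines("coco_classes.txt");
//   var prediction = predictionEngine.Predict(new YoloV3BitmapData { Image = new Bitmap("image.jpg") });
//   var results = GetResults(prediction, categories);
```

The categories array must contain the 80 class names in the order the model was trained with, otherwise the length check in GetResults throws.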

You can find a result example here.
