How to correctly convert a cv::Mat to a torch::Tensor with perfectly matching values

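Problem description

I am trying to run inference with a jit-traced model in C++, but the output I get in Python currently differs from the output I get in C++.

At first I thought the jit model itself was the cause. Now I don't think so, because I found some small deviations in the input tensor in my C++ code. I believe I followed the documentation exactly, so the problem might lie in torch::from_blob. I'm not sure!

To find out which is the case, here are the Python and C++ snippets, along with a sample input to test them with.

Here is the sample image:

For PyTorch, run the following snippet: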

```python
import cv2
import torch
from PIL import Image
import math
import numpy as np

img = Image.open('D:/Codes/imgs/profile6.jpg')
width, height = img.size
scale = 0.6
sw, sh = math.ceil(width * scale), math.ceil(height * scale)
img = img.resize((sw, sh), Image.BILINEAR)
img = np.asarray(img, 'float32')   # HWC, RGB channel order (PIL)

# preprocess it
img = img.transpose((2, 0, 1))     # HWC -> CHW
img = np.expand_dims(img, 0)       # add batch dimension -> NCHW
img = (img - 127.5) * 0.0078125    # scale to roughly [-1, 1]
img = torch.from_numpy(img)
```

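For C++:

```cpp
#include <cmath>
#include <iostream>
#include <string>
#include <torch/torch.h>
#include <torch/script.h>
using namespace torch::indexing;

#include <opencv2/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>

void test15()
{
    std::string pnet_path = "D:/Codes//MTCNN/pnet.jit";
    // cv::imread returns an 8-bit, 3-channel image in BGR order
    cv::Mat img = cv::imread("D:/Codes/imgs/profile6.jpg");
    int width = img.cols;
    int height = img.rows;
    float scale = 0.6f;
    int sw = int(std::ceil(width * scale));
    int sh = int(std::ceil(height * scale));

    // bilinear resize, like Image.BILINEAR on the Python side
    cv::resize(img, img, cv::Size(sw, sh), 0, 0, cv::INTER_LINEAR);

    // wrap the HWC uint8 buffer (no copy), then HWC -> CHW and add a batch dimension
    auto tensor_image = torch::from_blob(img.data, { img.rows, img.cols, img.channels() }, at::kByte);
    tensor_image = tensor_image.permute({ 2, 0, 1 });
    tensor_image.unsqueeze_(0);
    // toType() makes a copy, so the result no longer aliases img.data
    tensor_image = tensor_image.toType(c10::kFloat).sub(127.5).mul(0.0078125);
    // Tensor::to() returns a new tensor, so assign the result
    tensor_image = tensor_image.to(c10::DeviceType::CPU);
}
```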
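On the suspicion about torch::from_blob: for a continuous 8-bit, 3-channel cv::Mat, from_blob only reinterprets the interleaved HWC byte buffer, and permute({2, 0, 1}) only changes the strides; neither call changes a single value. The NumPy sketch below mimics those two calls on a tiny hypothetical buffer, with NumPy's reshape and transpose standing in for from_blob and permute, to show which byte ends up at each [c, y, x]. It assumes the Mat is continuous (img.isContinuous() is true, which is normally the case for a freshly allocated Mat such as the output of cv::resize); a non-continuous Mat, such as an ROI, would need explicit strides or a clone first.

```python
import numpy as np

H, W, C = 2, 3, 3  # tiny hypothetical image: 2 rows, 3 columns, 3 channels

# A continuous, pixel-interleaved HWC uint8 buffer, laid out like cv::Mat::data.
flat = np.arange(H * W * C, dtype=np.uint8)

# torch::from_blob(data, {H, W, C}, kByte) ~ reshaping that buffer (no copy).
hwc = flat.reshape(H, W, C)

# .permute({2, 0, 1}) ~ transposing to CHW; again only a view with new strides.
chw = hwc.transpose(2, 0, 1)

# Element [c, y, x] of the CHW view is byte (y * W + x) * C + c of the buffer.
for c in range(C):
    for y in range(H):
        for x in range(W):
            assert chw[c, y, x] == flat[(y * W + x) * C + c]

print(chw.shape)                    # (3, 2, 3)
print(np.shares_memory(chw, flat))  # True
```

Since the values are only re-indexed, a deviation of one or two gray levels cannot come from this step; a layout bug would scramble whole pixels or channels instead.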

### Input comparison

And here are the tensor values in both Python and C++.

PyTorch input (`img[:,:,:10,:10]`):

```python
img: tensor([[
    [[0.3555,0.3555,0.3477,0.3711,0.3945,0.3867,0.3789,0.3789],[ 0.3477,0.3398,0.3398],[ 0.3320,0.3242,0.3320,0.3164,0.3242],[ 0.2852,0.2930,0.2852,0.2773,0.2773],[ 0.2539,0.2617,0.2539,0.2148,0.2070,0.2070],[ 0.1914,0.1914,0.1836,0.1758,0.1523,0.1367,0.1211,0.0977,0.0898],[ 0.1367,0.0820,0.0742,0.0586,0.0273,-0.0195,-0.0742,-0.0820],[-0.0039,-0.0273,-0.0508,-0.0664,-0.0898,-0.1211,-0.1367,-0.1523,-0.1758,-0.1758],[-0.2070,-0.2070,-0.2148,-0.2227,-0.1992,-0.1836,-0.1680,-0.1680],[-0.2539,-0.2461,-0.2383,-0.2305,-0.1914,-0.1602]],[[0.8398,0.8398,0.8320,0.8242,0.8477,0.8164,0.8164],[ 0.8320,0.8086,0.8008,0.7930,0.7852,0.7695,0.7695],[ 0.7852,0.7773,0.7617,0.7539,0.7383,0.7305,0.7148],[ 0.7227,0.7070,0.6992,0.6914,0.6836,0.6680,0.6523,0.6367],[ 0.6289,0.6211,0.6055,0.5586,0.5508,0.5352,0.5273,0.5039],[ 0.4805,0.4727,0.4648,0.4570,0.4180,0.3633,0.3164],[ 0.3555,0.3086,0.2695,0.2461,0.1055,0.0820],0.1133,0.0508,-0.0117,-0.0352,-0.0820,-0.0898],[-0.1211,-0.1289,-0.1445,-0.1602,-0.1289],-0.1445]],[[0.9492,0.9414,0.9336,0.9180,0.9258,0.9023,0.8867,0.9023],[ 0.9258,0.9102,0.8945,0.8789,0.8633,0.8398],[ 0.8711,0.8555,0.7773],0.7461,0.7148,0.6836],[ 0.6758,0.6602,0.6367,0.5820,0.5742,0.5430,0.5273],[ 0.5117,0.5117,0.4961,0.4883,0.4336,0.4102,[ 0.3867,[ 0.1680,0.1445,0.0352,-0.0039,-0.0586,[-0.0898,-0.0977,-0.1445],[-0.1758,-0.1523]]]])
```

C++/LibTorch tensor values (`img.index({Slice(), Slice(), Slice(None, 10), Slice(None, 10)})`):

```
img: (1,1,.,.) =
  0.3555  0.3555  0.3555  0.3555  0.3555  0.4023  0.3945  0.3867  0.3789  0.3789
  0.3633  0.3633  0.3555  0.3555  0.3555  0.3555  0.3477  0.3555  0.3398  0.3398
  0.3398  0.3320  0.3320  0.3242  0.3398  0.3320  0.3398  0.3242  0.3242  0.3242
  0.2930  0.2930  0.2852  0.2773  0.2852  0.2930  0.2852  0.2852  0.2773  0.2852
  0.2695  0.2695  0.2617  0.2773  0.2695  0.2227  0.2227  0.2227  0.2148  0.2148
  0.1914  0.1914  0.1914  0.1914  0.1914  0.1602  0.1445  0.1289  0.1055  0.0977
  0.1289  0.1133  0.0820  0.0742  0.0586  0.0586  0.0195 -0.0273 -0.0820 -0.0898
  0.0039 -0.0195 -0.0508 -0.0664 -0.0820 -0.1289 -0.1445 -0.1602 -0.1836 -0.1836
 -0.2070 -0.2148 -0.2227 -0.2383 -0.2305 -0.2070 -0.2070 -0.1914 -0.1836 -0.1758
 -0.2539 -0.2461 -0.2461 -0.2383 -0.2305 -0.1914 -0.1914 -0.1758 -0.1680 -0.1602

(1,2,.,.) =
  0.8398  0.8398  0.8242  0.8164  0.8242  0.8555  0.8398  0.8320  0.8242  0.8242
  0.8320  0.8320  0.8242  0.8242  0.8086  0.8008  0.7930  0.7773  0.7695  0.7617
  0.7930  0.7852  0.7773  0.7695  0.7695  0.7695  0.7539  0.7461  0.7305  0.7227
  0.7070  0.7070  0.6992  0.6992  0.6914  0.6836  0.6758  0.6602  0.6523  0.6367
  0.6367  0.6367  0.6289  0.6289  0.6211  0.5664  0.5586  0.5430  0.5352  0.5117
  0.4805  0.4805  0.4805  0.4648  0.4727  0.4258  0.4023  0.3711  0.3555  0.3320
  0.3398  0.3320  0.3008  0.2773  0.2617  0.2461  0.1992  0.1445  0.0898  0.0586
  0.1367  0.1211  0.0898  0.0508  0.0273 -0.0195 -0.0352 -0.0664 -0.0898 -0.1055
 -0.1211 -0.1289 -0.1367 -0.1602 -0.1602 -0.1523 -0.1523 -0.1445 -0.1445 -0.1367
 -0.2148 -0.2070 -0.2070 -0.2070 -0.1992 -0.1680 -0.1680 -0.1602 -0.1523 -0.1445

(1,3,.,.) =
  0.9414  0.9414  0.9336  0.9180  0.9102  0.9336  0.9258  0.9023  0.8945  0.9023
  0.9180  0.9180  0.9102  0.9102  0.8945  0.8711  0.8633  0.8555  0.8242  0.8477
  0.8711  0.8711  0.8633  0.8477  0.8320  0.8164  0.8164  0.7930  0.7852  0.7852
  0.7773  0.7773  0.7539  0.7461  0.7305  0.7148  0.7070  0.6992  0.6836  0.6758
  0.6836  0.6836  0.6758  0.6680  0.6445  0.5898  0.5820  0.5586  0.5508  0.5352
  0.5273  0.5195  0.5117  0.4883  0.4883  0.4414  0.4102  0.3789  0.3633  0.3398
  0.3867  0.3633  0.3320  0.3008  0.2695  0.2539  0.2070  0.1445  0.0898  0.0664
  0.1836  0.1523  0.1133  0.0742  0.0352 -0.0117 -0.0352 -0.0664 -0.0898 -0.1055
 -0.0820 -0.0977 -0.1211 -0.1367 -0.1445 -0.1445 -0.1445 -0.1367 -0.1445 -0.1445
 -0.1758 -0.1758 -0.1758 -0.1758 -0.1758 -0.1602 -0.1523 -0.1680 -0.1602 -0.1602

[ CPUFloatType{1,3,10,10} ]
```
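Because these printouts are rounded to four decimals, they are easier to compare in pixel units. A minimal sketch (the helper names are hypothetical) that inverts the normalization used in both snippets. It also shows why the normalization step cannot be the source of the mismatch: 0.0078125 is exactly 1/128, so (x - 127.5) * 0.0078125 is exact for every uint8 x and gives identical floats on both sides whenever the bytes are identical.

```python
# 0.0078125 is exactly 1/128, so (x - 127.5) * 0.0078125 is exact in float32
# and float64 for every uint8 x, and it can be inverted without loss.
OFFSET = 127.5
SCALE = 0.0078125


def normalize(pixel: int) -> float:
    """The preprocessing used in both snippets: (x - 127.5) * 0.0078125."""
    return (pixel - OFFSET) * SCALE


def denormalize(value: float) -> int:
    """Map a printed (4-decimal) normalized value back to its uint8 pixel."""
    return round(value / SCALE + OFFSET)


# First six values of row 0 in the first channel, (1,1,.,.), of the C++ printout.
cpp_row = [0.3555, 0.3555, 0.3555, 0.3555, 0.3555, 0.4023]
print([denormalize(v) for v in cpp_row])  # [173, 173, 173, 173, 173, 179]
```

The recovered values agree with the raw C++ bytes listed below, so comparing the two sides before normalization loses nothing.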

By the way, these are the tensor values before normalization/preprocessing:

Python:

```
img.shape: (3,101,180)
img: [
 [[173. 173. 172. 173. 175.]
  [172. 173. 173. 173. 173.]
  [170. 169. 170. 169. 170.]
  [164. 165. 164. 164. 165.]
  [160. 161. 160. 161. 160.]]

 [[235. 235. 234. 233. 234.]
  [234. 233. 232. 232. 231.]
  [228. 228. 227. 226. 226.]
  [220. 218. 218. 217. 216.]
  [208. 207. 207. 207. 205.]]

 [[249. 248. 247. 245. 245.]
  [246. 246. 244. 243. 242.]
  [239. 238. 237. 236. 234.]
  [228. 227. 225. 224. 223.]
  [214. 213. 212. 212. 209.]]]
```

C++:

```
img.shape: [1, 3, 101, 180]
img: (1,1,.,.) =
  173  173  173  173  173
  174  174  173  173  173
  171  170  170  169  171
  165  165  164  163  164
  162  162  161  163  162

(1,2,.,.) =
  235  235  233  232  233
  234  234  233  233  231
  229  228  227  226  226
  218  218  217  217  216
  209  209  208  208  207

(1,3,.,.) =
  248  248  247  245  244
  245  245  244  244  242
  239  239  238  236  234
  227  227  224  223  221
  215  215  214  213  210
[ CPUByteType{1,3,5,5} ]
```
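The raw 5x5 crops above can be compared directly. A small pure-Python sketch (the helper name is hypothetical) that reports, per channel, the largest absolute difference and how many of the 25 pixels differ. Every difference is at most 2 gray levels, which is consistent with the answer below (the image is read and resized differently) rather than with a layout error in from_blob or permute: swapping the R and B channels, for instance, would produce differences of about 75 levels here, since those channels are roughly 173 and 249.

```python
# 5x5 crops copied from the printouts above, indexed [channel][row][col].
PIL_CROP = [  # Python side (PIL.Image.open + Image.resize)
    [[173, 173, 172, 173, 175], [172, 173, 173, 173, 173], [170, 169, 170, 169, 170],
     [164, 165, 164, 164, 165], [160, 161, 160, 161, 160]],
    [[235, 235, 234, 233, 234], [234, 233, 232, 232, 231], [228, 228, 227, 226, 226],
     [220, 218, 218, 217, 216], [208, 207, 207, 207, 205]],
    [[249, 248, 247, 245, 245], [246, 246, 244, 243, 242], [239, 238, 237, 236, 234],
     [228, 227, 225, 224, 223], [214, 213, 212, 212, 209]],
]
OPENCV_CROP = [  # C++ side (cv::imread + cv::resize)
    [[173, 173, 173, 173, 173], [174, 174, 173, 173, 173], [171, 170, 170, 169, 171],
     [165, 165, 164, 163, 164], [162, 162, 161, 163, 162]],
    [[235, 235, 233, 232, 233], [234, 234, 233, 233, 231], [229, 228, 227, 226, 226],
     [218, 218, 217, 217, 216], [209, 209, 208, 208, 207]],
    [[248, 248, 247, 245, 244], [245, 245, 244, 244, 242], [239, 239, 238, 236, 234],
     [227, 227, 224, 223, 221], [215, 215, 214, 213, 210]],
]


def channel_stats(a, b):
    """Per channel: (max |a - b|, number of differing pixels)."""
    stats = []
    for ch_a, ch_b in zip(a, b):
        diffs = [abs(x - y)
                 for row_a, row_b in zip(ch_a, ch_b)
                 for x, y in zip(row_a, row_b)]
        stats.append((max(diffs), sum(d != 0 for d in diffs)))
    return stats


print(channel_stats(PIL_CROP, OPENCV_CROP))  # [(2, 15), (2, 14), (2, 16)]
```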

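At first glance they may look identical, but on closer inspection you will see many small deviations in the input! How can I avoid these changes and get exactly the same values in C++?

I would like to know what causes this strange behavior!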

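Solution

It turned out that this was indeed an input problem. More specifically, it happened because in Python the image was first read with PIL.Image.open and then converted to a NumPy array. If the image is read with OpenCV instead, the input is identical in Python and C++.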

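Update

This is, again, related to the input. The small differences arose because the model was trained on RGB images, so the channel order matters. When PIL images are used, several back-and-forth conversions happen across the different methods, and the whole pipeline ends up as the mess described above.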
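On the channel-order point: cv::imread and cv2.imread return pixels in BGR order, while PIL returns RGB. If the model expects RGB, the conversion has to happen once, in the same place, on both sides; in C++ that is cv::cvtColor(img, img, cv::COLOR_BGR2RGB) before torch::from_blob. A NumPy sketch of the equivalent conversion on the Python side (the helper name is hypothetical; cv2.cvtColor(img, cv2.COLOR_BGR2RGB) would do the same):

```python
import numpy as np


def bgr_to_rgb(bgr: np.ndarray) -> np.ndarray:
    """Reverse the channel axis of an HWC image (BGR -> RGB).

    The [..., ::-1] view has a negative stride, which torch.from_numpy
    rejects, so np.ascontiguousarray copies it into a fresh buffer.
    """
    return np.ascontiguousarray(bgr[..., ::-1])


# The first pixel of the crops above, in the BGR order cv2.imread would use.
bgr = np.array([[[249, 235, 173]]], dtype=np.uint8)
rgb = bgr_to_rgb(bgr)
print(rgb[0, 0].tolist())         # [173, 235, 249]
print(rgb.flags["C_CONTIGUOUS"])  # True
```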

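In summary, there is nothing wrong with the conversion from cv::Mat to torch::Tensor (or vice versa); the problem was that the image was created and fed to the network differently in Python and in C++. When both the Python and the C++ back ends process the image with OpenCV, their outputs and results match 100%.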
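Following that conclusion, once the Python side also decodes and resizes with OpenCV (cv2.imread, then cv2.resize with cv2.INTER_LINEAR, matching the C++ code), the remaining steps can be written as one function that mirrors the C++ tensor chain step by step. A minimal NumPy sketch under that assumption: the function name is hypothetical, random bytes stand in for a real image, and in the actual pipeline the result would then be passed to torch.from_numpy as in the question. Any channel conversion must be applied identically on both sides.

```python
import numpy as np

OFFSET = 127.5
SCALE = 0.0078125  # == 1 / 128


def preprocess(hwc: np.ndarray) -> np.ndarray:
    """Hypothetical helper mirroring the C++ chain in the question:
    from_blob(HWC uint8) -> permute({2, 0, 1}) -> unsqueeze_(0)
    -> toType(kFloat).sub(127.5).mul(0.0078125).

    `hwc` is assumed to be what cv2.imread + cv2.resize return in Python,
    i.e. the same bytes the C++ code wraps with from_blob.
    """
    if hwc.dtype != np.uint8 or hwc.ndim != 3:
        raise ValueError("expected an HWC uint8 image")
    nchw = np.expand_dims(hwc.transpose(2, 0, 1), 0)  # HWC -> CHW -> NCHW
    out = (nchw.astype(np.float32) - OFFSET) * SCALE  # exact for uint8 input
    return np.ascontiguousarray(out)                  # C-contiguous float32


# Stand-in for a decoded and resized 101x180 image (random bytes, illustration only).
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(101, 180, 3), dtype=np.uint8)
x = preprocess(img)
print(x.shape, x.dtype)  # (1, 3, 101, 180) float32
```

Because every step is either a pure re-indexing or exact arithmetic, identical bytes in give identical floats out; only the decoding and resizing decide whether the two sides agree.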
