Problem description
I am mostly following this guide to estimate the head pose from a single image: https://towardsdatascience.com/real-time-head-pose-estimation-in-python-e52db1bc606a
Face detection works well: if I plot the image together with the detected landmarks, they line up nicely.
I estimate the camera matrix from the image size and assume no lens distortion:
import cv2
import numpy as np

size = image.shape
focal_length = size[1]
center = (size[1] / 2, size[0] / 2)
camera_matrix = np.array([[focal_length, 0, center[0]],
                          [0, focal_length, center[1]],
                          [0, 0, 1]], dtype="double")
dist_coeffs = np.zeros((4, 1))  # Assuming no lens distortion
I am trying to get the head pose by matching points from the image to points on a 3D model with solvePnP:
# 3D model points to which the points extracted from the image are matched:
model_points = np.array([
    (0.0, 0.0, 0.0),           # Nose tip
    (0.0, -330.0, -65.0),      # Chin
    (-225.0, 170.0, -135.0),   # Left eye left corner
    (225.0, 170.0, -135.0),    # Right eye right corner
    (-150.0, -150.0, -125.0),  # Left mouth corner
    (150.0, -150.0, -125.0)    # Right mouth corner
])
image_points = np.array([
    shape[30],  # Nose tip
    shape[8],   # Chin
    shape[36],  # Left eye left corner
    shape[45],  # Right eye right corner
    shape[48],  # Left mouth corner
    shape[54]   # Right mouth corner
], dtype="double")
success, rotation_vec, translation_vec = cv2.solvePnP(
    model_points, image_points, camera_matrix, dist_coeffs)
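A quick way to sanity-check the solvePnP output (a sketch using cv2.projectPoints, assuming the image and the variables above) is to project a 3D point lying in front of the nose tip back onto the image and draw a line from the nose tip towards it; the line should point in the direction the head is facing:
# Project a point 1000 units in front of the nose tip and draw the pose line
nose_end_3d = np.array([(0.0, 0.0, 1000.0)])
nose_end_2d, _ = cv2.projectPoints(nose_end_3d, rotation_vec, translation_vec, camera_matrix, dist_coeffs)
p1 = (int(image_points[0][0]), int(image_points[0][1]))
p2 = (int(nose_end_2d[0][0][0]), int(nose_end_2d[0][0][1]))
cv2.line(image, p1, p2, (255, 0, 0), 2)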
Finally, I get the Euler angles from the rotation:
rotation_mat, _ = cv2.Rodrigues(rotation_vec)
pose_mat = cv2.hconcat((rotation_mat, translation_vec))
# decomposeProjectionMatrix returns seven outputs; the Euler angles (in degrees) are the last one
_, _, _, _, _, _, angles = cv2.decomposeProjectionMatrix(pose_mat)
The azimuth is what I expect: negative when I look to the left, zero when I look straight ahead, positive when I look to the right.
The elevation, however, is strange: when I look straight ahead its magnitude is roughly constant (around 170), but the sign is random and changes from image to image.
When I look up the sign is positive and the value decreases the further up I look; when I look down the sign is negative and the value likewise decreases the further down I look.
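For what it's worth, a value whose magnitude stays around 170 while its sign flips at random is what an angle that is really close to ±180° looks like: tiny numerical differences decide on which side of the wrap-around it lands. A small illustrative snippet (my own, not part of the pipeline above) that folds such a wrapped value back into [-90°, 90°]:
import math
def fold_angle(deg):
    # asin(sin(x)) maps any angle into [-90, 90] degrees, so +170 and -170
    # (both about 10 degrees away from the +/-180 wrap-around) come out as ~+/-10
    return math.degrees(math.asin(math.sin(math.radians(deg))))
print(fold_angle(170.0), fold_angle(-170.0))  # ~10.0, ~-10.0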
Can someone explain this output to me?
Solution
Well, it seems I have found a solution: the model points (which I found in several blog posts on the topic) appear to be wrong. The code seems to work with the following combination of model points and image points (found by trial and error; I don't know why):
model_points = np.float32([
    [6.825897, 6.760612, 4.402142],    # Left brow left corner
    [1.330353, 7.122144, 6.903745],    # Left brow right corner
    [-1.330353, 7.122144, 6.903745],   # Right brow left corner
    [-6.825897, 6.760612, 4.402142],   # Right brow right corner
    [5.311432, 5.485328, 3.987654],    # Left eye left corner
    [1.789930, 5.393625, 4.413414],    # Left eye right corner
    [-1.789930, 5.393625, 4.413414],   # Right eye left corner
    [-5.311432, 5.485328, 3.987654],   # Right eye right corner
    [2.005628, 1.409845, 6.165652],    # Nose left corner
    [-2.005628, 1.409845, 6.165652],   # Nose right corner
    [2.774015, -2.080775, 5.048531],   # Left mouth corner
    [-2.774015, -2.080775, 5.048531],  # Right mouth corner
    [0.000000, -3.116408, 6.097667],   # Mouth center bottom
    [0.000000, -7.415691, 4.070434]    # Chin
])
image_points = np.float32([
    shape[17],  # Left brow left corner
    shape[21],  # Left brow right corner
    shape[22],  # Right brow left corner
    shape[26],  # Right brow right corner
    shape[36],  # Left eye left corner
    shape[39],  # Left eye right corner
    shape[42],  # Right eye left corner
    shape[45],  # Right eye right corner
    shape[31],  # Nose left corner
    shape[35],  # Nose right corner
    shape[48],  # Left mouth corner
    shape[54],  # Right mouth corner
    shape[57],  # Mouth center bottom
    shape[8]    # Chin
])
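The rest of the pipeline stays the same as in the question; putting it together (a sketch assuming shape holds the 68 dlib facial landmarks and camera_matrix / dist_coeffs are built as above):
success, rotation_vec, translation_vec = cv2.solvePnP(model_points, image_points, camera_matrix, dist_coeffs)
rotation_mat, _ = cv2.Rodrigues(rotation_vec)
pose_mat = cv2.hconcat((rotation_mat, translation_vec))
# The Euler angles (in degrees) are the last of the seven outputs
euler_angles = cv2.decomposeProjectionMatrix(pose_mat)[-1]
pitch, yaw, roll = euler_angles.flatten()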