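Problem description
I have recently been implementing a model based on OpenPose. OpenPose uses VGG as its backbone to extract feature maps, but VGG contains max-pooling layers, which reduce the output shape to 1/4. This is the OpenPose model structure: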
VGGOpenPose(
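  (model0): OpenPose_Feature(
    (model): Sequential(
      (0): Conv2d(3,64,kernel_size=(3,3),stride=(1,1),padding=(1,1))
      (1): ReLU(inplace=True)
      (2): Conv2d(64,64,kernel_size=(3,3),stride=(1,1),padding=(1,1))
      (3): ReLU(inplace=True)
      (4): MaxPool2d(kernel_size=2,stride=2,padding=0,dilation=1,ceil_mode=False)
      (5): Conv2d(64,128,kernel_size=(3,3),stride=(1,1),padding=(1,1))
      (6): ReLU(inplace=True)
      (7): Conv2d(128,128,kernel_size=(3,3),stride=(1,1),padding=(1,1))
      (8): ReLU(inplace=True)
      (9): MaxPool2d(kernel_size=2,stride=2,padding=0,dilation=1,ceil_mode=False)
      (10): Conv2d(128,256,kernel_size=(3,3),stride=(1,1),padding=(1,1))
      (11): ReLU(inplace=True)
      (12): Conv2d(256,256,kernel_size=(3,3),stride=(1,1),padding=(1,1))
      (13): ReLU(inplace=True)
      (14): Conv2d(256,256,kernel_size=(3,3),stride=(1,1),padding=(1,1))
      (15): ReLU(inplace=True)
      (16): Conv2d(256,256,kernel_size=(3,3),stride=(1,1),padding=(1,1))
      (17): ReLU(inplace=True)
      (18): MaxPool2d(kernel_size=2,stride=2,padding=0,dilation=1,ceil_mode=False)
      (19): Conv2d(256,512,kernel_size=(3,3),stride=(1,1),padding=(1,1))
      (20): ReLU(inplace=True)
      (21): Conv2d(512,512,kernel_size=(3,3),stride=(1,1),padding=(1,1))
      (22): ReLU(inplace=True)
      (23): Conv2d(512,256,kernel_size=(3,3),stride=(1,1),padding=(1,1))
      (24): ReLU(inplace=True)
      (25): Conv2d(256,128,kernel_size=(3,3),stride=(1,1),padding=(1,1))
      (26): ReLU(inplace=True)
    )
  )
  (model1_1): Sequential(
    (0): Conv2d(128,128,kernel_size=(3,3),stride=(1,1),padding=(1,1))
    (1): ReLU(inplace=True)
    (2): Conv2d(128,128,kernel_size=(3,3),stride=(1,1),padding=(1,1))
    (3): ReLU(inplace=True)
    (4): Conv2d(128,128,kernel_size=(3,3),stride=(1,1),padding=(1,1))
    (5): ReLU(inplace=True)
    (6): Conv2d(128,512,kernel_size=(1,1),stride=(1,1))
    (7): ReLU(inplace=True)
    (8): Conv2d(512,38,kernel_size=(1,1),stride=(1,1))
  )
  (model2_1): Sequential(
    (0): Conv2d(185,128,kernel_size=(7,7),stride=(1,1),padding=(3,3))
    (1): ReLU(inplace=True)
    (2): Conv2d(128,128,kernel_size=(7,7),stride=(1,1),padding=(3,3))
    (3): ReLU(inplace=True)
    (4): Conv2d(128,128,kernel_size=(7,7),stride=(1,1),padding=(3,3))
    (5): ReLU(inplace=True)
    (6): Conv2d(128,128,kernel_size=(7,7),stride=(1,1),padding=(3,3))
    (7): ReLU(inplace=True)
    (8): Conv2d(128,128,kernel_size=(7,7),stride=(1,1),padding=(3,3))
    (9): ReLU(inplace=True)
    (10): Conv2d(128,128,kernel_size=(1,1),stride=(1,1))
    (11): ReLU(inplace=True)
    (12): Conv2d(128,38,kernel_size=(1,1),stride=(1,1))
  )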
(model3_1): Sequential(
(0): Conv2d(185,1))
)
(model4_1): Sequential(
(0): Conv2d(185,1))
)
(model5_1): Sequential(
(0): Conv2d(185,1))
)
(model6_1): Sequential(
(0): Conv2d(185,1))
)
(model1_2): Sequential(
(0): Conv2d(128,19,1))
)
(model2_2): Sequential(
(0): Conv2d(185,1))
)
(model3_2): Sequential(
(0): Conv2d(185,1))
)
(model4_2): Sequential(
(0): Conv2d(185,1))
)
(model5_2): Sequential(
(0): Conv2d(185,1))
)
(model6_2): Sequential(
(0): Conv2d(185,1))
)
)
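The spatial size that the backbone produces can be checked by hand. The sketch below is only an illustration. It assumes every convolution in model0 is 3x3 with stride 1 and padding 1, like layer (0), and every max pool is 2x2 with stride 2, like layer (4). The 368-pixel input side and the helper names conv_out and backbone_out are made up for the example. Under those assumptions the three pooling layers shrink each side by a factor of 8 (so the map has 1/64 of the input's pixels), rather than the 1/4 mentioned above. The sketch also checks that the 185 input channels of the later stages equal the 128 feature channels plus the 38 and 19 channels that model1_1 and model1_2 output.

```python
def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one axis of a conv or pooling layer (floor mode)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# (kernel, stride, padding) per layer of model0, under the assumptions above:
# a 3x3 conv with padding 1 keeps the size, a 2x2 stride-2 max pool halves it.
CONV = (3, 1, 1)
POOL = (2, 2, 0)
BACKBONE = [CONV] * 2 + [POOL] + [CONV] * 2 + [POOL] + [CONV] * 4 + [POOL] + [CONV] * 4

# Channels fed to stages 2..6: backbone features + PAF branch + heatmap branch.
STAGE_IN_CHANNELS = 128 + 38 + 19


def backbone_out(size):
    """Side length of the model0 feature map for an input of side `size`."""
    for kernel, stride, padding in BACKBONE:
        size = conv_out(size, kernel, stride, padding)
    return size


if __name__ == "__main__":
    side = 368  # assumed input side length, not stated in the question
    print(side, "->", backbone_out(side))
    print("stage input channels:", STAGE_IN_CHANNELS)
```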
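The original paper, OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields, says that the ground-truth heatmaps and PAFs have the same width and height as the input image.
I looked at several Python implementations of OpenPose. Most of them use an element-wise loss function to compute the loss between the output and the ground-truth labels, the same form as the loss given in the paper.
What I want to know is: if the size of the OpenPose output differs from the size of the input image, how does OpenPose compute the loss between the output and the ground-truth heatmaps/PAFs?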
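To make "element-wise" concrete: the loss in the paper is a sum over channels and positions of squared differences, each weighted by a binary mask W(p) that is 0 where annotations are missing. The sketch below is a plain-Python illustration of that form, not code from any of the implementations mentioned. The function name masked_l2_loss and the flat-list layout are assumptions made for the example. It also shows why the sizes matter: the sum is only defined when prediction, target and mask share one spatial size.

```python
def masked_l2_loss(pred, target, mask):
    """Sum over channels c and positions p of mask[p] * (pred[c][p] - target[c][p])**2.

    pred, target: list of channels, each a flat list of H*W values.
    mask: flat list of H*W weights (0 where the annotation is missing, else 1).
    """
    if len(pred) != len(target):
        raise ValueError("prediction and target must have the same number of channels")
    total = 0.0
    for pred_c, target_c in zip(pred, target):
        if not (len(pred_c) == len(target_c) == len(mask)):
            raise ValueError("prediction, target and mask must share one spatial size")
        total += sum(w * (p - t) ** 2 for p, t, w in zip(pred_c, target_c, mask))
    return total


if __name__ == "__main__":
    pred = [[0.5, 0.0, 1.0, 0.2]]    # one channel, 2x2 map flattened
    target = [[1.0, 0.0, 1.0, 0.0]]
    mask = [1, 1, 1, 0]              # last position is unlabelled and ignored
    print(masked_l2_loss(pred, target, mask))
```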
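One possible way to reconcile the two sizes is to build the ground truth directly at the resolution of the network output, so that an element-wise loss applies without resizing anything. The sketch below illustrates that idea for a single keypoint heatmap and is not taken from any particular implementation. The stride of 8 follows from the three stride-2 pooling layers in the listing, while sigma=7.0, the 368x368 input and the name gaussian_heatmap are assumptions. Each coarse grid cell is evaluated at the input-image coordinate of its centre, using a Gaussian of the form exp(-d^2 / sigma^2).

```python
import math


def gaussian_heatmap(keypoint_xy, image_size, stride=8, sigma=7.0):
    """Heatmap for one keypoint on a grid `stride` times coarser than the image.

    keypoint_xy: (x, y) in input-image pixels.
    image_size: (width, height) of the input image.
    Returns a list of rows, height // stride by width // stride.
    """
    width, height = image_size
    grid_w, grid_h = width // stride, height // stride
    kx, ky = keypoint_xy
    offset = stride / 2.0 - 0.5  # image coordinate of a cell centre within its block
    heatmap = []
    for gy in range(grid_h):
        y = gy * stride + offset
        row = []
        for gx in range(grid_w):
            x = gx * stride + offset
            dist_sq = (x - kx) ** 2 + (y - ky) ** 2
            row.append(math.exp(-dist_sq / sigma ** 2))
        heatmap.append(row)
    return heatmap


if __name__ == "__main__":
    hm = gaussian_heatmap((100, 200), (368, 368))
    print(len(hm), len(hm[0]))  # grid size matches the stride-8 network output
```

The other direction, resizing the network output up to the input size before comparing, would also make the shapes agree. Which of the two a given implementation uses has to be checked in its data loader and loss code.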