I can't figure out the output of the max pool layer

Problem description

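I am trying to understand the example code for a three-layer CNN from CS231n, but there is one variable whose meaning I cannot work out. In the code below, it is the variable D.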

import numpy as np


class ThreeLayerConvNet(object):
    """
    A three-layer convolutional network with the following architecture:
    conv - relu - 2x2 max pool - affine - relu - affine - softmax
    The network operates on minibatches of data that have shape (N,C,H,W)
    consisting of N images, each with height H and width W and with C input
    channels.
    """

    def __init__(
        self,
        input_dim=(3, 32, 32),
        num_filters=32,
        filter_size=7,
        hidden_dim=100,
        num_classes=10,
        weight_scale=1e-3,
        reg=0.0,
        dtype=np.float32,
    ):
        """
        Initialize a new network.
        Inputs:
        - input_dim: Tuple (C, H, W) giving size of input data
        - num_filters: Number of filters to use in the convolutional layer
        - filter_size: Width/height of filters to use in the convolutional layer
        - hidden_dim: Number of units to use in the fully-connected hidden layer
        - num_classes: Number of scores to produce from the final affine layer.
        - weight_scale: Scalar giving standard deviation for random initialization
          of weights.
        - reg: Scalar giving L2 regularization strength
        - dtype: numpy datatype to use for computation.
        """
        self.params = {}
        self.reg = reg
        self.dtype = dtype
            
        C, H, W = input_dim
        filter_height = filter_size  # For convolution
        filter_width = filter_size  # For convolution
        D = num_filters * (H // 2) * (W // 2)  # This line
        # Conv filters have shape (num_filters, C, filter_height, filter_width)
        self.params['W1'] = np.random.normal(scale=weight_scale, size=(num_filters, C, filter_height, filter_width))
        self.params['b1'] = np.zeros((num_filters,))
        self.params['W2'] = np.random.normal(scale=weight_scale, size=(D, hidden_dim))
        self.params['b2'] = np.zeros((hidden_dim,))
        self.params['W3'] = np.random.normal(scale=weight_scale, size=(hidden_dim, num_classes))
        self.params['b3'] = np.zeros((num_classes,))

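Because it is used as the first dimension of the second layer's weights, I assume it is the number of output features of the max pool layer. However, I think the number of output features should be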

output_nb = num_filters * filter_width * filter_height // 4

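Since we apply a convolution before the max pool, the number of outputs should already be smaller than the original number of pixels, H*W. The max pool layer then picks the maximum out of every 4 features, so the count should be divided by 4.

What does D represent here?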
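
One way to check what D counts is to follow the spatial size through each layer. The sketch below is not part of the CS231n code; the helper names are hypothetical. It assumes the convolution uses stride 1 with zero padding of (filter_size - 1) // 2, and that the pool is 2x2 with stride 2. Under those assumptions the convolution leaves H and W unchanged and the pool halves each of them, so the flattened pool output would have num_filters * (H // 2) * (W // 2) values per image.

```python
def conv_output_size(size, filter_size, pad, stride):
    """Spatial size along one dimension after a convolution."""
    return (size + 2 * pad - filter_size) // stride + 1


def pool_output_size(size, pool_size, stride):
    """Spatial size along one dimension after max pooling."""
    return (size - pool_size) // stride + 1


def flattened_pool_features(input_dim, num_filters, filter_size):
    """Values per image entering the first affine layer (hypothetical helper)."""
    C, H, W = input_dim
    # Assumption: stride-1 convolution padded so that H and W are preserved.
    pad = (filter_size - 1) // 2
    conv_h = conv_output_size(H, filter_size, pad, stride=1)
    conv_w = conv_output_size(W, filter_size, pad, stride=1)
    # Assumption: 2x2 max pool with stride 2, which halves each dimension.
    pool_h = pool_output_size(conv_h, pool_size=2, stride=2)
    pool_w = pool_output_size(conv_w, pool_size=2, stride=2)
    # One pooled map of size pool_h x pool_w per filter.
    return num_filters * pool_h * pool_w


D = flattened_pool_features((3, 32, 32), num_filters=32, filter_size=7)
print(D)  # 8192, the same as 32 * (32 // 2) * (32 // 2)
```

Note that the size of a convolution's output depends on the input size, padding, and stride. The filter's width and height affect the result only through the output-size formula, not as direct factors in the count.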

cnn deep-learning max-pooling python