Why doesn't the difference in the number of zeros before and after dropout equal the dropout rate?

Problem description

I am building a network that uses dropout for regularization, but I can't understand how the dropout is being handled here. Specifically, why isn't the difference between the proportion of zeros before and after applying dropout exactly equal to the dropout rate?

import tensorflow as tf
from tensorflow.keras.layers import Dense

class DropoutDenseNetwork(tf.Module):
    def __init__(self, name=None):
        super(DropoutDenseNetwork, self).__init__(name=name)
        self.dense_layer1 = Dense(32)
        self.dropout = tf.keras.layers.Dropout(0.2)
        self.dense_layer2 = Dense(10, activation=tf.identity)

    @tf.function
    def __call__(self, x, is_training):
        embed = self.dense_layer1(x)
        # Fraction of activations that are exactly zero before dropout
        propn_zero_before = tf.reduce_mean(tf.cast(tf.equal(embed, 0.), tf.float32))
        embed = self.dropout(embed, training=is_training)
        # Fraction of activations that are exactly zero after dropout
        propn_zero_after = tf.reduce_mean(tf.cast(tf.equal(embed, 0.), tf.float32))
        tf.print('Zeros before and after:', propn_zero_before, "and", propn_zero_after)
        output = self.dense_layer2(embed)
        return output

if 'drop_dense_net' not in locals():
    drop_dense_net = DropoutDenseNetwork()
drop_dense_net(tf.ones([1, 100]), tf.constant(True))

Solution

Because rate is only the probability that any individual neuron is dropped during training. The observed fraction of dropped units will not always land exactly on 0.2, especially with only 32 units. If you increase the number of units (e.g. to 100,000), you will see it get much closer to the rate of 0.2.

import tensorflow as tf
from tensorflow.keras.layers import Dense

class DropoutDenseNetwork(tf.Module):
    def __init__(self, name=None):
        super(DropoutDenseNetwork, self).__init__(name=name)
        self.dense_layer1 = Dense(100000)
        self.dropout = tf.keras.layers.Dropout(0.2)
        self.dense_layer2 = Dense(1)

    @tf.function
    def __call__(self, x, is_training):
        embed = self.dense_layer1(x)
        propn_zero_before = tf.reduce_mean(tf.cast(tf.equal(embed, 0.), tf.float32))
        embed = self.dropout(embed, training=is_training)
        propn_zero_after = tf.reduce_mean(tf.cast(tf.equal(embed, 0.), tf.float32))
        tf.print('Zeros before and after:', propn_zero_before, "and", propn_zero_after)

drop_dense_net = DropoutDenseNetwork()
drop_dense_net(tf.ones([1, 10]), tf.constant(True))
Zeros before and after: 0 and 0.19954
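The remaining gap is just binomial sampling noise: the number of dropped units is a binomial draw, so the standard deviation of the observed zero fraction is sqrt(rate * (1 - rate) / n). A minimal sketch of that arithmetic, using the unit counts of the two networks above:

```python
import math

rate = 0.2
for n in (32, 100_000):
    # Standard deviation of the fraction of dropped units across
    # n independent Bernoulli(rate) draws.
    std = math.sqrt(rate * (1 - rate) / n)
    print(f"n={n}: expected fraction {rate} +/- {std:.5f}")
```

With 32 units the fraction fluctuates by about 0.07 per call, while with 100,000 units it fluctuates by about 0.001, which is consistent with the 0.19954 observed above.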

Under the hood, tf.keras.layers.Dropout uses tf.nn.dropout. The documentation for the latter states:

rate: the probability that each element is dropped

In the source code, you can see that it builds a mask from random values with the same shape as the input and keeps the positions whose random value is at or above rate. Naturally, a random draw will not put exactly a fraction 0.2 of the values below rate.

random_tensor = random_ops.random_uniform(
    noise_shape, seed=seed, dtype=x_dtype)

keep_mask = random_tensor >= rate
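The same masking logic can be reproduced by hand. This is a sketch using the public tf.random.uniform API rather than TF's internal random_ops; note that tf.nn.dropout also scales the surviving values by 1 / (1 - rate) so that the expected activation magnitude is unchanged between training and inference:

```python
import tensorflow as tf

rate = 0.2
x = tf.ones([1, 100_000])

# Uniform noise with the input's shape; keep positions at or above rate.
random_tensor = tf.random.uniform(tf.shape(x), dtype=x.dtype)
keep_mask = random_tensor >= rate

# Kept activations are scaled by 1/(1 - rate), mirroring tf.nn.dropout.
dropped = tf.where(keep_mask, x / (1 - rate), tf.zeros_like(x))

zero_frac = tf.reduce_mean(tf.cast(tf.equal(dropped, 0.), tf.float32))
print(float(zero_frac))  # close to 0.2, but not exactly
```

Running this repeatedly gives a zero fraction that hovers around 0.2 without ever being pinned to it, which is exactly the behavior the question observed.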