Problem description
I am specifying a network regularized with dropout, but I can't understand how the dropout is being handled here. Specifically, why isn't the proportion of zeros after applying dropout, minus the proportion before, exactly equal to the dropout rate?
import tensorflow as tf
from tensorflow.keras.layers import Dense

class DropoutDenseNetwork(tf.Module):
    def __init__(self, name=None):
        super(DropoutDenseNetwork, self).__init__(name=name)
        self.dense_layer1 = Dense(32)
        self.dropout = tf.keras.layers.Dropout(0.2)
        self.dense_layer2 = Dense(10, activation=tf.identity)

    @tf.function
    def __call__(self, x, is_training):
        embed = self.dense_layer1(x)
        # Fraction of activations that are exactly zero before dropout.
        propn_zero_before = tf.reduce_mean(tf.cast(tf.equal(embed, 0.), tf.float32))
        embed = self.dropout(embed, is_training)
        # Fraction of activations that are exactly zero after dropout.
        propn_zero_after = tf.reduce_mean(tf.cast(tf.equal(embed, 0.), tf.float32))
        tf.print('Zeros before and after:', propn_zero_before, "and", propn_zero_after)
        output = self.dense_layer2(embed)
        return output

if 'drop_dense_net' not in locals():
    drop_dense_net = DropoutDenseNetwork()
    drop_dense_net(tf.ones([1, 100]), tf.constant(True))
Solution
That is because rate is only the probability that any individual neuron is dropped during training. The observed fraction will not always land exactly on 0.2, especially with only 32 units. If you increase the number of units (to 100,000, say), you will see the fraction come much closer to the rate of 0.2:
import tensorflow as tf
from tensorflow.keras.layers import *

class DropoutDenseNetwork(tf.Module):
    def __init__(self, name=None):
        super(DropoutDenseNetwork, self).__init__(name=name)
        self.dense_layer1 = Dense(100000)
        self.dropout = tf.keras.layers.Dropout(0.2)
        self.dense_layer2 = Dense(1)

    @tf.function
    def __call__(self, x, is_training):
        embed = self.dense_layer1(x)
        propn_zero_before = tf.reduce_mean(tf.cast(tf.equal(embed, 0.), tf.float32))
        embed = self.dropout(embed, is_training)
        propn_zero_after = tf.reduce_mean(tf.cast(tf.equal(embed, 0.), tf.float32))
        tf.print('Zeros before and after:', propn_zero_before, "and", propn_zero_after)

drop_dense_net = DropoutDenseNetwork()
drop_dense_net(tf.ones([1, 10]), tf.constant(True))
Zeros before and after: 0 and 0.19954
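How close you can expect to get follows from the binomial distribution: each of the n units is dropped independently with probability 0.2, so the observed zero fraction has standard deviation sqrt(0.2 * 0.8 / n). A quick back-of-the-envelope check (plain Python, not part of the original answer):

import math

rate = 0.2
for n in [32, 100000]:
    # One-standard-deviation spread of the observed zero fraction
    # when n units are each dropped independently with probability rate.
    std = math.sqrt(rate * (1 - rate) / n)
    print(f"n={n}: expected fraction {rate} +/- {std:.5f}")

# n=32:     0.2 +/- 0.07071  -> large relative wobble with so few units
# n=100000: 0.2 +/- 0.00126  -> the 0.19954 above is well within one std dev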
Under the hood, tf.keras.layers.Dropout uses tf.nn.dropout. The latter's documentation states:

rate: the probability that each element is dropped
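You can observe this behaviour directly with tf.nn.dropout (a minimal sketch; note that surviving elements are rescaled by 1 / (1 - rate), which is part of its documented behaviour, so the expected sum of activations is preserved):

import tensorflow as tf

tf.random.set_seed(0)
x = tf.ones([100000])
dropped = tf.nn.dropout(x, rate=0.2)

# Fraction of elements zeroed out: close to, but rarely exactly, 0.2.
tf.print(tf.reduce_mean(tf.cast(tf.equal(dropped, 0.), tf.float32)))
# Kept elements are rescaled from 1.0 to 1 / (1 - 0.2) = 1.25.
tf.print(tf.reduce_max(dropped))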
In the source code, you can see that it builds a mask from random values of the same shape as the input and keeps the elements whose random value is at or above rate. Of course, the fraction of values landing at or above rate will not be exactly 1 - rate = 0.8:
random_tensor = random_ops.random_uniform(
    noise_shape, seed=seed, dtype=x_dtype)
keep_mask = random_tensor >= rate
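The same masking logic is easy to reproduce with the public API (a sketch using tf.random.uniform in place of the internal random_ops helper, with an assumed input shape of 100,000 elements):

import tensorflow as tf

rate = 0.2
# Uniform [0, 1) values in the same shape as the hypothetical input.
random_tensor = tf.random.uniform([100000], dtype=tf.float32)
keep_mask = random_tensor >= rate

# The kept fraction is a random draw around 1 - rate = 0.8, not exactly 0.8.
tf.print(tf.reduce_mean(tf.cast(keep_mask, tf.float32)))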