Problem description
I have a model trained with several LayerNormalization layers, and I am not sure whether a simple weight transfer works properly when dropout is activated for prediction. This is the code I am using:
from tensorflow.keras.models import load_model,Model
from tensorflow.keras.layers import Dense,Dropout,LayerNormalization,Input
model0 = load_model(path + 'model0.h5')
OW = model0.get_weights()
inp = Input(shape=(10,))
D1 = Dense(760,activation='softplus')(inp)
DO1 = Dropout(0.29)(D1,training=True)
N1 = LayerNormalization()(DO1)
D2 = Dense(460,activation='softsign')(N1)
DO2 = Dropout(0.16)(D2,training=True)
N2 = LayerNormalization()(DO2)
D3 = Dense(664,activation='softsign')(N2)
DO3 = Dropout(0.09)(D3,training=True)
N3 = LayerNormalization()(DO3)
out = Dense(1,activation='linear')(N3)
mP = Model(inp,out)
mP.set_weights(OW)
mP.compile(loss='mse',optimizer='Adam')
mP.save(path + 'new_model.h5')
If I set training=False on the dropout layers, the model makes the same predictions as the original model. However, with the code written as above, the mean prediction is not close to the original/deterministic prediction.
Models I previously developed by training with dropout ensembles produced mean probabilistic predictions almost identical to the deterministic model's. Am I doing something wrong, or is this a problem with using LayerNormalization together with active dropout? As far as I know, LayerNormalization has trainable parameters, so I don't know whether active dropout interferes with them; if it does, I don't know how to remedy it.
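For reference, the only trainable parameters in LayerNormalization are a per-feature scale (gamma) and offset (beta), which set_weights transfers like any other layer weights. A minimal NumPy sketch of the forward pass, assuming the Keras defaults (normalization over the last axis, epsilon=1e-3):

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-3):
    # Normalize each sample over its last axis, then apply the trainable
    # per-feature scale (gamma) and offset (beta) -- the only weights
    # that set_weights() needs to transfer for this layer.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.array([[1.0, 2.0, 3.0]])
out = layer_norm(x, gamma=np.ones(3), beta=np.zeros(3))
print(out)  # zero-mean, roughly unit-variance row
```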
import numpy as np
import matplotlib.pyplot as plt
inputs = np.zeros(shape=(1,10),dtype='float32')
inputsP = np.zeros(shape=(1000,10),dtype='float32')
opD = mD.predict(inputs)[0,0]  # mD is the original deterministic model
opP = mP.predict(inputsP).reshape(1000)
print('Deterministic: %.4f Probabilistic: %.4f' % (opD,np.mean(opP)))
plt.scatter(0,opD,color='black',label='Det',zorder=3)
plt.scatter(0,np.mean(opP),color='red',label='Mean prob',zorder=2)
plt.errorbar(0,np.mean(opP),yerr=np.std(opP),zorder=2,markersize=0,capsize=20,label=r'$\sigma$ bounds')
plt.grid(axis='y',zorder=0)
plt.legend()
plt.tick_params(axis='x',labelsize=0,labelcolor='white',color='white',width=0,length=0)
The resulting output and plot are shown below.
Deterministic: -0.9732 Probabilistic: -0.9011
Solution
An edit to my answer:
I think the problem is simply under-sampling of the model. The standard deviation of the predictions is directly tied to the dropout rate, so the number of predictions needed to approximate the deterministic model grows with it. If you run the code below as an absurd test, setting every dropout layer's rate to 0.7, then 100,000 samples are no longer enough to approximate the deterministic mean to within 10^-3, and the standard deviation of the predictions becomes much larger.
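To see why the mean converges so slowly at high rates, here is a toy NumPy sketch (a stand-in averaging layer, not the Keras model above): the spread of Monte-Carlo dropout outputs grows roughly as sqrt(rate/(1-rate)), so the sample count needed for a given accuracy grows with the rate.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_dropout_std(rate, n_samples=1000, n_units=760):
    """Std of Monte-Carlo dropout outputs for a toy averaging layer."""
    keep = 1.0 - rate
    # Inverted dropout: units are zeroed with probability `rate` and the
    # survivors scaled by 1/keep, so the mask has mean 1 and the MC
    # expectation matches the deterministic output (here exactly 1.0).
    masks = rng.binomial(1, keep, size=(n_samples, n_units)) / keep
    outputs = masks.mean(axis=1)  # uniform-weight readout of the units
    return outputs.std()

for rate in (0.09, 0.16, 0.29, 0.7):
    theory = (rate / (1.0 - rate) / 760) ** 0.5  # per-sample std
    print('rate %.2f  empirical std %.4f  theory %.4f'
          % (rate, mc_dropout_std(rate), theory))
```

The 0.7-rate spread is several times the 0.09-rate spread, which is the same effect the full model shows.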
import os
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense,Dropout,Input
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
GPUs = tf.config.experimental.list_physical_devices('GPU')
for gpu in GPUs:
    tf.config.experimental.set_memory_growth(gpu,True)
inp = Input(shape=(10,))
D1 = Dense(760,activation='softplus')(inp)
D2 = Dense(460,activation='softsign')(D1)
D3 = Dense(664,activation='softsign')(D2)
out = Dense(1,activation='linear')(D3)
mP = Model(inp,out)
mP.compile(loss='mse',optimizer='Adam')
inp = Input(shape=(10,))
D1 = Dense(760,activation='softplus')(inp)
DO1 = Dropout(0.29)(D1,training=True)
D2 = Dense(460,activation='softsign')(DO1)
DO2 = Dropout(0.16)(D2,training=True)
D3 = Dense(664,activation='softsign')(DO2)
DO3 = Dropout(0.09)(D3,training=True)
out = Dense(1,activation='linear')(DO3)
mP2 = Model(inp,out)
mP2.set_weights(mP.get_weights())
mP2.compile(loss='mse',optimizer='Adam')
data = np.zeros(shape=(100000,10),dtype='float32')
res = mP.predict(data).reshape(data.shape[0])
res2 = mP2.predict(data).reshape(data.shape[0])
print(np.abs(res[0] - res2.mean()))
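The under-sampling explanation can be made quantitative: the standard error of a Monte-Carlo mean shrinks as the per-sample standard deviation divided by sqrt(N), so the sample count needed for a given tolerance grows as the square of that ratio. A small helper (the std values below are made-up illustrations, not measurements from this model):

```python
def samples_needed(pred_std, tol):
    # The standard error of a Monte-Carlo mean over N samples is
    # pred_std / sqrt(N); solving pred_std / sqrt(N) = tol for N gives:
    return (pred_std / tol) ** 2

print(samples_needed(0.01, 1e-3))  # ~100 samples
print(samples_needed(0.06, 1e-3))  # ~3600 samples
print(samples_needed(0.40, 1e-3))  # ~160000 samples -- beyond 100,000
```

This matches the observation that pushing every dropout rate to 0.7 (and hence inflating the per-sample std) makes 100,000 samples insufficient for 10^-3 accuracy.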