Problem description
I'm trying to implement softmax regression on the MNIST digits dataset. I'm using batch gradient descent, so the cost should decrease steadily. This is what I get:
cost after epoch 1 : [2.63035001]
/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:3: RuntimeWarning: overflow encountered in exp
This is separate from the ipykernel package so we can avoid doing imports until
cost after epoch 2 : [29.10684701]
cost after epoch 3 : [12.43702583]
cost after epoch 4 : [2.302654]
cost after epoch 5 : [2.30265079]
cost after epoch 6 : [2.30264759]
cost after epoch 7 : [2.3026444]
cost after epoch 8 : [2.30264121]
cost after epoch 9 : [2.30263803]
cost after epoch 10 : [2.30263485]
If I just change the learning rate to 0.01, I get this instead:
cost after epoch 1 : [2.63039004]
/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:3: RuntimeWarning: overflow encountered in exp
This is separate from the ipykernel package so we can avoid doing imports until
/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:11: RuntimeWarning: divide by zero encountered in log
# This is added back by InteractiveShellApp.init_path()
/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:11: RuntimeWarning: invalid value encountered in multiply
# This is added back by InteractiveShellApp.init_path()
cost after epoch 2 : [nan]
cost after epoch 3 : [115.76138438]
cost after epoch 4 : [2.30384942]
cost after epoch 5 : [2.30379418]
cost after epoch 6 : [2.30374]
cost after epoch 7 : [2.30368689]
cost after epoch 8 : [2.30363481]
cost after epoch 9 : [2.30358374]
cost after epoch 10 : [2.30353368]
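For reference, the value the cost settles at in both runs looks like ln(10) ≈ 2.3026, which is exactly the cross-entropy you get when the model assigns roughly uniform probability 1/10 to each of the 10 digit classes. My reading (an assumption based only on these logs) is that after the overflow the network collapses to near-uniform predictions and barely moves from there. A quick check of that number:

import numpy as np

# Cross-entropy of a uniform prediction over 10 classes: -log(1/10) = log(10)
print(np.log(10))        # 2.302585092994046
print(-np.log(1 / 10))   # same value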
I suspect this is due to exploding gradients. I tried np.clip() in my activation functions, but it didn't help.
import numpy as np

def sigmoid(matrix):
    # Clip the pre-activations so np.exp cannot overflow for large magnitudes
    s = np.clip(matrix, -500, 500)
    s = 1 / (1 + np.exp(-s))
    return s
def relu(matrix):
    # Upper-bound the values; the lower bound is handled by the ReLU itself
    matrix = np.clip(matrix, None, 500)
    matrix = matrix * (matrix > 0)
    return matrix
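One thing worth noting (my own check, not from the original post): clipping inside the activations only bounds the values that pass through them, while float64 np.exp overflows for arguments above roughly 709, so any unbounded values that reach an exp elsewhere, e.g. the raw logits fed into softmax, will still trigger the same RuntimeWarning. A minimal illustration with made-up values:

import numpy as np

x = np.array([500.0, 710.0, 800.0])
print(np.exp(x))                      # [1.4e+217, inf, inf] plus "overflow encountered in exp"
print(np.exp(np.clip(x, -500, 500)))  # all finite once the inputs are bounded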
I initialize the parameters with layer_dims = [784,512,256,128,64,10], where 784 corresponds to the 28x28 pixel images.
def he_init(layer_dims):
    parameters = {}
    L = len(layer_dims)
    for i in range(1, L):
        # He initialization: weights scaled by sqrt(2 / fan_in), biases set to zero
        parameters['w' + str(i)] = np.random.randn(layer_dims[i], layer_dims[i - 1]) * np.sqrt(2 / layer_dims[i - 1])
        parameters['b' + str(i)] = zero_init(layer_dims[i])
    return parameters
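zero_init is not shown in the question; for the snippet above to be self-contained I am assuming it simply returns a column vector of zeros for the biases (the actual helper is in the linked notebook):

def zero_init(n):
    # Assumed helper: (n, 1) column of zeros for the biases of an n-unit layer
    return np.zeros((n, 1))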
The only places where I use np.exp() and np.log() are the sigmoid, the softmax function, and the cost.
def softmax(z):
    # Column-wise softmax over the class dimension (axis 0)
    softmax_matrix = np.exp(z) / np.sum(np.exp(z), axis=0, keepdims=True)
    return softmax_matrix
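The "overflow encountered in exp" warning most likely comes from this np.exp on the raw logits. A standard remedy, shown here only as a sketch rather than as the poster's code, is the max-subtraction trick: subtracting the per-column maximum cancels out in the ratio, so the output is unchanged, but every exponent argument becomes <= 0 and np.exp can no longer overflow. The name softmax_stable is mine:

def softmax_stable(z):
    # Shift by the column-wise max: softmax(z) == softmax(z - c) for any per-column constant c
    shifted = z - np.max(z, axis=0, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=0, keepdims=True)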
def softmax_cost(aL, y):  # aL: activations of the output layer, y: one-hot labels
    # Cross-entropy per example, averaged over the y.shape[1] examples in the batch
    loss = np.sum(-y * np.log(aL), axis=0, keepdims=True)
    cost = (1 / y.shape[1]) * np.sum(loss, axis=1)
    return cost
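The "divide by zero encountered in log" and "invalid value encountered in multiply" warnings are consistent with some entries of aL underflowing to exactly 0, so np.log(aL) produces -inf and -y * (-inf) produces nan wherever y is 0. A common workaround, again only a sketch with a hypothetical eps parameter, is to clamp the probabilities before taking the log (or to compute log-softmax directly from the logits):

def softmax_cost_safe(aL, y, eps=1e-12):
    # Keep probabilities strictly inside (0, 1) so log(0) and 0 * inf cannot occur
    aL = np.clip(aL, eps, 1.0 - eps)
    loss = np.sum(-y * np.log(aL), axis=0, keepdims=True)
    return (1 / y.shape[1]) * np.sum(loss, axis=1)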
I still think I have an exploding gradient problem, but I don't know what to do about it. I have a fully documented version of the code in a Kaggle notebook here, including the input dimensions and variable definitions. Any suggestions for fixing my model would be much appreciated. Thanks!