Problem description
I am training a model with mini-batch gradient descent, aiming to converge to the RMSE of the direct (closed-form) solution, which is roughly 0.00016. The RMSE on the validation set (rmse_valid_array in the function) looks fine for the first epoch, but after a few epochs it starts to explode. I have been struggling with this for days; the algorithm looks correct to me, so where could the problem be?
P.S. X_train has shape (11000, 41) and y_train has shape (11000, 1). The batch size here is 1 and the learning rate is 0.001. I initialize the weights to be very small (divided by 1000). I have checked that X_mini and y_mini look normal, and the gradient starts to explode after a few epochs.
When I change the gradient scaling from 1/len(y) (the size of each batch) to 1/m (the size of the whole training set), the RMSE per epoch does become smaller, following the trend Andrew Ng describes in his mini-batch lecture.
[0.003352938483114684,0.014898628026733278,0.015708125817549583,0.15904084037991562,0.9772361042313762,17.776216375980052,187.04333942512542,978.648663972064,17383.631549616875,103997.59758713894,2222088.2561604036,23334640.70860544,118182306.23839562,2606049599.35717,18920677325.736164,261342486636.4693,1738434547629.957,10577420781634.316,164217272049684.75,1131726496072944.8,1.6219370161174172e+16,2.4623815536311107e+17,...
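For context on the two scalings: with a batch size of 1, 1/len(y) equals 1, so every update applies the raw per-sample gradient, while 1/m shrinks each update by the training-set size. A minimal sketch of the difference (synthetic data matching the question's shapes, not the real dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 11000, 41                     # shapes from the question
X = rng.normal(size=(m, n))
y = X @ rng.normal(size=(n, 1))      # synthetic targets

w = rng.random((1, n)) / 1000
lr = 0.001

x_i, y_i = X[0:1], y[0:1]            # one sample: batch size 1
g = x_i.T @ (x_i @ w.T - y_i)        # unscaled gradient, shape (n, 1)

step_per_batch = lr * (1 / 1) * np.linalg.norm(g)   # 1/len(y) with batch size 1
step_per_set   = lr * (1 / m) * np.linalg.norm(g)   # 1/m scaling

# With 1/m every step is m times smaller, which is why the RMSE stops
# exploding: it is equivalent to dividing the learning rate by 11000.
print(step_per_batch / step_per_set)
```

So switching to 1/m does not fix the algorithm so much as reduce the effective learning rate by a factor of 11000.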
Here is the main function that performs the mini-batch gradient descent:
def mini_batch_GD(X_train, X_valid, y_train, y_valid, batch_size, lr, CT):
    m = len(y_train)
    n = X_train.shape[1]
    # initialize weights to small random values
    w = (np.random.random(n)).reshape(1, -1) / 1000
    rmse_train_array = []
    rmse_valid_array = []
    time_epoch = []
    for epoch in range(100):
        start_time = time.time()
        # shuffle and split into batches
        mini_batches = create_minibatches(X_train, y_train, batch_size)
        for mini_batch in mini_batches:
            X_mini, y_mini = mini_batch
            y_pred = np.dot(X_mini, w.T).reshape(-1, 1)
            gradient = (1 / len(y_pred) * np.dot(X_mini.T, y_pred - y_mini)).reshape(1, -1)
            w = w - lr * gradient
        # training rmse
        y_pred_train = np.dot(X_train, w.T)
        rmse_train_array.append(rmse(y_pred_train, y_train))
        # validation rmse
        y_pred_valid = np.dot(X_valid, w.T)
        rmse_valid_array.append(rmse(y_pred_valid, y_valid))
        # time for each epoch
        time_epoch.append(time.time() - start_time)
        # check for convergence
        if rmse(y_pred_valid, y_valid) <= CT:
            break
    return w, rmse_train_array, rmse_valid_array, time_epoch
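One way to narrow down where things blow up (a standalone debugging sketch with synthetic data, not the original code) is to record the gradient norm at every update; if the norms trend upward instead of shrinking, the effective step lr * ||x||^2 is too large for per-sample updates, which is often a feature-scaling problem:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 41
X = rng.normal(size=(100, n)) * 10    # poorly scaled features: ||x||^2 is large
y = X @ rng.normal(size=(n, 1))

w = rng.random((1, n)) / 1000
lr = 0.001
norms = []
for i in range(100):                  # batch size 1, as in the question
    x_i, y_i = X[i:i + 1], y[i:i + 1]
    g = (x_i.T @ (x_i @ w.T - y_i)).reshape(1, -1)
    norms.append(float(np.linalg.norm(g)))
    w = w - lr * g

# Growing norms mean each step overshoots: the update multiplies the error
# along x_i by (1 - lr * ||x_i||^2), which has magnitude > 1 here.
print(norms[0], norms[-1])
```

Plotting norms against the update index pinpoints the exact batch where divergence begins, which is easier to interpret than the per-epoch RMSE.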
The helper functions that create the mini-batches and compute the RMSE are below:
def create_minibatches(X, y, batch_size):
    # stack features and targets so they are shuffled together
    data = np.hstack((X, y))
    np.random.shuffle(data)
    n_samples = data.shape[0]
    mini_batches = []
    for i in range(n_samples // batch_size):
        mini_batch = data[i * batch_size:(i + 1) * batch_size, :]
        X_mini = mini_batch[:, :-1]
        y_mini = mini_batch[:, -1].reshape((-1, 1))
        mini_batches.append((X_mini, y_mini))
    # handle the leftover rows if the data does not divide evenly
    if n_samples % batch_size != 0:
        mini_batch = data[(n_samples // batch_size) * batch_size:]
        X_mini = mini_batch[:, :-1]
        y_mini = mini_batch[:, -1].reshape((-1, 1))
        mini_batches.append((X_mini, y_mini))
    return mini_batches

def rmse(yPred, y):
    return np.sqrt(mean_squared_error(yPred, y))
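A quick sanity check of the batching logic (self-contained so it runs on its own; the helper is repeated here with a working remainder branch, and the toy shapes are hypothetical):

```python
import numpy as np

def create_minibatches(X, y, batch_size):
    # shuffle features and targets together, then slice into batches
    data = np.hstack((X, y))
    np.random.shuffle(data)
    n_samples = data.shape[0]
    mini_batches = []
    for i in range(n_samples // batch_size):
        chunk = data[i * batch_size:(i + 1) * batch_size]
        mini_batches.append((chunk[:, :-1], chunk[:, -1].reshape(-1, 1)))
    if n_samples % batch_size != 0:
        chunk = data[(n_samples // batch_size) * batch_size:]
        mini_batches.append((chunk[:, :-1], chunk[:, -1].reshape(-1, 1)))
    return mini_batches

X = np.arange(22 * 3, dtype=float).reshape(22, 3)
y = np.arange(22, dtype=float).reshape(22, 1)
batches = create_minibatches(X, y, batch_size=5)

assert len(batches) == 5                           # 4 full batches + remainder of 2
assert sum(Xb.shape[0] for Xb, yb in batches) == 22   # every row appears exactly once
assert batches[0][0].shape == (5, 3) and batches[0][1].shape == (5, 1)
```

If the total row count across batches does not equal the number of training rows, or a batch has the wrong width, the shuffling/slicing is corrupting the data before the gradient step ever runs.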