Problem description
I am trying to implement the gradient descent algorithm on this simple data, but I have run into a problem. It would be great if someone could point me in the right direction. The answer for x = 6 should be 7, but I am not getting there.
X = [1,2,3,4]
Y = [2,3,4,5]
m_gradient = 0
b_gradient = 0
m,b = 0,0
learning_rate = 0.1
N = len(Y)
for p in range(100):
    for idx in range(len(Y)):
        x = X[idx]
        y = Y[idx]
        hyp = (m * x) + b
        m_gradient += -(2/N) * x * (y - hyp)
        b_gradient += -(2/N) * (y - hyp)
    m = m - (m_gradient * learning_rate)
    b = b - (b_gradient * learning_rate)
print(b + m*6)
Solution
All the gradients you compute after the first iteration are incorrect, because you never reset them. You need to set both gradients back to 0 inside the outer for loop.
X = [1,2,3,4]
Y = [2,3,4,5]
m_gradient = 0
b_gradient = 0
m,b = 0,0
learning_rate = 0.1
N = len(Y)
for p in range(100):
    for idx in range(len(Y)):
        x = X[idx]
        y = Y[idx]
        hyp = (m * x) + b
        m_gradient += -(2/N) * x * (y - hyp)
        b_gradient += -(2/N) * (y - hyp)
    m = m - (m_gradient * learning_rate)
    b = b - (b_gradient * learning_rate)
    # Reset the accumulated gradients before the next epoch
    m_gradient, b_gradient = 0, 0
print(b + m*6)
For example, consider b_gradient. Before the first iteration b_gradient = 0, and it is computed as 0 + -0.5*(y0 - (m*x0 + b)) + -0.5*(y1 - (m*x1 + b)) + -0.5*(y2 - (m*x2 + b)) + -0.5*(y3 - (m*x3 + b)), where x0 and y0 are X[0] and Y[0] respectively. After the first iteration the value of b_gradient is -7, which is correct.
The problem starts with the second iteration. Instead of computing b_gradient as the sum of -0.5*(yn - (m*xn + b)) starting from 0, you compute it as the previous value of b_gradient plus that sum. After the second iteration the value of b_gradient is -2.6, which is incorrect. The correct value is 4.4; note that -7 + 4.4 = -2.6.
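As a quick check of these numbers, here is a minimal sketch (an addition, not part of the original answer) that recomputes the gradient from scratch each epoch and tracks the buggy accumulated value alongside it, using the same data as the question:
X = [1,2,3,4]
Y = [2,3,4,5]
N = len(Y)
learning_rate = 0.1
m, b = 0, 0
buggy_b_gradient = 0                  # never reset, as in the question
for epoch in range(2):
    m_gradient, b_gradient = 0, 0     # reset every epoch, as in the fix
    for x, y in zip(X, Y):
        hyp = (m * x) + b
        m_gradient += -(2/N) * x * (y - hyp)
        b_gradient += -(2/N) * (y - hyp)
    buggy_b_gradient += b_gradient
    print(epoch, b_gradient, buggy_b_gradient)
    # Update with the correct gradients so the second epoch is evaluated
    # at the same (m, b) as in the walkthrough above
    m = m - (m_gradient * learning_rate)
    b = b - (b_gradient * learning_rate)
Up to floating-point rounding, this prints -7.0 for both values after the first epoch, then 4.4 (correct) versus -2.6 (accumulated) after the second, matching the walkthrough.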
It seems you want the linear regression coefficients via gradient descent. More data points, a slightly smaller learning rate, and training for more epochs while watching the loss will all help reduce the error.
As the input values get larger, the code below gives slightly off results. The approaches mentioned above, such as training for more epochs, give correct results over a larger range of numbers.
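For reference, the coefficients that gradient descent should converge to can be computed in closed form; here is a minimal sketch (an addition, not part of the original answer) using np.polyfit on the question's data:
import numpy as np
# Closed-form least-squares fit of y = m*x + b; gradient descent
# should approach these values as training continues.
X = np.array([1,2,3,4])
Y = np.array([2,3,4,5])
m, b = np.polyfit(X, Y, deg=1)
print(m, b)           # ~1.0 ~1.0, since the data lies exactly on y = x + 1
print(b + m * 6)      # ~7.0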
Vectorized version
import numpy as np
X = np.array([1,5,6,7])
Y = np.array([2,6,7,8])
w_gradient = 0
b_gradient = 0
w,b = 0.5,0.5
learning_rate = .01
loss = 0
EPOCHS = 2000
N = len(Y)
for i in range(EPOCHS):
    # Predict
    Y_pred = (w * X) + b
    # Loss
    loss = np.square(Y_pred - Y).sum() / (2.0 * N)
    if i % 100 == 0:
        print(loss)
    # Backprop
    grad_y_pred = (2 / N) * (Y_pred - Y)
    w_gradient = (grad_y_pred * X).sum()
    b_gradient = (grad_y_pred).sum()
    # Optimize
    w -= (w_gradient * learning_rate)
    b -= (b_gradient * learning_rate)
print("\n\n")
print("LEARNED:")
print(w, b)
print("\n")
print("TEST:")
print(np.round(b + w * (-2)))
print(np.round(b + w * 0))
print(np.round(b + w * 1))
print(np.round(b + w * 6))
print(np.round(b + w * 3000))
# Expected: 30001, but gives 30002.
# Training for 3000 epochs will give the expected result.
# For a simple demo with little training data and a small input range, 2000 is enough.
print(np.round(b + w * 30000))
Output
LEARNED:
1.0000349103409163 0.9998271260509328
TEST:
-1.0
1.0
2.0
7.0
3001.0
30002.0
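The x = 30000 prediction is off by one simply because the tiny error in the learned slope is amplified by the large input: roughly 3.5e-5 * 30000 ≈ 1.05. A quick arithmetic check (an addition, not part of the original answer) with the learned values printed above:
# Slope error of ~3.5e-5 contributes ~1.05 at x = 30000,
# hence 30002 instead of the expected 30001.
w = 1.0000349103409163
b = 0.9998271260509328
print(w * 30000 + b)        # ~30002.05
print((w - 1.0) * 30000)    # ~1.05, the amplified slope error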
Loop version
import numpy as np
X = np.array([1,5,6,7])
Y = np.array([2,6,7,8])
w_gradient = 0
b_gradient = 0
w,b = 0.5,0.5
learning_rate = .01
loss = 0
EPOCHS = 2000
N = len(Y)
for i in range(EPOCHS):
    w_gradient = 0
    b_gradient = 0
    loss = 0
    for j in range(N):
        # Predict
        Y_pred = (w * X[j]) + b
        # Loss
        loss += np.square(Y_pred - Y[j]) / (2.0 * N)
        # Backprop
        grad_y_pred = (2 / N) * (Y_pred - Y[j])
        w_gradient += (grad_y_pred * X[j])
        b_gradient += (grad_y_pred)
    # Optimize
    w -= (w_gradient * learning_rate)
    b -= (b_gradient * learning_rate)
    # Print loss
    if i % 100 == 0:
        print(loss)
print("\n\n")
print("LEARNED:")
print(w, b)
print("\n")
print("TEST:")
print(np.round(b + w * (-2)))
print(np.round(b + w * 0))
print(np.round(b + w * 1))
print(np.round(b + w * 6))
print(np.round(b + w * 3000))
# Expected: 30001, but gives 30002.
# Training for 3000 epochs will give the expected result.
# For a simple demo with little training data and a small input range, 2000 is enough.
print(np.round(b + w * 30000))
Output
LEARNED:
1.0000349103409163 0.9998271260509328
TEST:
-1.0
1.0
2.0
7.0
3001.0
30002.0