Gradient descent problem on the smallest/simplest data on earth

Problem description

I want to implement the gradient descent algorithm on this simple data, but I am running into problems. It would be great if someone could point me in the right direction. The answer for x = 6 should be 7, but I'm not getting it.

X = [1,2,3,4]
Y = [2,3,4,5]
m_gradient = 0
b_gradient = 0
m,b = 0,0
learning_rate = 0.1

N = len(Y)
for p in range(100):
    for idx in range(len(Y)):
        x = X[idx]
        y = Y[idx]
        hyp = (m * x) + b
        m_gradient += -(2/N) * x * (y - hyp)
        b_gradient += -(2/N) * (y - hyp)
    m = m - (m_gradient * learning_rate)
    b = b - (b_gradient * learning_rate)
print(b+m*6)

Solution

Except for the first iteration, every gradient you compute is incorrect. You need to reset both gradients to 0 at the start of each pass of the outer for loop.

X = [1,2,3,4]
Y = [2,3,4,5]
m_gradient = 0
b_gradient = 0
m,b = 0,0
learning_rate = 0.1

N = len(Y)
for p in range(100):
    for idx in range(len(Y)):
        x = X[idx]
        y = Y[idx]
        hyp = (m * x) + b
        m_gradient += -(2/N) * x * (y - hyp)
        b_gradient += -(2/N) * (y - hyp)
    m = m - (m_gradient * learning_rate)
    b = b - (b_gradient * learning_rate)
    m_gradient,b_gradient = 0,0

print(b+m*6)

Consider b_gradient, for example. Before the first iteration b_gradient = 0, and during the first iteration it is computed as 0 + -0.5*(y0 - (m*x0 + b)) + -0.5*(y1 - (m*x1 + b)) + -0.5*(y2 - (m*x2 + b)) + -0.5*(y3 - (m*x3 + b)), where x0 and y0 are X[0] and Y[0], respectively.

After the first iteration, b_gradient has the value -7, which is correct.

The problem starts at the second iteration. Instead of computing b_gradient as 0 plus the sum of the -0.5*(yn - (m*xn + b)) terms, you compute it as the previous value of b_gradient plus that sum.

After the second iteration, b_gradient has the value -2.6, which is incorrect. The correct value is 4.4; note that 4.4 - 7 = -2.6.
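
You can verify these numbers with a minimal sketch (not part of the original answer). It assumes the parameter values m = 2.0 and b = 0.7 reached after the first update with learning_rate = 0.1 on this data:

X = [1,2,3,4]
Y = [2,3,4,5]
N = len(Y)
m, b = 2.0, 0.7   # values after the first update: m = 0 - 0.1*(-20), b = 0 - 0.1*(-7)

# Gradient recomputed from zero at the second iteration
fresh = sum(-(2/N) * (y - (m*x + b)) for x, y in zip(X, Y))

# What the buggy code uses: the stale first-iteration gradient plus the new sum
stale = -7 + fresh

print(fresh)   # ~4.4  (correct)
print(stale)   # ~-2.6 (incorrect)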


It looks like you want linear-regression coefficients fitted with gradient descent. More data points, a slightly smaller learning rate, and training for more epochs while watching the loss will help reduce the error.

As the input values grow, the code below gives slightly off results. The remedies mentioned above, such as training for more epochs, give correct results over larger ranges of numbers.
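
As a side check (not from the original answer), you can compare the gradient-descent result against a closed-form least-squares fit. np.polyfit from NumPy fits a degree-1 polynomial and returns the slope and intercept; the data here is assumed to match the vectorized version below:

import numpy as np

X = np.array([1,5,6,7])
Y = np.array([2,6,7,8])

# Closed-form least-squares fit of a straight line: returns [slope, intercept]
slope, intercept = np.polyfit(X, Y, 1)
print(slope, intercept)        # ~1.0 ~1.0 for this data
print(intercept + slope * 6)   # ~7.0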

Vectorized version

import numpy as np

X = np.array([1,5,6,7])
Y = np.array([2,6,7,8])
w_gradient = 0
b_gradient = 0
w,b = 0.5,0.5

learning_rate = .01
loss = 0
EPOCHS = 2000
N = len(Y)


for i in range(EPOCHS):

    # Predict
    Y_pred = (w * X) + b

    # Loss
    loss = np.square(Y_pred - Y).sum() / (2.0 * N)
    if i % 100 == 0:
        print(loss)

    # Backprop
    grad_y_pred = (2 / N) * (Y_pred - Y)
    w_gradient = (grad_y_pred * X).sum()
    b_gradient = (grad_y_pred).sum()

    # Optimize
    w -= (w_gradient * learning_rate)
    b -= (b_gradient * learning_rate)

print("\n\n")
print("LEARNED:")
print(w,b)
print("\n")
print("TEST:")
print(np.round(b + w * (-2)))
print(np.round(b + w * 0))
print(np.round(b + w * 1))
print(np.round(b + w * 6))
print(np.round(b + w * 3000))

# Expected: 30001, but gives 30002.
# Training for 3000 epochs will give the expected result.
# For a simple demo with little training data and a small input range, 2000 is enough
print(np.round(b + w * 30000))

Output

LEARNED:
1.0000349103409163 0.9998271260509328

TEST:
-1.0
1.0
2.0
7.0
3001.0
30002.0
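
For reference, that last value follows directly from the learned parameters: 1.0000349 * 30000 + 0.9998 ≈ 30002.05, which rounds to 30002. The tiny error in the slope is amplified by the large input, which is why training longer (pushing the slope closer to 1.0) fixes the extrapolation.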

Loop version

import numpy as np

X = np.array([1,5,6,7])
Y = np.array([2,6,7,8])
w_gradient = 0
b_gradient = 0
w,b = 0.5,0.5

learning_rate = .01
loss = 0
EPOCHS = 2000
N = len(Y)


for i in range(EPOCHS):

    w_gradient = 0
    b_gradient = 0
    loss = 0

    for j in range(N):

        # Predict
        Y_pred = (w * X[j]) + b

        # Loss
        loss += np.square(Y_pred - Y[j]) / (2.0 * N)

        # Backprop
        grad_y_pred = (2 / N) * (Y_pred - Y[j])
        w_gradient += (grad_y_pred * X[j])
        b_gradient += (grad_y_pred)

    # Optimize
    w -= (w_gradient * learning_rate)
    b -= (b_gradient * learning_rate)

    # Print loss
    if i % 100 == 0:
        print(loss)


print("\n\n")
print("LEARNED:")
print(w,b)
print("\n")
print("TEST:")
print(np.round(b + w * (-2)))
print(np.round(b + w * 0))
print(np.round(b + w * 1))
print(np.round(b + w * 6))
print(np.round(b + w * 3000))

# Expected: 30001, but gives 30002.
# Training for 3000 epochs will give the expected result.
# For a simple demo with little training data and a small input range, 2000 is enough
print(np.round(b + w * 30000))

Output

LEARNED:
1.0000349103409163 0.9998271260509328

TEST:
-1.0
1.0
2.0
7.0
3001.0
30002.0