Machine learning - Implementing gradient descent in Python from Octave code

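Question

I'm trying to implement a gradient descent function in Python from scratch, one that I have already implemented and gotten working in GNU Octave. Unfortunately I'm stuck. I've fiddled with it for a while and checked the NumPy documentation, but no luck so far. I know about libraries such as scikit-learn, but my aim is to learn to write such a function from scratch. Maybe I'm going about this the wrong way. Below you'll find all the code needed to reproduce the error. Thanks in advance for your help.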

Actual result: the test fails with the error -> "ValueError: matmul: Input operand 0 does not have enough dimensions (has 0, gufunc core with signature (n?,k),(k,m?)->(n?,m?) requires 1)"

Expected result: an array with the values [5.2148, -0.5733]

The function gradientDescent() in Octave:

function [theta,J_history] = gradientDescent(X,y,theta,alpha,num_iters)
        m = length(y); % number of training examples
        J_history = zeros(num_iters,1);

        for iter = 1:num_iters
          theta = theta - (alpha/m)*X'*(X*theta-y);
          J_history(iter) = computeCost(X,y,theta);
        end

The function gradient_descent() in Python:

from numpy import zeros

def compute_cost(X, y, theta):
    m = len(y)
    ans = (X.T @ theta).T - y
    J = (ans @ ans.T) / (2 * m)
    return J[0,0]


def gradient_descent(X, y, theta, alpha, num_iters):
    m = len(y)
    J_history = zeros((num_iters,1),dtype=int)
    for iter in range(num_iters):
        theta = theta - (alpha / m) @ X.T @ (X @ theta - y)
        J_history[iter] = compute_cost(X, y, theta)
    return theta

Test file: test_ml_utils.py

import unittest
import numpy as np
from ml.ml_utils import compute_cost,gradient_descent

class TestGradientDescent(unittest.TestCase):
    # Todo: implement tests for Gradient Descent function
    # [theta J_hist] = gradientDescent([1 5; 1 2; 1 4; 1 5],[1 6 4 2]',[0 0]',0.01,1000);
    def test_gradient_descent_00(self):
        X = np.array([[1,5],[1,2],[1,4],[1,5]])
        y = np.array([1,6,4,2])
        theta = np.zeros(2)
        alpha = 0.01
        num_iter = 1000
        r_theta = np.array([5.2148,-0.5733])
        result = gradient_descent(X, y, theta, alpha, num_iter)
        self.assertEqual((round(result,4),r_theta),'Result is wrong!')


if __name__ == '__main__':
    unittest.main()

Solution

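In Python, the __matmul__ operator @ binds more tightly than -. This means you are attempting a matrix multiplication with the operands (alpha / m), which is a scalar, and X.T, which is a matrix. See operator precedence.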
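The failure can be reproduced in isolation. The snippet below is a minimal sketch that reuses the X, alpha, and m values from the unit test; the variable names `raised` and `scaled` are introduced here purely for illustration.

```python
import numpy as np

X = np.array([[1, 5], [1, 2], [1, 4], [1, 5]])
alpha, m = 0.01, 4

# `@` binds tighter than binary `-`, so the question's expression parses as
# theta - (((alpha / m) @ X.T) @ (X @ theta - y)); the first `@` sees a scalar.
try:
    (alpha / m) @ X.T
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Element-wise `*` scales every entry of X.T by the scalar instead.
scaled = (alpha / m) * X.T
print(scaled.shape)  # (2, 4)
```

The scalar is the left operand of the first `@`, which is why the message complains about "input operand 0".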

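In the Octave code, (alpha/m)*X' performs scalar multiplication, not matrix multiplication, so if you want the same behavior in Python, use * instead of @. This appears to be because Octave overloads the * operator: it performs scalar multiplication if one operand is a scalar, and matrix multiplication if both operands are matrices.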
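Putting that fix into the function might look like the sketch below. It assumes y and theta are 1-D NumPy arrays, as in the unit test, and that the cost is the usual squared-error cost, since the Octave computeCost is not shown in the question. Returning J_history alongside theta and storing it as floats are choices made here to mirror the Octave signature, not something the question's Python code does.

```python
import numpy as np


def compute_cost(X, y, theta):
    """Squared-error cost J = sum((X @ theta - y)**2) / (2m); assumed form."""
    m = len(y)
    residual = X @ theta - y
    return (residual @ residual) / (2 * m)


def gradient_descent(X, y, theta, alpha, num_iters):
    """Batch gradient descent; expects X of shape (m, n), y and theta 1-D."""
    m = len(y)
    J_history = np.zeros(num_iters)  # float dtype so costs are not truncated
    for i in range(num_iters):
        # `*` for the scalar factor, `@` for the matrix-vector products
        theta = theta - (alpha / m) * (X.T @ (X @ theta - y))
        J_history[i] = compute_cost(X, y, theta)
    return theta, J_history


X = np.array([[1, 5], [1, 2], [1, 4], [1, 5]])
y = np.array([1, 6, 4, 2])
theta, J_history = gradient_descent(X, y, np.zeros(2), 0.01, 1000)
print(np.round(theta, 4))
```

With the inputs from the unit test, the printed theta should come out close to the [5.2148, -0.5733] the question expects. Note that this is not the least-squares optimum; 1000 iterations at alpha = 0.01 simply have not converged yet.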

Adding to Adam's answer (which is correct for the error you are getting).

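More generally, though, I would add that without some kind of hint (whether programmatic or in the form of comments) about the dimensions the different variables are expected to have, this code is hard for a reader to make sense of.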

For example, the code hints that y may be two-dimensional, yet you are using len to get its size. As an example of how this can fail silently, consider:

>>> y = numpy.array([[1,2,3,4,5]])
>>> len( y )
1

whereas presumably you wanted

>>> numpy.shape( y )
(1, 5)

or

>>> numpy.size( y )
5

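I notice that in your unit test you are passing rank-1 vectors rather than rank-2 vectors, so the resulting y is 1D rather than 2D, yet it still combines with the 2D X thanks to broadcasting. So your code happens to work despite its implied logic, but unless these things are stated explicitly, this is a runtime error waiting to happen.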
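To illustrate how a shape mismatch can slip through, the sketch below compares a 1-D y with a 2-D column y using the arrays from the unit test. The helper `check_shapes` is hypothetical, just one possible way of stating the expected dimensions explicitly.

```python
import numpy as np

X = np.array([[1, 5], [1, 2], [1, 4], [1, 5]])  # shape (4, 2)
theta = np.zeros(2)                              # shape (2,)
y_1d = np.array([1, 6, 4, 2])                    # shape (4,)
y_col = y_1d.reshape(-1, 1)                      # shape (4, 1)

# 1-D y: the residual has one entry per training example.
print((X @ theta - y_1d).shape)   # (4,)

# 2-D column y: broadcasting silently produces a 4x4 matrix instead.
print((X @ theta - y_col).shape)  # (4, 4)


def check_shapes(X, y, theta):
    """Hypothetical guard: fail early unless shapes are (m, n), (m,), (n,)."""
    m, n = X.shape
    if y.shape != (m,) or theta.shape != (n,):
        raise ValueError(
            f"expected y {(m,)} and theta {(n,)}, "
            f"got {y.shape} and {theta.shape}"
        )
```

Calling `check_shapes(X, y, theta)` at the top of gradient_descent would turn the silent 4x4 case into an immediate, readable error.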

gradient-descent machine-learning numpy octave python