Vectorized regularized gradient descent fails numerical check

Question

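I have written a logistic regression implementation in Python using vectorized, regularized gradient descent with NumPy. I used a numerical check to verify that the implementation is correct. The check validates my gradient descent implementation for linear regression, but it fails for logistic regression and I cannot find the cause. Any help would be appreciated. Here it is:

These are my methods for computing the cost and the gradient (the update function computes the gradient and updates the parameters):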

@staticmethod
def _hypothesis(parameters,features):
    return Activation.sigmoid(features.dot(parameters))

@staticmethod
def _cost_function(parameters,features,targets):
    m = features.shape[0]
    return np.sum(-targets * (np.log(LogisticRegression._hypothesis(parameters,features)) - (1 - targets) * (
        np.log(1 - LogisticRegression._hypothesis(parameters,features))))) / m

@staticmethod
def _update_function(parameters,targets,extra_param):
    regularization_vector = extra_param.get("regularization_vector",0)
    alpha = extra_param.get("alpha",0.001)
    m = features.shape[0]

    return parameters - alpha / m * (
        features.T.dot(LogisticRegression._hypothesis(parameters,features) - targets)) + \
           (regularization_vector / m) * parameters

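The cost function does not include regularization, but my test uses a regularization vector of zero, so that does not matter. This is how I test: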

def numerical_check(features,parameters,cost_function,update_function,extra_param,delta):
    gradients = - update_function(parameters,extra_param)

    parameters_minus = np.copy(parameters)
    parameters_plus = np.copy(parameters)
    parameters_minus[0,0] = parameters_minus[0,0] + delta
    parameters_plus[0,0] = parameters_plus[0,0] - delta

    approximate_gradient = - (cost_function(parameters_plus,targets) -
                              cost_function(parameters_minus,targets)) / (2 * delta) / parameters.shape[0]

    return abs(gradients[0,0] - approximate_gradient) <= delta

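Basically, I compute the gradient by hand by shifting the first parameter by delta to the left and to the right. I then compare it with the gradient I get from the update function. The initial parameters are all 0, so the updated parameters I get back equal the gradient divided by the number of features. alpha is also set to 1. Unfortunately, the two methods give different values and I cannot work out why. Any suggestions on how to resolve this would be greatly appreciated.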

Solution

Your cost function has a bug caused by misplaced parentheses. Here is a corrected version:

def _cost_function(parameters,features,targets):
    m = features.shape[0]
    
    return -np.sum(
        (    targets) * (np.log(    LogisticRegression._hypothesis(parameters,features)))
      + (1 - targets) * (np.log(1 - LogisticRegression._hypothesis(parameters,features)))
    ) / m

Try to write your code cleanly; it makes errors like this much easier to spot.
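To make the effect of the parenthesization concrete, here is a minimal sketch that compares the two expressions on a tiny data set. The standalone `sigmoid`, `cost_buggy`, and `cost_fixed` functions and the sample data are assumptions made for illustration; they stand in for `Activation.sigmoid` and `LogisticRegression._cost_function` from the question. With binary targets, the product `targets * (1 - targets)` is always zero, so the original expression reduces to `-targets * log(h)` and silently drops every sample whose target is 0.

```python
import numpy as np

def sigmoid(z):
    # Stand-in for Activation.sigmoid from the question (assumed to be the logistic function).
    return 1.0 / (1.0 + np.exp(-z))

def cost_buggy(parameters, features, targets):
    # Same parenthesization as the question: -t * (log(h) - (1 - t) * log(1 - h))
    h = sigmoid(features.dot(parameters))
    m = features.shape[0]
    return np.sum(-targets * (np.log(h) - (1 - targets) * np.log(1 - h))) / m

def cost_fixed(parameters, features, targets):
    # Cross-entropy: -(t * log(h) + (1 - t) * log(1 - h)), averaged over samples
    h = sigmoid(features.dot(parameters))
    m = features.shape[0]
    return -np.sum(targets * np.log(h) + (1 - targets) * np.log(1 - h)) / m

# Illustrative data: a bias column plus one feature, two positive and two negative samples.
features = np.array([[1.0, 0.5], [1.0, -1.5], [1.0, 2.0], [1.0, -0.3]])
targets = np.array([[1.0], [0.0], [1.0], [0.0]])
parameters = np.zeros((2, 1))  # all-zero start, as in the question

buggy = cost_buggy(parameters, features, targets)
fixed = cost_fixed(parameters, features, targets)
print(buggy, fixed)
```

With all-zero parameters every prediction is 0.5, so the fixed cost equals log(2), while the buggy version only counts the two positive samples and returns half of that.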

I think I have found a possible bug in your code; please let me know whether this is correct.

In your numerical_check function you call update_function to initialize gradients. However, the _update_function shown above does not actually return the gradient; it returns the updated value of parameters.

That is, note that the return statement of your _update_function looks like this:

return parameters - alpha / m * (
    features.T.dot(LogisticRegression._hypothesis(parameters,features) - targets)) + \
       (regularization_vector / m) * parameters

My suggestion, and what I do in my own ML algorithms, is to create a separate function that computes only the gradient, for example:

def _gradient(features,parameters,targets):
    # Gradient of the unregularized cost with respect to parameters
    m = features.shape[0]
    return features.T.dot(LogisticRegression._hypothesis(parameters,features) - targets) / m

Then change your numerical_check function to initialize gradient as follows:

gradient = _gradient(features,parameters,targets)

I hope this solves your problem.
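Putting both answers together, a gradient check along these lines might look like the sketch below. It is an illustration under stated assumptions, not the asker's actual class: `sigmoid`, `cost`, `gradient`, and `numerical_gradient` are hypothetical standalone functions, the data is random, and regularization is left out (the question's test uses a zero regularization vector). The analytic gradient is compared directly with a central-difference estimate of the corrected cost, one parameter at a time.

```python
import numpy as np

def sigmoid(z):
    # Stand-in for Activation.sigmoid (assumed to be the logistic function).
    return 1.0 / (1.0 + np.exp(-z))

def cost(parameters, features, targets):
    # Unregularized cross-entropy cost, averaged over the m samples.
    h = sigmoid(features.dot(parameters))
    m = features.shape[0]
    return -np.sum(targets * np.log(h) + (1 - targets) * np.log(1 - h)) / m

def gradient(parameters, features, targets):
    # Analytic gradient of the cost above; returns the gradient itself, not updated parameters.
    m = features.shape[0]
    return features.T.dot(sigmoid(features.dot(parameters)) - targets) / m

def numerical_gradient(parameters, features, targets, delta=1e-5):
    # Central difference: (J(theta + delta * e_i) - J(theta - delta * e_i)) / (2 * delta)
    approx = np.zeros_like(parameters)
    for i in range(parameters.shape[0]):
        plus = parameters.copy()
        minus = parameters.copy()
        plus[i, 0] += delta
        minus[i, 0] -= delta
        approx[i, 0] = (cost(plus, features, targets)
                        - cost(minus, features, targets)) / (2 * delta)
    return approx

# Illustrative random data: 20 samples, a bias column plus two features.
rng = np.random.default_rng(0)
features = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 2))])
targets = (rng.random((20, 1)) > 0.5).astype(float)
parameters = rng.normal(scale=0.1, size=(3, 1))

analytic = gradient(parameters, features, targets)
approx = numerical_gradient(parameters, features, targets)
max_diff = np.max(np.abs(analytic - approx))
print(max_diff)
```

Keeping the gradient in its own function means the check does not depend on alpha or on the starting parameters, and the update step can simply subtract `alpha * gradient(...)`.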

gradient-descent logistic-regression machine-learning numpy