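Simple linear regression in plain Python

Problem description

I recently started learning machine learning through Andrew Ng's Machine Learning course on Coursera. I tried to implement simple linear regression in plain Python, without using any machine learning libraries, but the code fails: the cost function keeps increasing as the loop iterates and reaches very large values. What am I doing wrong here?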

def cost_function(train_set,theta0,theta1):
  # Halved mean squared error over the training set
  total_error = 0
  for x,y in train_set:
    total_error += ((theta0 + theta1 * x) - y) ** 2
  # Parentheses matter: "/ 2 * len(...)" would multiply by m instead of dividing by 2m
  return total_error / (2 * len(train_set))

def gradient_descent(train_set,learning_rate,theta0,theta1):
  # One batch update: accumulate both partial derivatives, then take a step
  theta0_der,theta1_der = 0,0
  for x,y in train_set:
    error = (theta0 + theta1 * x) - y
    theta0_der += error
    theta1_der += error * x
  new_theta0 = theta0 - (1/len(train_set) * learning_rate * theta0_der)
  new_theta1 = theta1 - (1/len(train_set) * learning_rate * theta1_der)
  return new_theta0,new_theta1

def main():
  theta0,theta1 = 0,0
  learning_rate = 0.001
  iterations = 100
  # data_frame is assumed to be a pandas DataFrame loaded elsewhere
  x_train = data_frame.iloc[:,0]
  y_train = data_frame.iloc[:,1]
  train_set = list(zip(x_train,y_train))[:280] # [(1,2.444),(2,3.555),(3,6.444) ..... ]
  print('Initial cost: ' + str(cost_function(train_set,theta0,theta1)))
  for i in range(iterations):
    theta0,theta1 = gradient_descent(train_set,learning_rate,theta0,theta1)
    print([theta0,theta1])
  print('Final cost: ' + str(cost_function(train_set,theta0,theta1)))

main()

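Solution

Your learning rate is set too high; try changing it to 0.0001. When the step is too large, each update overshoots the minimum, so the cost grows on every iteration instead of shrinking.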
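To see the effect of the learning rate in isolation, here is a minimal sketch in plain Python. The data set is synthetic (y = 2x + 1 for x = 1..100), and the helper names `mse_cost`, `gradient_step` and `final_cost` are made up for this illustration; the exact rate at which descent starts to diverge depends on the scale of your own x values.

```python
def mse_cost(points, theta0, theta1):
    """Halved mean squared error: sum of squared residuals / (2 * m)."""
    m = len(points)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in points) / (2 * m)


def gradient_step(points, learning_rate, theta0, theta1):
    """One batch gradient-descent update of both parameters."""
    m = len(points)
    grad0 = sum(theta0 + theta1 * x - y for x, y in points) / m
    grad1 = sum((theta0 + theta1 * x - y) * x for x, y in points) / m
    return theta0 - learning_rate * grad0, theta1 - learning_rate * grad1


def final_cost(points, learning_rate, iterations=100):
    """Run gradient descent from (0, 0) and return the cost at the end."""
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        theta0, theta1 = gradient_step(points, learning_rate, theta0, theta1)
    return mse_cost(points, theta0, theta1)


# Synthetic data (an assumption, not the asker's data): y = 2x + 1
points = [(x, 2 * x + 1) for x in range(1, 101)]

initial = mse_cost(points, 0.0, 0.0)
for lr in (0.001, 0.0001):
    print("learning_rate =", lr, "initial cost =", initial,
          "final cost =", final_cost(points, lr))
```

On this data the larger rate ends with a cost far above the starting value, while the smaller one ends below it. If the features were rescaled to a smaller range, a larger learning rate would likely be usable.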

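However, you can also implement simple linear regression directly from its closed-form equation, that is:

$$\hat{\beta} = (X^\top X)^{-1} X^\top y$$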

Implementing this in Python is straightforward; you can do it like this:

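import numpy as np

class LinearRegression:
    def fit(self,X,y):
        # Prepend a column of ones so the first coefficient is the intercept
        ones = np.ones(len(X)).reshape(-1,1)
        X = np.concatenate((ones,X),axis=1)

        # Normal equation: B = pinv(X^T X) X^T y
        B = np.matmul(np.linalg.pinv(np.matmul(X.T,X)),np.matmul(X.T,y))

        self.slope = B[1:]
        self.intercept = B[0]

    def predict(self,X):
        # X must be 2D with shape (n_samples, n_features)
        self.predicted = np.dot(X,self.slope) + self.intercept
        return self.predicted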

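The fit function takes the X and y values and computes Beta (using the formula above with NumPy). Beta is a vector whose first entry is the intercept; the remaining entries are the slopes.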
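The layout of Beta can be checked on data whose coefficients are known in advance. The following is a minimal sketch, assuming hypothetical noise-free data generated from y = 3 + 2*x1 - x2; `normal_equation` is an illustrative helper that mirrors the fit step, not part of any library.

```python
import numpy as np


def normal_equation(X, y):
    """Return beta = pinv(X^T X) X^T y, with an intercept column prepended to X."""
    ones = np.ones((X.shape[0], 1))
    X_aug = np.hstack((ones, X))
    return np.linalg.pinv(X_aug.T @ X_aug) @ (X_aug.T @ y)


# Hypothetical noise-free data: y = 3 + 2*x1 - 1*x2
X = np.array([[1.0, 2.0],
              [2.0, 0.0],
              [3.0, 1.0],
              [4.0, 5.0],
              [5.0, 3.0]])
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1]

beta = normal_equation(X, y)
intercept, slopes = beta[0], beta[1:]  # first entry, then one slope per feature
print("intercept:", intercept)
print("slopes:", slopes)
```

Because the data is noise-free, the recovered values should match the generating coefficients up to floating-point error.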

The predict function takes a 2D array and computes the predictions.
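Since the question asks for plain Python, it may help to note that for a single feature the same closed form reduces to slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x), which needs no NumPy. A minimal sketch, with the hypothetical helper names `fit_simple` and `predict_simple` and synthetic data:

```python
def fit_simple(points):
    """Closed-form least squares for one feature; returns (intercept, slope)."""
    m = len(points)
    mean_x = sum(x for x, _ in points) / m
    mean_y = sum(y for _, y in points) / m
    # Sums of centered products; the 1/m factors cancel in the ratio
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_simple(xs, intercept, slope):
    """Evaluate the fitted line at each x."""
    return [intercept + slope * x for x in xs]


# Synthetic data (an assumption): y = 2x + 1 for x = 1..10
points = [(x, 2 * x + 1) for x in range(1, 11)]
intercept, slope = fit_simple(points)
predictions = predict_simple([11, 12], intercept, slope)
print(intercept, slope, predictions)
```

This avoids choosing a learning rate altogether, at the cost of requiring that the x values are not all identical (otherwise `sxx` is zero).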

gradient-descent linear-regression machine-learning