Simple linear regression in plain Python

Problem description

I have recently started learning machine learning and have been following Andrew Ng's Machine Learning course on Coursera. I tried to implement simple linear regression in pure Python, without using any machine learning libraries, but the code fails: as the loop iterates, the cost keeps increasing and reaches very large values. What am I doing wrong here?

def cost_function(train_set,theta0,theta1):
  total_error = 0
  for i in range(len(train_set)):
    x = train_set[i][0]
    y = train_set[i][1]
    total_error += ((theta0 + theta1 * x) - y) ** 2
  return total_error / 2 * len(train_set)

def gradient_descent(train_set,learning_rate,theta0,theta1):
  theta0_der,theta1_der = 0,0
  for i in range(len(train_set)):
    x = train_set[i][0]
    y = train_set[i][1]
    theta0_der += ((theta0 + theta1 * x) - y)
    theta1_der += ((theta0 + theta1 * x)- y) * x
  new_theta0 = theta0 - (1/len(train_set) * learning_rate * theta0_der)
  new_theta1 = theta1 - (1/len(train_set) * learning_rate * theta1_der)
  return new_theta0,new_theta1

def main():
  theta0,theta1 = 0,0
  learning_rate = 0.001
  iterations = 100
  # data_frame is a pandas DataFrame loaded earlier (loading code omitted from the post)
  x_train = data_frame.iloc[:,0]
  y_train = data_frame.iloc[:,1]
  train_set = list(zip(x_train,y_train))[:280] # [(1,2.444),(2,3.555),(3,6.444) ..... ]
  print('Initial cost: ' + str(cost_function(train_set,theta0,theta1)))
  for i in range(iterations):
    x = train_set[i][0]
    y = train_set[i][1]
    new_theta0,new_theta1 = gradient_descent(train_set,learning_rate,theta0,theta1)
    theta0 = new_theta0
    theta1 = new_theta1
    print([theta0,theta1])
  print('Final cost: ' + str(cost_function(train_set,theta0,theta1)))

main()

Solution

You have set the learning rate too high; try changing it to 0.0001.
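As a quick check, here is a minimal sketch on made-up data, reusing the cost_function and gradient_descent from the question. On data of this scale the 0.001 step overshoots and the cost keeps growing, while 0.0001 lets it shrink; the right rate for your dataset depends on the scale of your x values, which is also why feature scaling is commonly recommended alongside tuning the rate.

# Hypothetical toy data: y = 3x + 5 for x = 1..100
train_set = [(x, 3.0 * x + 5.0) for x in range(1, 101)]

for learning_rate in (0.001, 0.0001):
    theta0, theta1 = 0.0, 0.0
    for _ in range(100):
        theta0, theta1 = gradient_descent(train_set, learning_rate, theta0, theta1)
    # On this data, 0.001 makes the cost explode; 0.0001 makes it steadily decrease
    print(learning_rate, cost_function(train_set, theta0, theta1))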


However, you can also implement simple linear regression directly with its closed-form solution, the normal equation:

B = (XᵀX)⁻¹ Xᵀy

Implementing this in Python is very simple; you can do it like this:

import numpy as np

class LinearRegression:
    def fit(self, X, y):
        # Prepend a column of ones so the first coefficient acts as the intercept
        ones = np.ones(len(X)).reshape(-1, 1)
        X = np.concatenate((ones, X), axis=1)

        # Normal equation: B = (X^T X)^-1 X^T y, using pinv in case X^T X is singular
        B = np.matmul(np.linalg.pinv(np.matmul(X.T, X)), np.matmul(X.T, y))

        self.slope = B[1:]
        self.intercept = B[0]

    def predict(self, X):
        # X is the raw feature matrix, without the column of ones
        self.predicted = np.dot(X, self.slope) + self.intercept
        return self.predicted

The fit function takes the X and y values and computes B (the coefficient vector) with the formula above using NumPy. The first entry of B is the intercept and the remaining entries are the slopes.

The predict function takes a 2-D array of features and returns the predictions.
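For illustration, here is a small usage sketch; the data is made up (roughly y = 2x + 1 with a little noise):

import numpy as np

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([3.1, 4.9, 7.2, 9.0, 10.8])

model = LinearRegression()
model.fit(X, y)
print(model.intercept)                    # close to 1
print(model.slope)                        # close to [2]
print(model.predict(np.array([[6.0]])))   # roughly [13]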