Attempt at linear regression with multiple features in Octave fails

Problem description

I implemented the Week 2 programming assignment of the Machine Learning course (the multiple-features part) on a different dataset from Kaggle, "House Sales in King County, USA". I also modified the dataset and reduced the number of features.

My problem is that when I try to compute a price to check the solution, the code gives me a very different price, and the computed thetas are not what I expected.

I haven't finished the course yet (I'm just starting Week 6), so I can't come up with a solution on my own. I wanted to try my own implementation, but I'm not sharing this in the course discussion forum because this code contains answers to the programming assignment.

In short, any advice would be greatly appreciated.

Here is my code:

clear ; close all; clc

fprintf('Loading data ...\n');

%% Load Data
data = load('multidata.txt');
X = data(:,1:8);
y = data(:,9);
m = length(y);

fprintf('Program paused. Press enter to continue.\n');
pause;

% Print out some data points (here we see the x and y values).
fprintf('First 10 examples from the dataset: \n');
fprintf(' x = [%.0f %.0f %.0f %.0f %.0f %.0f %.0f %.0f],y = %.0f \n',[X(1:10,:) y(1:10,:)]');

fprintf('Program paused. Press enter to continue.\n');
pause;

% Scale features and set them to zero mean. 
fprintf('normalizing Features ...\n');

% Run the function defined for feature normalization.
[X mu sigma] = featurenormalize(X);

% View the data after normalization.
fprintf('First 10 examples from the dataset: \n');
fprintf(' x = [%.0f %.0f %.0f %.0f %.0f %.0f %.0f %.0f],y = %.0f \n',[X(1:10,:) y(1:10,:)]');

fprintf('Program paused. Press enter to continue.\n');
pause;

% Add intercept term to X (an x0 feature with value 1 is added)
X = [ones(m,1) X];
% View of the first 10 samples after adding the intercept
fprintf('First 10 examples from the dataset: \n');
fprintf(' x = [%.0f %.0f %.0f %.0f %.0f %.0f %.0f %.0f %.0f] \n',X(1:10,:)');

fprintf('Program paused. Press enter to continue.\n');
pause;

%% ================ Part 2: Gradient Descent ================
fprintf('Running gradient descent ...\n');

% Choose some alpha value
alpha = 0.001; %alpha 0.001,0.003,0.01,0.03,0.1,0.3,1 
num_iters = 5000;

% Init Theta and Run Gradient Descent 
theta = zeros(9,1);
[theta,J_history] = gradientDescentMulti(X,y,theta,alpha,num_iters);

% Plot the convergence graph
figure;
plot(1:numel(J_history),J_history,'-b','linewidth',2);
xlabel('Number of iterations');
ylabel('Cost J');

% display gradient descent's result
fprintf('Theta computed from gradient descent: \n');
fprintf(' %f \n',theta);
fprintf('\n');

% Estimate the price of a house
price =  [1,3,2.5,3400,16603,10,0]*theta; 
% ============================================================

fprintf(['Predicted price of a house ' ...
     '(using gradient descent):\n $%f\n'],price);

fprintf('Program paused. Press enter to continue.\n');
pause;

Code for feature normalization:

function [X_norm,mu,sigma] = featurenormalize(X)
X_norm = X;
mu = zeros(1,size(X,2));
sigma = zeros(1,size(X,2));

mu = mean(X);
sigma  = std(X);
X_norm = (X-mu)./sigma;

end

Code for gradient descent:

function [theta,J_history] = gradientDescentMulti(X,y,theta,alpha,num_iters)

% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters,1);

for iter = 1:num_iters

error = (X * theta) - y;
theta = theta - ((alpha/m) * (X' * error));

% Save the cost J in every iteration    
J_history(iter) = computeCostMulti(X,y,theta);

end

end

Code for cost computation:

function J = computeCostMulti(X,y,theta)

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly 
J = 0;

J =(1/(2*m))*(sum(((X*theta)-y).^2));
end

This is the program's output:

Loading data ...
Program paused. Press enter to continue.
First 10 examples from the dataset:
x = [3 1 1180 5650 3 7 1180 0],y = 221900
x = [3 2 2570 7242 3 7 2170 400],y = 538000
x = [2 1 770 10000 3 6 770 0],y = 180000
x = [4 3 1960 5000 5 7 1050 910],y = 604000
x = [3 2 1680 8080 3 8 1680 0],y = 510000
x = [4 4 5420 101930 3 11 3890 1530],y = 1225000
x = [3 2 1715 6819 3 7 1715 0],y = 257500
x = [3 2 1060 9711 3 7 1060 0],y = 291850
x = [3 1 1780 7470 3 7 1050 730],y = 229500
x = [3 2 1890 6560 3 7 1890 0],y = 323000
Program paused. Press enter to continue.
normalizing Features ...
First 10 examples from the dataset:
x = [-0 -1 -1 -0 -1 -1 -1 -1],y = 221900
x = [-0 0 1 -0 -1 -1 0 0],y = 538000
x = [-1 -1 -1 -0 -1 -1 -1 -1],y = 180000
x = [1 1 -0 -0 2 -1 -1 1],y = 604000
x = [-0 -0 -0 -0 -1 0 -0 -1],y = 510000
x = [1 3 4 2 -1 3 3 3],y = 1225000
x = [-0 0 -0 -0 -1 -1 -0 -1],y = 257500
x = [-0 -1 -1 -0 -1 -1 -1 -1],y = 291850
x = [-0 -1 -0 -0 -1 -1 -1 1],y = 229500
x = [-0 1 -0 -0 -1 -1 0 -1],y = 323000
Program paused. Press enter to continue.
First 10 examples from the dataset:
x = [1 -0 -1 -1 -0 -1 -1 -1 -1]
x = [1 -0 0 1 -0 -1 -1 0 0]
x = [1 -1 -1 -1 -0 -1 -1 -1 -1]
x = [1 1 1 -0 -0 2 -1 -1 1]
x = [1 -0 -0 -0 -0 -1 0 -0 -1]
x = [1 1 3 4 2 -1 3 3 3]
x = [1 -0 0 -0 -0 -1 -1 -0 -1]
x = [1 -0 -1 -1 -0 -1 -1 -1 -1]
x = [1 -0 -1 -0 -0 -1 -1 -1 1]
x = [1 -0 1 -0 -0 -1 -1 0 -1]
Program paused. Press enter to continue.
Running gradient descent ...
Theta computed from gradient descent:
536458.148898
-39768.741950
1127.113170
96860.285036
-9795.554225
37968.073474
122038.218945
73282.058139
63890.263077

Predicted price of a house (using gradient descent):
$417602636.107368
Program paused. Press enter to continue.
>>

Solution

You're missing the big summation here, and the multiplication X' * error should actually be element-wise: error .* X.

So, in short, your theta update should look like this: theta = theta - ((alpha/m) * sum(error .* X));

If anything goes wrong, check the sizes of your matrices, but this is the most immediate error I can see. I hope this helps; re-read the explanation of the formula, and once you get it, you won't even need to look it up anymore!
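To see how the matrix sizes work out in the update above, here is a minimal sketch with made-up numbers (not the asker's data). Note that sum(error .* X) produces a row vector, so it needs a transpose before it can be subtracted from the column vector theta; with that transpose it matches the vectorized form X' * error exactly:

```octave
% Minimal sketch with made-up numbers (not the asker's data):
% m = 3 training examples, 2 features plus the intercept column.
X = [1 2 3; 1 4 5; 1 6 7];   % m x (n+1) design matrix
y = [10; 20; 30];            % m x 1 targets
theta = zeros(3,1);          % (n+1) x 1 parameters

error = (X * theta) - y;     % m x 1 residuals

g1 = X' * error;             % vectorized gradient: (n+1) x 1 column vector
g2 = sum(error .* X);        % element-wise form summed over rows: 1 x (n+1) ROW vector
g3 = g2';                    % transposed so it matches theta's shape

% g1 and g3 are identical; subtracting g2 from theta without the
% transpose would broadcast into an (n+1) x (n+1) matrix in Octave.
disp(max(abs(g1 - g3)));
```

This is why checking sizes is the quickest way to debug the update: the residual must be m x 1 and the gradient must have the same shape as theta.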