Problem description
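How do I correctly standardize/scale my data so that the scaling is reflected in my error metrics (RMSE and MAE)? Alternatively, is there a way to normalize the RMSE after it has been computed? I did not inverse-transform anything back to real units.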
Import the required libraries:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
Split the data into x and y:
x = data.iloc[:,1:]
y = data.iloc[:,:1]
Split into training and test sets:
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size = 0.2,random_state = 0)
Scale the data (training and test sets):
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)
Check the scaling:
print(data_scaled.mean(axis=0))
print(data_scaled.std(axis=0))
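Note that fit_transform returns a new NumPy array; it does not modify data in place, so x and y (and everything split from them) keep their original units. A minimal check, using a small hypothetical DataFrame in place of the real data:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the question's `data` (target in the first column)
data = pd.DataFrame({"target": [100.0, 200.0, 300.0, 400.0],
                     "feature": [1.0, 2.0, 3.0, 4.0]})
y = data.iloc[:, :1]  # taken from the unscaled frame, as in the question

scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)  # returns a new ndarray

print(type(data_scaled))          # a NumPy array, not a DataFrame
print(data_scaled.mean(axis=0))   # approximately 0 per column
print(data_scaled.std(axis=0))    # approximately 1 per column
print(y["target"].tolist())       # still the original, unscaled values
```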
Fit the StandardScaler:
sc_x = StandardScaler()
sc_y = StandardScaler()
x_train = sc_x.fit_transform(x_train)
y_train = sc_y.fit_transform(y_train)
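The test set would normally go through the same fitted scalers with transform (not fit_transform), so the training mean and standard deviation are reused; the same applies to x_test with sc_x. A minimal sketch with hypothetical arrays standing in for the real split:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical train/test targets, one column each (like data.iloc[:, :1])
y_train = np.array([[10.0], [20.0], [30.0], [40.0]])
y_test = np.array([[25.0], [35.0]])

sc_y = StandardScaler()
y_train_scaled = sc_y.fit_transform(y_train)  # learns mean_ and scale_ from training data only
y_test_scaled = sc_y.transform(y_test)        # reuses the training statistics

print(sc_y.mean_, sc_y.scale_)
print(y_test_scaled.ravel())
```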
Fit the model (support vector regression):
from sklearn.svm import SVR
regressor = SVR(kernel = 'rbf')
regressor.fit(x_train, y_train.ravel())  # SVR expects a 1-D target
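Because the model is trained on sc_y-scaled targets, its predictions come out in the scaled target space; sc_y.inverse_transform maps them back to real units. A sketch on synthetic data (the data is hypothetical, not the question's):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
# Synthetic data: target roughly 1000 * feature, in "real" units
x_train = rng.uniform(0, 10, size=(50, 1))
y_train = 1000 * x_train + rng.normal(0, 50, size=(50, 1))
x_test = rng.uniform(0, 10, size=(10, 1))

sc_x = StandardScaler()
sc_y = StandardScaler()
x_train_s = sc_x.fit_transform(x_train)
y_train_s = sc_y.fit_transform(y_train)

regressor = SVR(kernel="rbf")
regressor.fit(x_train_s, y_train_s.ravel())  # 1-D target avoids a DataConversionWarning

# Predictions are in the *scaled* target space
y_pred_s = regressor.predict(sc_x.transform(x_test))
# Map back to original units; inverse_transform expects a 2-D array
y_pred = sc_y.inverse_transform(y_pred_s.reshape(-1, 1)).ravel()
print(y_pred_s[:3])
print(y_pred[:3])
```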
Import the error metrics:
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
from sklearn.metrics import mean_absolute_error
from math import sqrt
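Compute the first error metric (RMSE). This is where everything goes wrong: I expected an RMSE value in the normalized data range, but I got a value in real units (rmse = 42596.17):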
mse=sqrt(mean_squared_error(y_test,y_pred))
print(mse)
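Both metrics scale linearly with the target. If the target is standardized as z = (y - mu) / sigma, every residual becomes (y - y_hat) / sigma, so RMSE_scaled = RMSE / sigma and MAE_scaled = MAE / sigma, where sigma is sc_y.scale_. This only holds when y_test and y_pred are in the same units; mixing a scaled array with an unscaled one gives a meaningless number. A minimal check with hypothetical values:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.preprocessing import StandardScaler

# Hypothetical true values and predictions, both in original units
y_train = np.array([[1000.0], [2000.0], [3000.0], [4000.0], [5000.0]])
y_test = np.array([1500.0, 2500.0, 4500.0])
y_pred = np.array([1400.0, 2700.0, 4450.0])

sc_y = StandardScaler().fit(y_train)
sigma = sc_y.scale_[0]

# Metrics in original units
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
mae = mean_absolute_error(y_test, y_pred)

# Metrics after scaling BOTH arrays with the same fitted scaler
y_test_s = sc_y.transform(y_test.reshape(-1, 1)).ravel()
y_pred_s = sc_y.transform(y_pred.reshape(-1, 1)).ravel()
rmse_s = np.sqrt(mean_squared_error(y_test_s, y_pred_s))
mae_s = mean_absolute_error(y_test_s, y_pred_s)

print(rmse, rmse_s, rmse / sigma)  # the last two agree
print(mae, mae_s, mae / sigma)     # the last two agree
```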
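I also did not inverse-transform anything back to real units. Since the RMSE does not seem to be affected by the scaling, I tried to normalize it with the following code: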
d=pd.DataFrame()
d['y']=y_test
d['X']=y_pred
d=normalize(d)
y_pred = d['X']
y_test = d['y']
mse=sqrt(mean_squared_error(y_test,y_pred))
print('Root Mean Squared Error: ',round(mse,3))
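A unit-free RMSE does not need a DataFrame or row-wise normalization: one common approach is to divide the RMSE (with y_test and y_pred both in original units) by a scale of the observed target, such as its standard deviation, range, or mean. Which one to use is a convention, so it is worth stating when reporting. Also, if normalize here is sklearn.preprocessing.normalize (the import is not shown), it rescales each row to unit norm and returns a NumPy array, so it would not normalize the error and d['X'] would not work on its result. A sketch with hypothetical values; nrmse is an illustrative helper, not a library function:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

def nrmse(y_true, y_pred, method="std"):
    """Hypothetical helper: RMSE divided by a scale of y_true."""
    y_true = np.ravel(y_true)  # accepts (n,), (n, 1), a Series or a one-column DataFrame
    y_pred = np.ravel(y_pred)
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    if method == "std":
        scale = np.std(y_true)
    elif method == "range":
        scale = np.max(y_true) - np.min(y_true)
    elif method == "mean":
        scale = np.mean(y_true)
    else:
        raise ValueError(f"unknown method: {method}")
    return rmse / scale

y_test = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 310.0, 390.0])
for m in ("std", "range", "mean"):
    print(m, round(nrmse(y_test, y_pred, m), 4))
```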
After trying to normalize the stubborn RMSE, I get the following error:
ValueError Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\frame.py in _ensure_valid_index(self,value)
3165 try:
-> 3166 value = Series(value)
3167 except (ValueError,NotImplementedError,TypeError) as err:
~\anaconda3\lib\site-packages\pandas\core\series.py in __init__(self,data,index,dtype,name,copy,fastpath)
230
--> 231 if is_empty_data(data) and dtype is None:
232 # gh-17261
~\anaconda3\lib\site-packages\pandas\core\construction.py in is_empty_data(data)
595 is_list_like_without_dtype = is_list_like(data) and not hasattr(data,"dtype")
--> 596 is_simple_empty = is_list_like_without_dtype and not data
597 return is_none or is_simple_empty
~\anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
1328 def __nonzero__(self):
-> 1329 raise ValueError(
1330 f"The truth value of a {type(self).__name__} is ambiguous. "
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty,a.bool(),a.item(),a.any() or a.all().
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
<ipython-input-142-60389a99eb72> in <module>
1 d=pd.DataFrame()
----> 2 d['y']=y_test
3 d['X']=y_pred
4 d=normalize(d)
5 y_pred = d['X']
~\anaconda3\lib\site-packages\pandas\core\frame.py in __setitem__(self,key,value)
3038 else:
3039 # set column
-> 3040 self._set_item(key,value)
3041
3042 def _setitem_slice(self,key: slice,value):
~\anaconda3\lib\site-packages\pandas\core\frame.py in _set_item(self,key,value)
3113 ensure homogeneity.
3114 """
-> 3115 self._ensure_valid_index(value)
3116 value = self._sanitize_column(key,value)
3117 NDFrame._set_item(self,key,value)
~\anaconda3\lib\site-packages\pandas\core\frame.py in _ensure_valid_index(self,value)
3166 value = Series(value)
3167 except (ValueError,NotImplementedError,TypeError) as err:
-> 3168 raise ValueError(
3169 "Cannot set a frame with no defined index "
3170 "and a value that cannot be converted to a Series"
ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series
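The failing line is d['y']=y_test. Because y = data.iloc[:,:1] selects with a slice, y (and therefore y_test) is a one-column DataFrame rather than a Series, and, as the traceback shows, this pandas version cannot convert it to a Series to build an index for the empty frame d. Passing 1-D data avoids the problem (data.iloc[:, 0] would give a Series instead). A sketch with hypothetical values:

```python
import numpy as np
import pandas as pd

# Hypothetical one-column DataFrame, like y_test = data.iloc[:, :1] after the split
y_test = pd.DataFrame({"target": [100.0, 200.0, 300.0]})
y_pred = np.array([110.0, 190.0, 310.0])  # e.g. the output of regressor.predict

# Build the frame from 1-D data instead of assigning a DataFrame into an empty frame
d = pd.DataFrame({
    "y": y_test.iloc[:, 0].to_numpy(),  # the first (only) column as a 1-D array
    "X": np.ravel(y_pred),
})
print(d)
```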
Solution
I expected to get the RMSE in the normalized data range, but I got a value in real units (rmse = 42596.17):
mse=sqrt(mean_squared_error(y_test,y_pred))
print(mse)
That's because you never scaled y_test.
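You did scale the whole data set, but x_train, x_test, y_train and y_test had already been split from the unscaled data, and the scaled result, data_scaled, is never used afterward. So y_test is still in original units, while y_pred, if it comes directly from regressor.predict, is in the sc_y-scaled space.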
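In other words: split first, fit the scalers on the training part only, apply them to both parts, and compute the metrics with y_test and y_pred in the same units. A compact end-to-end sketch on synthetic data (the data, column names and random seeds are assumptions, not taken from the question):

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for `data`: target in the first column, as in the question
rng = np.random.default_rng(0)
feature = rng.uniform(0, 10, size=200)
data = pd.DataFrame({"target": 5000 * feature + rng.normal(0, 2000, size=200),
                     "feature": feature})

x = data.iloc[:, 1:]
y = data.iloc[:, :1]
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

# Fit the scalers on the training split only, then transform both splits
sc_x = StandardScaler().fit(x_train)
sc_y = StandardScaler().fit(y_train)
x_train_s, x_test_s = sc_x.transform(x_train), sc_x.transform(x_test)
y_train_s, y_test_s = sc_y.transform(y_train).ravel(), sc_y.transform(y_test).ravel()

regressor = SVR(kernel="rbf").fit(x_train_s, y_train_s)
y_pred_s = regressor.predict(x_test_s)  # scaled units

# Option 1: metrics in scaled units (both arrays scaled)
rmse_s = np.sqrt(mean_squared_error(y_test_s, y_pred_s))
mae_s = mean_absolute_error(y_test_s, y_pred_s)

# Option 2: metrics in original units (inverse-transform the predictions)
y_pred = sc_y.inverse_transform(y_pred_s.reshape(-1, 1)).ravel()
y_true = y_test.to_numpy().ravel()
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
mae = mean_absolute_error(y_true, y_pred)

print(f"scaled:   RMSE={rmse_s:.4f}  MAE={mae_s:.4f}")
print(f"original: RMSE={rmse:.2f}  MAE={mae:.2f}")
```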