GridSearchCV loss doesn't equal the model.fit() loss value

Problem description

I am confused about which metric GridSearchCV uses in its parameter search. My understanding was that my model object feeds it a metric, and that metric is what gets used to determine best_params. However, this does not appear to be the case. I assumed that scoring=None is the default, so the first metric given in the metrics option of model.compile() would be used. So in my case the scoring function used should be mean_squared_error. My interpretation of the issue is explained below.

Here is what I am doing. I simulate some regression data using sklearn with 10,000 observations and 10 features. I am playing around with keras because I have typically used pytorch in the past and never really dabbled with keras until now. I noticed a difference in the loss-function output between my GridSearchCV call and the model.fit() call after setting my parameters to the optimized ones. Now I know I could just use refit=True and not re-fit the model, but I wanted to get a feel for the outputs of both the keras and sklearn GridSearchCV functionality.

To be more explicit, here is the difference I am seeing. I simulate some data using sklearn as follows:

# Setting some data basics
N = 10000
feats = 10

# generate regression dataset
X,y = make_regression(n_samples=N,n_features=feats,n_informative=2,noise=3)

# training data and testing data #
X_train = X[:int(N * 0.8)]
y_train = y[:int(N * 0.8)]
X_test = X[int(N * 0.8):]
y_test = y[int(N * 0.8):]

I have created a create_model function that looks to tune the activation function I use (again, this is a simple example for proof of concept).

def create_model(activation_fn):
    # create model
    model = Sequential()
    model.add(Dense(30,input_dim=feats,activation=activation_fn,kernel_initializer='normal'))
    model.add(Dropout(0.2))
    model.add(Dense(10,activation=activation_fn))
    model.add(Dropout(0.2))
    model.add(Dense(1,activation='linear'))
    # Compile model
    model.compile(loss='mean_squared_error',optimizer='adam',metrics=['mean_squared_error','mae'])
    return model

Performing the grid search, I get the following output:

model = KerasRegressor(build_fn=create_model,epochs=50,batch_size=200,verbose=0)
activations = ['linear','relu']
param_grid = dict(activation_fn = activations)
grid = GridSearchCV(estimator=model,param_grid=param_grid,n_jobs=1,cv=3)
grid_result = grid.fit(X_train,y_train,verbose=1)
print("Best: %f using %s" % (grid_result.best_score_,grid_result.best_params_))
Best: -21.163454 using {'activation_fn': 'linear'}

OK, so the best metric is a mean squared error of 21.16 (I understand they flip the sign to create a maximization problem). However, when I fit the model with activation_fn = 'linear', the MSE I get is completely different.
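The sign flip mentioned here is scikit-learn's general convention: scorers are "greater is better", so error metrics are negated, and you recover the conventional MSE by flipping the sign of best_score_. A minimal sketch of this convention (scikit-learn only, independent of keras; assumes scikit-learn is installed):

```python
# Sketch: sklearn's 'neg_mean_squared_error' scorer returns negated MSE,
# so cross-validation scores are <= 0 and -score gives the usual MSE.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=5, noise=3, random_state=0)

scores = cross_val_score(LinearRegression(), X, y,
                         scoring='neg_mean_squared_error', cv=3)

print(scores)          # every fold's score is negative
mse = -scores.mean()   # flip the sign to get a conventional (positive) MSE
print(mse)
```

The same sign flip applies to grid_result.best_score_ above: the reported -21.16 corresponds to an MSE of 21.16.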

best_model = create_model('linear')
history = best_model.fit(X_train,y_train,epochs=50,batch_size=200,verbose=1)
.....
.....
Epoch 49/50
8000/8000 [==============================] - 0s 48us/step - loss: 344.1636 - mean_squared_error: 344.1636 - mean_absolute_error: 12.2109
Epoch 50/50
8000/8000 [==============================] - 0s 48us/step - loss: 326.4524 - mean_squared_error: 326.4524 - mean_absolute_error: 11.9250
history.history['mean_squared_error']
Out[723]: 
[10053.778002929688,9826.66806640625,......
  ......
 344.16363830566405,326.45237121582034]

The difference is 326.45 vs. 21.16. Any insight into what I am misunderstanding would be greatly appreciated. I would be more comfortable if they were within a reasonable neighborhood of each other, with the error coming from one CV fold vs. the whole training dataset. But 21 is nowhere near 326. Thanks!

The entire code is shown here:

import pandas as pd
import numpy as np
from keras import Sequential
from keras.layers import Dense,Dropout,Activation,Flatten
from keras.layers import Convolution2D,MaxPooling2D
from keras.utils import np_utils
from sklearn.model_selection import GridSearchCV
from keras.wrappers.scikit_learn import KerasClassifier,KerasRegressor
from keras.constraints import maxnorm
from sklearn import preprocessing 
from sklearn.preprocessing import scale
from sklearn.datasets import make_regression
from matplotlib import pyplot as plt

# Setting some data basics
N = 10000
feats = 10

# generate regression dataset
X,y = make_regression(n_samples=N,n_features=feats,n_informative=2,noise=3)

# training data and testing data #
X_train = X[:int(N * 0.8)]
y_train = y[:int(N * 0.8)]
X_test = X[int(N * 0.8):]
y_test = y[int(N * 0.8):]

def create_model(activation_fn):
    # create model
    model = Sequential()
    model.add(Dense(30,input_dim=feats,activation=activation_fn,kernel_initializer='normal'))
    model.add(Dropout(0.2))
    model.add(Dense(10,activation=activation_fn))
    model.add(Dropout(0.2))
    model.add(Dense(1,activation='linear'))
    # Compile model
    model.compile(loss='mean_squared_error',optimizer='adam',metrics=['mean_squared_error','mae'])
    return model

# fix random seed for reproducibility
seed = 7
np.random.seed(seed)

# create model
model = KerasRegressor(build_fn=create_model,epochs=50,batch_size=200,verbose=0)

# define the grid search parameters
activations = ['linear','relu']
param_grid = dict(activation_fn = activations)
grid = GridSearchCV(estimator=model,param_grid=param_grid,n_jobs=1,cv=3)
grid_result = grid.fit(X_train,y_train,verbose=1)

best_model = create_model('linear')
history = best_model.fit(X_train,y_train,epochs=50,batch_size=200,verbose=1)

history.history.keys()
plt.plot(history.history['mean_absolute_error'])

# summarize results
grid_result.cv_results_
print("Best: %f using %s" % (grid_result.best_score_,grid_result.best_params_))

Solution

The large loss reported in your output (326.45237121582034) is the training loss. If you need a metric to compare against grid_result.best_score_ (from GridSearchCV) and the MSE (from best_model.fit), you have to request the validation loss (see the code below).

Now to the question: why is the validation loss lower than the training loss? In your case it is mainly because of dropout (which is applied during training but not during validation/testing) - that is why the difference between training and validation loss disappears when you remove dropout. A detailed explanation of the possible reasons for a lower validation loss can be found here.
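The dropout effect can be seen in isolation with a toy numpy sketch (not the author's keras model; all names here are illustrative). Inverted dropout multiplies activations by a random 0/1 mask at training time and leaves them untouched at evaluation time, so the train-time predictions carry extra noise and the loss measured with dropout active is typically higher:

```python
# Toy sketch: loss measured with dropout active (training mode) vs.
# dropout disabled (evaluation mode) on identical data and weights.
import numpy as np

rng = np.random.default_rng(0)
h = rng.normal(size=(10000,))   # pretend hidden-layer activations
w = 0.5                         # a fixed output weight
y = w * h                       # "true" targets for this toy setup

def predict(h, w, train, p=0.2, rng=None):
    if train:                   # dropout active: random mask, rescaled by 1/(1-p)
        mask = (rng.random(h.shape) >= p) / (1 - p)
        h = h * mask
    return w * h

train_pred = predict(h, w, train=True, rng=rng)
eval_pred = predict(h, w, train=False)

train_mse = np.mean((train_pred - y) ** 2)
eval_mse = np.mean((eval_pred - y) ** 2)
print(train_mse > eval_mse)   # dropout noise inflates the train-mode loss
```

This is the same mechanism at work in the keras model above: the per-epoch loss in the fit() logs is computed with dropout on, while the validation loss (and GridSearchCV's scoring) is computed with dropout off.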

In short, the performance (MSE) of your model is given by grid_result.best_score_ (21.163454 in your example).

import numpy as np
from keras import Sequential
from keras.layers import Dense,Dropout
from sklearn.model_selection import GridSearchCV
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.datasets import make_regression
import tensorflow as tf

# fix random seed for reproducibility
seed = 7
np.random.seed(seed)
tf.random.set_seed(42)

# Setting some data basics
N = 10000
feats = 10

# generate regression dataset
X,y = make_regression(n_samples=N,n_features=feats,n_informative=2,noise=3)

# training data and testing data #
X_train = X[:int(N * 0.8)]
y_train = y[:int(N * 0.8)]
X_test = X[int(N * 0.8):]
y_test = y[int(N * 0.8):]

def create_model(activation_fn):
    # create model
    model = Sequential()
    model.add(Dense(30,input_dim=feats,activation=activation_fn,kernel_initializer='normal'))
    model.add(Dropout(0.2))
    model.add(Dense(10,activation=activation_fn))
    model.add(Dropout(0.2))
    model.add(Dense(1,activation='linear'))
    # Compile model
    model.compile(loss='mean_squared_error',optimizer='adam',metrics=['mean_squared_error','mae'])
    return model

# create model
model = KerasRegressor(build_fn=create_model,epochs=50,batch_size=200,verbose=0)

# define the grid search parameters
activations = ['linear','relu']
param_grid = dict(activation_fn = activations)
grid = GridSearchCV(estimator=model,param_grid=param_grid,n_jobs=1,cv=3)
grid_result = grid.fit(X_train,y_train,verbose=1,validation_data=(X_test,y_test))

best_model = create_model('linear')
history = best_model.fit(X_train,y_train,epochs=50,batch_size=200,verbose=1,validation_data=(X_test,y_test))

history.history.keys()
# plt.plot(history.history['mae'])

# summarize results
print(grid_result.cv_results_)
print("Best: %f using %s" % (grid_result.best_score_,grid_result.best_params_))