Time Series Split with Keras Tuner

Problem description

Is it possible to use Keras Tuner to tune a neural network with a time series split, similar to sklearn.model_selection.TimeSeriesSplit in sklearn?

For example, consider the sample HyperModel class from https://towardsdatascience.com/hyperparameter-tuning-with-keras-tuner-283474fbfbe:
from keras_tuner import HyperModel
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

class SampleModel(HyperModel):
    def __init__(self, input_shape):
        super().__init__()  # sets HyperModel attributes such as name and tunable
        self.input_shape = input_shape

    def build(self, hp):
        model = Sequential()
        model.add(
            layers.Dense(
                units=hp.Int('units', 8, 64, 4, default=8),
                activation=hp.Choice(
                    'dense_activation', values=['relu', 'tanh', 'sigmoid'], default='relu'),
                input_shape=self.input_shape
            )
        )

        model.add(layers.Dense(1))

        model.compile(
            optimizer='rmsprop', loss='mse', metrics=['mse']
        )

        return model

The tuner:

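from keras_tuner import RandomSearch

hypermodel = SampleModel(input_shape=(x_train_scaled.shape[1],))

tuner_rs = RandomSearch(
            hypermodel, objective='mse', seed=42, max_trials=10, executions_per_trial=2)

tuner_rs.search(x_train_scaled, y_train, epochs=10, validation_split=0.2, verbose=0)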

So, instead of validation_split=0.2 in the line above, is it possible to do something like the following?

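from sklearn.model_selection import TimeSeriesSplit

# defining a time series split object
tscv = TimeSeriesSplit(n_splits=5)

# using that in Keras Tuner (desired usage only: search() forwards validation_split
# to model.fit(), which expects a float, so this does not work as written)
tuner_rs.search(x_train, y_train, validation_split=tscv, verbose=0)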

Solution

This is how I solved it:

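First, I created a class that performs a blocking time series split (BTSS). I found that this split may work better than sklearn's TimeSeriesSplit: with TimeSeriesSplit, the training set of every later fold contains the data that earlier folds used for validation, whereas with BTSS no fold trains on instances that another fold validates on. As the figure at the link shows, with 5 splits BTSS divides your training data into 5 non-overlapping blocks, and each block is divided into its own training part and validation part. (StackOverflow doesn't allow me to upload images, so here is a reference link: https://hub.packtpub.com/cross-validation-strategies-for-time-series-forecasting-tutorial/)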
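A quick way to see the difference described above, assuming scikit-learn and NumPy are installed (the 12-sample toy array X is made up for illustration): list sklearn's TimeSeriesSplit folds and record which indices that served as validation data in an earlier fold reappear in a later fold's training set.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)  # 12 time-ordered toy samples

seen_as_validation = set()
reused = []  # earlier validation indices that reappear in each fold's training set
for train_idx, val_idx in TimeSeriesSplit(n_splits=5).split(X):
    reused.append(sorted(seen_as_validation & set(train_idx.tolist())))
    seen_as_validation |= set(val_idx.tolist())
    print("train:", train_idx.tolist(), "val:", val_idx.tolist())

print(reused)
```

Because TimeSeriesSplit uses an expanding window, each later training set contains every earlier validation index; the blocked split below avoids this because its blocks do not overlap.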

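import numpy as np

class BlockingTimeSeriesSplit():
    def __init__(self, n_splits):
        self.n_splits = n_splits

    def get_n_splits(self, X=None, y=None, groups=None):
        return self.n_splits

    def split(self, X, y=None, groups=None):
        n_samples = len(X)
        # Size of each non-overlapping block; leftover samples at the end are dropped
        k_fold_size = n_samples // self.n_splits
        indices = np.arange(n_samples)

        margin = 0  # gap between the training and validation part of a block
        for i in range(self.n_splits):
            start = i * k_fold_size
            stop = start + k_fold_size
            # First 80% of the block is used for training, the rest for validation
            mid = int(0.8 * (stop - start)) + start
            yield indices[start: mid], indices[mid + margin: stop]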
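The margin variable above is fixed at 0. A minimal sketch of a hypothetical variant (GappedBlockingTimeSeriesSplit, train_frac and margin are illustrative names, not part of any library) that exposes both as parameters. With the defaults it yields the same folds as the class above; margin > 0 leaves a gap between the last training sample and the first validation sample of each block, which may help when neighbouring observations are strongly correlated.

```python
import numpy as np


class GappedBlockingTimeSeriesSplit:
    """Hypothetical variant of BlockingTimeSeriesSplit with a configurable
    training fraction and gap (margin) inside each block."""

    def __init__(self, n_splits, train_frac=0.8, margin=0):
        self.n_splits = n_splits
        self.train_frac = train_frac
        self.margin = margin

    def get_n_splits(self, X=None, y=None, groups=None):
        return self.n_splits

    def split(self, X, y=None, groups=None):
        n_samples = len(X)
        k_fold_size = n_samples // self.n_splits  # leftover samples are dropped
        indices = np.arange(n_samples)
        for i in range(self.n_splits):
            start = i * k_fold_size
            stop = start + k_fold_size
            mid = int(self.train_frac * k_fold_size) + start
            # Skip `margin` samples between the training and validation parts
            yield indices[start:mid], indices[mid + self.margin:stop]


folds = list(GappedBlockingTimeSeriesSplit(n_splits=2, margin=1).split(np.zeros(20)))
for train_idx, val_idx in folds:
    print(train_idx.tolist(), val_idx.tolist())
```

With 20 samples and two blocks of 10, each block trains on its first 8 samples, skips one, and validates on the last one.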

Then go on to create your own model:

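def build_model(hp):
    # Placeholder: build and compile your Keras model from hp here. CVTuner below
    # expects model.evaluate() to return [loss, accuracy, auc].
    pass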

Finally, create a CVTuner class whose run_trial uses BlockingTimeSeriesSplit to cross-validate each trial.

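import numpy as np
import keras_tuner as kt
import tensorflow as tf

class CVTuner(kt.Tuner):
    def run_trial(self, trial, x, y, *args, **kwargs):
        cv = BlockingTimeSeriesSplit(n_splits=5)
        val_accuracy_list = []
        # min_value was missing in the original; 8 is an assumed lower bound
        batch_size = trial.hyperparameters.Int('batch_size', 8, 64, step=8)
        epochs = trial.hyperparameters.Int('epochs', 10, 100, step=10)

        for train_indices, test_indices in cv.split(x):
            x_train, x_test = x[train_indices], x[test_indices]
            y_train, y_test = y[train_indices], y[test_indices]
            # Build a fresh model for every fold so no weights carry over
            model = self.hypermodel.build(trial.hyperparameters)
            model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs,
                      callbacks=kwargs.get('callbacks'))
            # Assumes the model was compiled with metrics=['accuracy', AUC()]
            val_loss, val_accuracy, val_auc = model.evaluate(x_test, y_test)
            val_accuracy_list.append(val_accuracy)

        # Report the mean validation accuracy once, after all folds have run
        # (newer KerasTuner releases prefer returning this dict from run_trial)
        self.oracle.update_trial(trial.trial_id, {'val_accuracy': np.mean(val_accuracy_list)})
        self.save_model(trial.trial_id, model)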
  
tuner = CVTuner(
    oracle=kt.oracles.BayesianOptimizationOracle(
        objective=kt.Objective('val_accuracy', direction='max'), max_trials=1),
    hypermodel=build_model)

stop_early = tf.keras.callbacks.EarlyStopping(monitor='accuracy', patience=10)

tuner.search(X, Y, callbacks=[stop_early])

best_model = tuner.get_best_models()[0]

best_model.summary()

best_model.evaluate(x_out_of_sample,y_out_of_sample)
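The last line assumes x_out_of_sample and y_out_of_sample were set aside before tuning. One way to do that, assuming NumPy arrays ordered by time, is to cut off the most recent slice before calling tuner.search; chronological_holdout and holdout_frac are hypothetical names used only for this sketch.

```python
import numpy as np


def chronological_holdout(X, y, holdout_frac=0.2):
    """Split off the most recent `holdout_frac` of the samples as an
    out-of-sample set, keeping time order (no shuffling)."""
    n_holdout = int(round(len(X) * holdout_frac))
    cut = len(X) - n_holdout
    return X[:cut], y[:cut], X[cut:], y[cut:]


X_all = np.arange(10).reshape(-1, 1)
y_all = np.arange(10)
X, Y, x_out_of_sample, y_out_of_sample = chronological_holdout(X_all, y_all)
print(X.shape, x_out_of_sample.ravel().tolist())
```

Because the hold-out slice comes strictly after everything the tuner sees, the final evaluate call measures performance on data from the future relative to the cross-validation folds.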