Running TensorFlow Federated on a GPU with Colab

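Question

Is it possible to use the GPU provided by Colab to run TFF training sessions faster? Training a federated model takes more than an hour, and the GPU runtime does not seem to provide any benefit at all.

The TFF page on high-performance simulations is still empty, and I cannot find any guide to using a GPU with TFF.

Any suggestions? Thank you!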

tf and tff versions:

2.4.0-dev20200917
0.16.1

Number of clients per round:

70

Input data element spec:

Similar to the text generation tutorial, I am working with sequences of places, so the model is quite similar:

OrderedDict([('x',OrderedDict([('start_place',TensorSpec(shape=(8,8),dtype=tf.int32,name=None)),('start_hour_sin',dtype=tf.float64,('start_hour_cos',('week_day_sin',('week_day_cos',('weekend',('month',name=None))])),('y',name=None))])

Function that creates the model:

import tensorflow as tf
from tensorflow import feature_column

# N, embedding_dim and rnn_units are assumed to be defined elsewhere.

# Create a model
def create_keras_model(number_of_places, batch_size):

  # Shortcut to the layers package
  l = tf.keras.layers

  # Now we need to define an input dictionary
  # where the keys are the column names.
  # This is a model with multiple inputs, so we need to declare an input layer for each feature
  feature_inputs = {
    'start_hour_sin': tf.keras.Input((N-1,), batch_size=batch_size, name='start_hour_sin'),
    'start_hour_cos': tf.keras.Input((N-1,), batch_size=batch_size, name='start_hour_cos'),
    'weekend': tf.keras.Input((N-1,), batch_size=batch_size, name='weekend'),
    'week_day_sin': tf.keras.Input((N-1,), batch_size=batch_size, name='week_day_sin'),
    'week_day_cos': tf.keras.Input((N-1,), batch_size=batch_size, name='week_day_cos'),
  }

  # We cannot use an array of features as usual because we have sequences and we cannot match the shape otherwise
  # We have to handle them one by one
  start_hour_sin = feature_column.numeric_column("start_hour_sin",shape=(N-1))
  hour_sin_feature = l.DenseFeatures(start_hour_sin)(feature_inputs)

  start_hour_cos = feature_column.numeric_column("start_hour_cos",shape=(N-1))
  hour_cos_feature = l.DenseFeatures(start_hour_cos)(feature_inputs)

  weekend = feature_column.numeric_column("weekend",shape=(N-1))
  weekend_feature = l.DenseFeatures(weekend)(feature_inputs)
  
  week_day_sin = feature_column.numeric_column("week_day_sin",shape=(N-1))
  week_day_sin_feature = l.DenseFeatures(week_day_sin)(feature_inputs)

  week_day_cos = feature_column.numeric_column("week_day_cos",shape=(N-1))
  week_day_cos_feature = l.DenseFeatures(week_day_cos)(feature_inputs)

  
  # We also have to add a dimension so that we can concatenate afterwards
  hour_sin_feature = tf.expand_dims(hour_sin_feature,-1)
  hour_cos_feature = tf.expand_dims(hour_cos_feature,-1)
  weekend_feature = tf.expand_dims(weekend_feature,-1)
  week_day_sin_feature = tf.expand_dims(week_day_sin_feature,-1)
  week_day_cos_feature = tf.expand_dims(week_day_cos_feature,-1)

  # Declare the dictionary for the places sequence as before
  sequence_input = {
      'start_place': tf.keras.Input((N-1,), dtype=tf.dtypes.int32, name='start_place')  # add batch_size=batch_size in case of stateful GRU
  }


  # Handling the categorical feature sequence using one-hot
  places_one_hot = feature_column.sequence_categorical_column_with_vocabulary_list(
      'start_place',[i for i in range(number_of_places)])
  
  # Embed the one-hot encoding
  places_embed = feature_column.embedding_column(places_one_hot,embedding_dim)


  # With an input sequence we can't use the DenseFeatures layer; we need to use SequenceFeatures
  sequence_features, sequence_length = tf.keras.experimental.SequenceFeatures(places_embed)(sequence_input)

  input_sequence = l.Concatenate(axis=2)([sequence_features, hour_sin_feature, hour_cos_feature, weekend_feature, week_day_sin_feature, week_day_cos_feature])

  # RNN (stateful=True requires a fixed batch size)
  recurrent = l.GRU(rnn_units,
                    return_sequences=True,
                    dropout=0.5,
                    stateful=True,
                    recurrent_initializer='glorot_uniform')(input_sequence)

  # Last layer with an output for each place
  dense_1 = l.Dense(number_of_places)(recurrent)

  # Softmax output layer
  output = l.Softmax()(dense_1)

  # To return the Model, we need to define its inputs and outputs
  # In our case, we need to list all the input layers we have defined
  inputs = list(feature_inputs.values()) + list(sequence_input.values())

  # Return the Model
  return tf.keras.Model(inputs=inputs, outputs=output)

Federated averaging:

def create_tff_model():
  # TFF uses an `input_spec` so it knows the types and shapes
  # that your model expects.
  input_spec = preprocessed_example_dataset.element_spec
  keras_model_clone = create_keras_model(number_of_places, batch_size=BATCH_SIZE)
  return tff.learning.from_keras_model(
      keras_model_clone,
      input_spec=input_spec,
      loss=tf.keras.losses.SparseCategoricalCrossentropy(),
      metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])

# This command builds all the TensorFlow graphs and serializes them:
fed_avg = tff.learning.build_federated_averaging_process(
    model_fn=create_tff_model,
    client_optimizer_fn=lambda: tf.keras.optimizers.Adam(learning_rate=0.001),
    server_optimizer_fn=lambda: tf.keras.optimizers.Adam(learning_rate=0.06))

State initialization and training loop:

state = fed_avg.initialize()
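
The loop that follows initialization is not shown in the question. A minimal sketch of what such a loop could look like, with per-round timing added, is given here. It relies only on the fact that a TFF iterative process exposes `next(state, client_data)` returning `(new_state, metrics)`. The names `run_rounds`, `sample_clients` and `_CountingProcess` are hypothetical and not part of TFF; the stand-in process exists only so the sketch runs without TFF installed.

```python
import time


def run_rounds(process, state, sample_clients, num_rounds):
    """Run `num_rounds` rounds and record wall-clock seconds for each.

    `process` is any object whose `next(state, client_data)` returns
    `(new_state, metrics)`, which is how `fed_avg.next` is called.
    `sample_clients(round_num)` is a hypothetical callback returning the
    list of client datasets to use in that round.
    """
    history = []
    for round_num in range(1, num_rounds + 1):
        client_data = sample_clients(round_num)
        start = time.perf_counter()
        state, metrics = process.next(state, client_data)
        elapsed = time.perf_counter() - start
        history.append((round_num, elapsed, metrics))
    return state, history


class _CountingProcess:
    """Stand-in for `fed_avg` (assumption: only used to make the demo runnable)."""

    def initialize(self):
        return 0

    def next(self, state, client_data):
        # Pretend one round of training happened and report the client count.
        return state + 1, {"num_clients": len(client_data)}


demo = _CountingProcess()
final_state, history = run_rounds(
    demo, demo.initialize(), lambda round_num: ["client"] * 70, num_rounds=3)
print(final_state, [metrics["num_clients"] for _, _, metrics in history])
```

With the real process, the call would presumably be `run_rounds(fed_avg, state, sample_clients, num_rounds)`, where `sample_clients` returns 70 preprocessed client datasets. The per-round timings then show whether the hour is spent evenly across rounds or mostly in the first one.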

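Answer

One thing to note: this model performs 70 clients × 13 SGD steps per round (close to 1,000 steps), although an hour still does seem long. Running 70 clients on a single machine pushes the limits of the simulation; as the numbers grow, we are starting to look at multi-machine setups using the remote executor.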
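
To make the arithmetic concrete, here is a small sketch that computes the number of SGD steps per round and the implied time per step. The 70 clients and 13 steps come from the answer; the one-hour total comes from the question; the number of rounds is not stated anywhere, so the value 10 is purely a hypothetical placeholder.

```python
def steps_per_round(clients_per_round, sgd_steps_per_client):
    """Total SGD steps executed in one federated round."""
    return clients_per_round * sgd_steps_per_client


def seconds_per_step(total_seconds, num_rounds, clients_per_round,
                     sgd_steps_per_client):
    """Average wall-clock seconds spent per SGD step over the whole run."""
    total_steps = num_rounds * steps_per_round(
        clients_per_round, sgd_steps_per_client)
    return total_seconds / total_steps


round_steps = steps_per_round(70, 13)
print(round_steps)  # 910

# HYPOTHETICAL: the question does not say how many rounds were run.
NUM_ROUNDS = 10
print(seconds_per_step(3600, NUM_ROUNDS, 70, 13))
```

Plugging in the real number of rounds gives a per-step figure that can be compared against a plain Keras training step on the same hardware.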

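Some things to investigate:

Is the simulation I/O bound? How fast can the Python environment iterate over a single client's dataset? Timing a "for batch in dataset:" loop in TF could be useful here.
Is the simulation compute bound? Checking CPU and GPU utilization may help. How long does running keras_model.fit() on a single client's dataset take? The TFF simulation does this roughly 70 times per round (once per client).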
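
Both checks can be sketched with two small timing helpers. This is only an illustration using the standard library: `time_iteration` and `time_call` are hypothetical names, and the generator and the `sum` call below are stand-ins for a real client `tf.data.Dataset` and a real `keras_model.fit()` call.

```python
import time


def time_iteration(dataset, max_batches=None):
    """Return (batches_seen, seconds) for one pass over `dataset`.

    `dataset` can be any iterable; a tf.data.Dataset is iterable in eager mode.
    """
    start = time.perf_counter()
    count = 0
    for _ in dataset:
        count += 1
        if max_batches is not None and count >= max_batches:
            break
    return count, time.perf_counter() - start


def time_call(fn, repeats=1):
    """Return the fastest wall-clock seconds over `repeats` calls of `fn()`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


# Stand-in for one client's dataset: 13 batches of 8 elements (assumption).
fake_client_dataset = ([0] * 8 for _ in range(13))
batches, io_seconds = time_iteration(fake_client_dataset)

# Stand-in for `lambda: keras_model.fit(client_dataset)` (assumption).
fit_seconds = time_call(lambda: sum(range(10_000)), repeats=3)

print(batches)
print(f"I/O pass: {io_seconds:.6f}s, "
      f"estimated compute for 70 clients: {70 * fit_seconds:.6f}s")
```

If the iteration time dominates, the input pipeline is the likely bottleneck; if the `fit` time multiplied by 70 is close to the observed round time, the simulation is compute bound and GPU utilization is worth checking.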

