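Problem description
I get the following error when I call model.fit() on a dataset that uses tf.data.Dataset.cache(filename), and I am not sure why. There is no lockfile in the cache directory.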
tensorflow.python.framework.errors_impl.AlreadyExistsError: There appears to be a concurrent caching iterator running - cache lockfile already exists ('/tmp/cache/mydataset-train_0.lockfile'). If you are sure no other running TF computations are using this cache prefix, delete the lockfile and re-initialize the iterator. Lockfile contents: Created at: 1601972246
	 [[node IteratorGetNext (defined at /Users/lzuwei/workspace/train_model.py:132) ]] [Op:__inference_train_function_2847]
Function call stack:
train_function
This is what my data pipeline looks like. files_list contains 15 files in TFRecord format, and num_parallel_reads is set to 15.
import tensorflow as tf

# Parse each record, then cache the parsed examples on disk under this file prefix.
ds = tf.data.TFRecordDataset(filenames=files_list, compression_type='GZIP', num_parallel_reads=num_parallel_reads) \
    .map(map_fn, num_parallel_calls=tf.data.experimental.AUTOTUNE) \
    .cache("/tmp/cache/mydataset-train") \
    .shuffle(buffer_size=10 * batch_size) \
    .batch(batch_size) \
    .prefetch(tf.data.experimental.AUTOTUNE)
model_merged = modelMHA_tfa()  # returns a tf.keras.models.Model
model_merged.fit(ds, epochs=10)
def map_fn(data_record):
    # Parse one serialized example using the feature schema.
    features = tf.io.parse_single_example(data_record, fc_dataset_schema)
    # dd = tf.cast(features['a'], dtype=tf.float32)
    X = tf.stack([
        tf.cast(features['b'], dtype=tf.float32),
        # The cast of 'c' was truncated in the post; dtype=tf.float32 is assumed.
        tf.cast(features['c'], dtype=tf.float32),
        features['d'],
        features['e'],
        features['f'],
        features['g'],
    ], axis=0)
    Y = tf.stack([features['h']], axis=0)
    return X, Y
Any hints or suggestions would be greatly appreciated!
Solution
The problem was caused by an iterator that was created on the dataset before the call to model.fit():
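ds_iter = iter(ds)
x, y = ds_iter.next()
Removing this code fixed the problem. That iterator had read only one batch, so the cache was still unfinished, and the lockfile it held is presumably what the iterator created by model.fit() ran into.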
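If a sample batch is still needed for inspection, one option is to draw it from the pipeline before the cache stage, so that no half-finished caching iterator is left open when training starts. The following is a minimal sketch, assuming TensorFlow 2.x. The toy `base` dataset, the temporary cache prefix, and the batch size of 4 are stand-ins invented for illustration; they are not the original TFRecord pipeline.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical stand-in for the parsed TFRecord data: 8 (x, y) pairs.
base = tf.data.Dataset.range(8).map(
    lambda i: (tf.cast(i, tf.float32), i % 2))

# Hypothetical cache prefix inside a fresh temporary directory.
cache_prefix = os.path.join(tempfile.mkdtemp(), "mydataset-train")

# Peek at one batch from the uncached dataset. This iterator never
# touches the cache files, so it cannot hold the cache lockfile.
x, y = next(iter(base.batch(4)))

# Build the cached pipeline separately and use it only for training.
ds = (
    base.cache(cache_prefix)
    .shuffle(buffer_size=8)
    .batch(4)
    .prefetch(tf.data.experimental.AUTOTUNE)
)

# Stand-in for model.fit(ds, ...): one complete pass over the dataset,
# which lets the cache stage finish writing its files.
n_batches = sum(1 for _ in ds)
```

The peeked batch comes from a separate iterator over `base`, so the only iterator that ever writes to the cache prefix is the one that consumes `ds` in full.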