Problem description
I built a convolutional network in which some layers use dilation, and training fails with the following error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: padded_shape[0]=170 is not divisible by block_shape[0]=4
The error occurs at the layer called encoder_5, which is the first one to apply dilation. It goes away when I use padding="valid"
in that layer, but that is not an option: I need to preserve the spatial dimensions so I can concatenate them at a later stage. I don't know where this error comes from; it looks as if the padding is not being applied correctly.
The error is the same as the one reported here: https://github.com/tensorflow/tensorflow/issues/28788, but I can run the code from that bug report without any problem, so I suspect a bug in tf.keras. What am I doing wrong?
I am using TensorFlow 2.2.0.
Full stack trace:
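For context on the message itself: when a Conv2D has padding="same" and a dilation_rate greater than 1, TensorFlow can lower the convolution through a SpaceToBatchND op, and that op requires each padded spatial dimension to be divisible by the corresponding block_shape entry (the dilation rate). A minimal sketch of the constraint follows; the tensor shapes here are illustrative, chosen only to reproduce the reported numbers (170 and 4), not taken from the model:

```python
import numpy as np
import tensorflow as tf

# SpaceToBatchND requires (dim + pad_start + pad_end) % block_shape == 0
# for every spatial dimension. The failing op reports padded_shape[0]=170
# with block_shape[0]=4, and 170 % 4 == 2, hence the InvalidArgumentError.
x = np.zeros((1, 170, 4, 1), dtype=np.float32)

try:
    tf.space_to_batch_nd(x, block_shape=[4, 1], paddings=[[0, 0], [0, 0]])
except tf.errors.InvalidArgumentError as e:
    print("fails as expected:", e.message)

# Padding the first spatial dim up to the next multiple of 4 satisfies the op:
extra = (-170) % 4  # 2
y = tf.space_to_batch_nd(x, block_shape=[4, 1], paddings=[[0, extra], [0, 0]])
print(y.shape)  # (4, 43, 4, 1): batch multiplied by 4, height 172 // 4 == 43
```

Whether a given dilated convolution is actually executed through this lowering depends on the TF version and device placement, which is why the same model can run fine in one setup and fail in another.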
(4,160,90,4096)
(4,4096)
2020-08-14 16:01:25.577940: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:143] Filling up shuffle buffer (this may take a while): 1 of 4
(4,4096)
2020-08-14 16:02:01.700859: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:143] Filling up shuffle buffer (this may take a while): 2 of 4
(4,4096)
2020-08-14 16:02:38.701768: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:143] Filling up shuffle buffer (this may take a while): 3 of 4
(2,4096)
2020-08-14 16:02:46.407833: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:193] Shuffle buffer filled.
(2,4096)
(2,4096)
Train for 4 steps, validate for 1 steps
Epoch 1/500
(4,4096)
2020-08-14 16:03:42.476405: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:143] Filling up shuffle buffer (this may take a while): 1 of 4
(4,4096)
2020-08-14 16:04:18.252608: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:143] Filling up shuffle buffer (this may take a while): 2 of 4
(4,4096)
2020-08-14 16:04:52.134605: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:143] Filling up shuffle buffer (this may take a while): 3 of 4
(2,4096)
2020-08-14 16:05:11.404984: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:193] Shuffle buffer filled.
2020-08-14 16:17:53.293766: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at spacetobatch_op.cc:219 : Invalid argument: padded_shape[0]=170 is not divisible by block_shape[0]=4
2020-08-14 16:17:53.302512: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Invalid argument: padded_shape[0]=170 is not divisible by block_shape[0]=4
	 [[{{node cutie/conv2d_5/SpaceToBatchND}}]]
1/4 [======>.......................] - ETA: 44:43WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are:
2020-08-14 16:17:54 WARNING tensorflow Early stopping conditioned on metric `val_loss` which is not available. Available metrics are:
Traceback (most recent call last):
  File "/Users/aleksandra/Projects/NLP/nlp-entity-extraction/scripts/train_locally.py", line 53, in <module>
    train_and_save_model(training_params, model_Meta)
  File "/Users/aleksandra/Projects/NLP/nlp-entity-extraction/scripts/train_locally.py", line 33, in train_and_save_model
    train_and_save_grid_model(pipeline_folder)
  File "/Users/aleksandra/Projects/NLP/nlp-entity-extraction/train_pipeline/step_3_model_training/train_cutie_model.py", line 42, in train_and_save_grid_model
    test_data_gen = train_model(model, training_params, model_Meta)
  File "/Users/aleksandra/Projects/NLP/nlp-entity-extraction/training/training_utils.py", line 45, in train_model
    verbose=True)
  File "/Users/aleksandra/Projects/NLP/nlp-entity-extraction/venv/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 728, in fit
    use_multiprocessing=use_multiprocessing)
  File "/Users/aleksandra/Projects/NLP/nlp-entity-extraction/venv/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 324, in fit
    total_epochs=epochs)
  File "/Users/aleksandra/Projects/NLP/nlp-entity-extraction/venv/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 123, in run_one_epoch
    batch_outs = execution_function(iterator)
  File "/Users/aleksandra/Projects/NLP/nlp-entity-extraction/venv/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2_utils.py", line 86, in execution_function
    distributed_function(input_fn))
  File "/Users/aleksandra/Projects/NLP/nlp-entity-extraction/venv/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py", line 457, in __call__
    result = self._call(*args, **kwds)
  File "/Users/aleksandra/Projects/NLP/nlp-entity-extraction/venv/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py", line 520, in _call
    return self._stateless_fn(*args, **kwds)
  File "/Users/aleksandra/Projects/NLP/nlp-entity-extraction/venv/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1823, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/Users/aleksandra/Projects/NLP/nlp-entity-extraction/venv/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1141, in _filtered_call
    self.captured_inputs)
  File "/Users/aleksandra/Projects/NLP/nlp-entity-extraction/venv/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1224, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager)
  File "/Users/aleksandra/Projects/NLP/nlp-entity-extraction/venv/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 511, in call
    ctx=ctx)
  File "/Users/aleksandra/Projects/NLP/nlp-entity-extraction/venv/lib/python3.7/site-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: padded_shape[0]=170 is not divisible by block_shape[0]=4
	 [[node cutie/conv2d_5/SpaceToBatchND (defined at /venv/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1751) ]] [Op:__inference_distributed_function_2737]
The model:
import tensorflow as tf


def model(labels):
    embeddings = tf.keras.Input(
        shape=(160, 90, 4096), dtype=tf.float32, name="embedding_grid",
    )
    # encoder
    n_filters = 4096 // 2
    encoder_1 = tf.keras.layers.Conv2D(
        filters=n_filters, kernel_size=[3, 5], padding="same", use_bias=True, activation="relu",
    )(embeddings)
    encoder_2 = tf.keras.layers.Conv2D(
        filters=n_filters, kernel_size=[3, 5], padding="same", use_bias=True, activation="relu",
    )(encoder_1)
    encoder_3 = tf.keras.layers.Conv2D(
        filters=n_filters, kernel_size=[3, 5], padding="same", use_bias=True, activation="relu",
    )(encoder_2)
    encoder_4 = tf.keras.layers.Conv2D(
        filters=n_filters, kernel_size=[3, 5], padding="same", use_bias=True, activation="relu",
    )(encoder_3)
    encoder_5 = tf.keras.layers.Conv2D(
        filters=n_filters, kernel_size=[3, 5], padding="same", use_bias=True, activation="relu", dilation_rate=(2, 2),
    )(encoder_4)
    encoder_6 = tf.keras.layers.Conv2D(
        filters=n_filters, kernel_size=[3, 5], padding="same", use_bias=True, activation="relu", dilation_rate=(4, 4),
    )(encoder_5)
    encoder_7 = tf.keras.layers.Conv2D(
        filters=n_filters, kernel_size=[3, 5], padding="same", use_bias=True, activation="relu", dilation_rate=(8, 8),
    )(encoder_6)
    encoder_8 = tf.keras.layers.Conv2D(
        filters=n_filters, kernel_size=[3, 5], padding="same", use_bias=True, activation="relu", dilation_rate=(16, 16),
    )(encoder_7)
    # Atrous Spatial Pyramid Pooling module
    aspp_1 = tf.keras.layers.Conv2D(
        filters=n_filters, kernel_size=[3, 5], padding="same", use_bias=True, activation="relu", dilation_rate=4,
    )(encoder_8)
    aspp_2 = tf.keras.layers.Conv2D(
        filters=n_filters, kernel_size=[3, 5], padding="same", use_bias=True, activation="relu", dilation_rate=8,
    )(encoder_8)
    aspp_3 = tf.keras.layers.Conv2D(
        filters=n_filters, kernel_size=[3, 5], padding="same", use_bias=True, activation="relu", dilation_rate=16,
    )(encoder_8)
    reduced = tf.reduce_mean(aspp_3, [1, 2], keepdims=True)
    global_pool = tf.image.resize(
        reduced, [tf.shape(aspp_3)[1], tf.shape(aspp_3)[2]], method="nearest", name="global_pool",
    )
    # global_pool = tf.keras.layers.GlobalMaxPool2D()(encoder_8)
    aspp_concat = tf.concat([aspp_1, aspp_2, aspp_3, global_pool], axis=3)
    aspp_1x1 = tf.keras.layers.Conv2D(
        filters=n_filters, kernel_size=[1, 1],
    )(aspp_concat)
    # combine low level features
    concat = tf.concat([encoder_1, aspp_1x1], axis=3)
    decoder = tf.keras.layers.Conv2D(
        filters=64, kernel_size=[3, 5], padding="same",
    )(concat)
    # classification
    logits = tf.keras.layers.Conv2D(
        filters=len(labels), kernel_size=[1, 1], name="logits",
    )(decoder)
    softmax = tf.keras.layers.Softmax(axis=3, name="softmax")(logits)
    pred_ids = tf.argmax(softmax, axis=2, output_type=tf.int32)
    loss = {"logits": "categorical_crossentropy"}
    model = tf.keras.Model(inputs=[embeddings], outputs=[logits, pred_ids], name="cutie")
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001, decay=0.0001 / 500),
        loss=loss,
        metrics={"logits": [tf.keras.metrics.CategoricalAccuracy()]},
    )
    return model
Workaround
It turned out that the input data did not all have the same dimensions...
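Given that diagnosis, a cheap guard is to validate every sample's shape before training starts, so a stray grid (e.g. 170 rows where 160 are expected, matching the 170 in the error message) fails fast with a readable message instead of dying deep inside SpaceToBatchND. This is only a sketch: the expected per-sample shape and the helper name are assumptions, not part of the original pipeline.

```python
import numpy as np

# Assumed per-sample grid shape, matching the model's Input layer.
EXPECTED_SHAPE = (160, 90, 4096)


def check_sample_shapes(samples, expected=EXPECTED_SHAPE):
    """Raise early if any sample deviates from the expected grid shape."""
    bad = [(i, s.shape) for i, s in enumerate(samples) if s.shape != expected]
    if bad:
        raise ValueError(f"samples with unexpected shapes: {bad}")


# broadcast_to gives zero-copy stand-ins with the right .shape, so the
# example does not actually allocate hundreds of MB of zeros.
good = np.broadcast_to(np.float32(0.0), EXPECTED_SHAPE)
stray = np.broadcast_to(np.float32(0.0), (170, 90, 4096))

check_sample_shapes([good, good])  # passes silently
try:
    check_sample_shapes([good, stray])
except ValueError as e:
    print(e)  # samples with unexpected shapes: [(1, (170, 90, 4096))]
```

An equivalent check can be pushed into the tf.data pipeline by declaring a fully static `output_shapes` / `tf.TensorSpec`, which makes TensorFlow itself reject mis-sized elements at dataset construction time.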