TensorFlow 2.x: Cannot load trained model in h5 format when using embedding columns (ValueError: Shapes (101, 15) and (57218, 15) are incompatible)

Problem description

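After a long back and forth I managed to save my model (see my question TensorFlow 2.x: Cannot save trained model in h5 format (OSError: Unable to create link (name already exists))). But now I have problems loading the saved model. At first, loading the model gave me the following error: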

ValueError: You are trying to load a weight file containing 1 layers into a model with 0 layers.

After switching from the Sequential API to the functional API, I get the following error:

ValueError: Cannot assign to variable dense_features/NAME1W1_embedding/embedding_weights:0 due to variable shape (101,15) and value shape (57218,15) are incompatible

I tried different versions of TensorFlow. I get the error described above with tf-nightly. With version 2.1 I get a very similar error:

ValueError: Shapes (101,15) and (57218,15) are incompatible.

With versions 2.2 and 2.3 I cannot even save the model (as described in my previous question).

Here is the relevant code using the functional API:

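import datetime

import tensorflow as tf

import preprocessing  # the asker's own preprocessing module (see the linked question)


def __loadModel(args):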
    filepath = args.loadModel

    model = tf.keras.models.load_model(filepath)

    print("start preprocessing...")
    (_,_,test_ds) = preprocessing.getPreProcessedDatasets(args.data,args.batchSize)
    print("preprocessing completed")

    _,accuracy = model.evaluate(test_ds)
    print("Accuracy",accuracy)



def __trainModel(args):
    (train_ds,val_ds,test_ds) = preprocessing.getPreProcessedDatasets(args.data,args.batchSize)

    for bucketSizeGEO in args.bucketSizeGEO:
        print("start preprocessing...")
        feature_columns = preprocessing.getFutureColumns(args.data,args.zip,bucketSizeGEO,True)
        #Todo: compare trainable=False to trainable=True
        feature_layer = tf.keras.layers.DenseFeatures(feature_columns,trainable=False)
        print("preprocessing completed")


        feature_layer_inputs = preprocessing.getFeatureLayerInputs()
        feature_layer_outputs = feature_layer(feature_layer_inputs)
        output_layer = tf.keras.layers.Dense(1,activation=tf.nn.sigmoid)(feature_layer_outputs)

        model = tf.keras.Model(inputs=[v for v in feature_layer_inputs.values()],outputs=output_layer)

        model.compile(optimizer='sgd',loss='binary_crossentropy',metrics=['accuracy'])

        paramString = "Arg-e{}-b{}-z{}".format(args.epoch,args.batchSize,bucketSizeGEO)


        log_dir = "logs\\logR\\" + paramString + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
        tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir,histogram_freq=1)


        model.fit(train_ds,validation_data=val_ds,epochs=args.epoch,callbacks=[tensorboard_callback])


        model.summary()

        loss,accuracy = model.evaluate(test_ds)
        print("Accuracy",accuracy)

        paramString = paramString + "-a{:.4f}".format(accuracy)

        outputName = "logReg" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S") + paramString

        

        if args.saveModel:
            for i,w in enumerate(model.weights): print(i,w.name)

            path = './saved_models/' + outputName + '.h5'
            model.save(path,save_format='h5')

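For the preprocessing part, see the question mentioned at the beginning of this post. The loop for i,w in enumerate(model.weights): print(i,w.name) prints the following: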

0 dense_features/NAME1W1_embedding/embedding_weights:0
1 dense_features/NAME1W2_embedding/embedding_weights:0
2 dense_features/STREETW_embedding/embedding_weights:0
3 dense_features/ZIP_embedding/embedding_weights:0
4 dense/kernel:0
5 dense/bias:0

Solution

My English is not good, so I answered your question in Chinese.

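The answer in English is as follows: this problem is caused by the embedding matrix having inconsistent dimensions between training and prediction.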

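Usually, before using an embedding matrix, we build a dictionary; let us call it word_index here. If the author of the code is not careful, two different word_index dictionaries end up being used for training and for prediction (because the data used in each differs), so the dimensions of the embedding matrix change.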

As the error shows, len(word_index)+1 was 57218 during training, whereas len(word_index)+1 obtained during prediction was 101.

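To run the code correctly, word_index must not be regenerated at prediction time. The simplest fix is therefore to save the word_index obtained during training and reuse it at prediction time, so that the weights obtained during training can be loaded correctly.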
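
A minimal sketch of that idea, using only the Python standard library. The helper names build_word_index, save_word_index and load_word_index, the file name word_index.json, and the sample texts are hypothetical and not part of the original answer. It also assumes that id 0 is reserved, which is why the embedding would have len(word_index) + 1 rows.

```python
import json
import os
import tempfile


def build_word_index(texts):
    """Map each distinct word to an integer id starting at 1 (0 is reserved)."""
    word_index = {}
    for text in texts:
        for word in text.split():
            if word not in word_index:
                word_index[word] = len(word_index) + 1
    return word_index


def save_word_index(word_index, path):
    """Persist the vocabulary as JSON so it can be reused later."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(word_index, f, ensure_ascii=False)


def load_word_index(path):
    """Read back a vocabulary written by save_word_index."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Training time: build the vocabulary once and store it next to the model.
train_texts = ["main street", "oak avenue", "main avenue"]
train_index = build_word_index(train_texts)
index_path = os.path.join(tempfile.mkdtemp(), "word_index.json")
save_word_index(train_index, index_path)

# Prediction time: reload the saved vocabulary instead of rebuilding it
# from the (different) prediction data.
predict_texts = ["elm street"]
loaded_index = load_word_index(index_path)
rebuilt_index = build_word_index(predict_texts)  # what NOT to use

train_rows = len(train_index) + 1      # embedding rows at training time
loaded_rows = len(loaded_index) + 1    # same size, so the weights fit
rebuilt_rows = len(rebuilt_index) + 1  # different size, so loading would fail
print(train_rows, loaded_rows, rebuilt_rows)
```

The reloaded dictionary yields the same row count as at training time, whereas the dictionary rebuilt from the prediction data does not, which is the kind of mismatch reported in the error message.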

I was able to fix my stupid mistake:

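I am using the feature_column library to preprocess my data. Unfortunately, I had passed a fixed number rather than the actual vocabulary size to the num_buckets parameter of the function categorical_column_with_identity. Wrong version: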

street_voc = tf.feature_column.categorical_column_with_identity(
        key='STREETW',num_buckets=100)

Correct version:

street_voc = tf.feature_column.categorical_column_with_identity(
        key='STREETW',num_buckets= __getNumberOfWords(data,'STREETPRO') + 1)

The function __getNumberOfWords(data,'STREETPRO') returns the number of distinct words in the 'STREETPRO' column of the pandas DataFrame.
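
For illustration, here is one way such a helper and the + 1 could fit together, sketched in plain Python rather than pandas. The function get_number_of_words, the sample data, and the assumption that word ids start at 1 (with 0 reserved) are hypothetical and not taken from the asker's code. The only fact relied on is that categorical_column_with_identity expects integer ids in the range [0, num_buckets).

```python
def get_number_of_words(data, column):
    """Count the distinct whitespace-separated words in one column.

    `data` is assumed to be a mapping from column name to an iterable of
    strings (a dict of lists here).
    """
    words = set()
    for cell in data[column]:
        words.update(str(cell).split())
    return len(words)


# Hypothetical sample data standing in for the real DataFrame.
data = {"STREETPRO": ["main street", "oak avenue", "main avenue"]}

n_words = get_number_of_words(data, "STREETPRO")
num_buckets = n_words + 1  # ids 1..n_words, plus the reserved id 0

# Assign ids 1..n_words (sorted only to make the mapping deterministic).
vocab = sorted({w for cell in data["STREETPRO"] for w in cell.split()})
word_to_id = {w: i + 1 for i, w in enumerate(vocab)}

# Every id must lie in [0, num_buckets); a hard-coded smaller value
# would not cover the ids actually produced.
assert all(0 <= i < num_buckets for i in word_to_id.values())
print(n_words, num_buckets)
```

Deriving num_buckets from the training data ties the embedding size to the real vocabulary, so the shape of the saved weights and the shape of the rebuilt layer agree as long as the same data (or the same stored count) is used when the model is reconstructed.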
