Neural network not learning anything meaningful with triplet loss

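Problem description

I am developing a triplet-loss-based model for a Kaggle competition.

Brief description: the challenge in this competition is to build an algorithm that identifies individual whales in images, working from a database of over 25,000 images gathered from research institutions and public contributors.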

https://www.kaggle.com/c/humpback-whale-identification?rvi=1

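I decided to use a Siamese network architecture and train it to produce encodings, which I then use to compute the distance between two whale pictures. If the distance is below a certain threshold, the two pictures show the same whale; if it is larger, they show different whales.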
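
To make the decision rule concrete, here is a minimal sketch in plain Python. The toy 3-dimensional encodings and the `THRESHOLD` value are made up for illustration; real encodings would come from the trained network, and the threshold would have to be tuned on held-out data.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length encoding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def same_whale(enc_a, enc_b, threshold):
    """Return True when the two encodings are closer than the threshold."""
    return euclidean_distance(enc_a, enc_b) < threshold


# Toy encodings (hypothetical values, not produced by any real model).
whale_1_photo_a = [0.10, 0.90, 0.20]
whale_1_photo_b = [0.12, 0.88, 0.25]
whale_2_photo_a = [0.80, 0.10, 0.60]

THRESHOLD = 0.5  # hypothetical; in practice chosen on a validation set

match_same = same_whale(whale_1_photo_a, whale_1_photo_b, THRESHOLD)
match_diff = same_whale(whale_1_photo_a, whale_2_photo_a, THRESHOLD)
print(match_same, match_diff)
```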

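Here is the triplet loss function I have been using (learned from Andrew Ng's Deep Learning Specialization). I also L2-normalize the encodings so that the loss is more interpretable across models, which makes it easier to choose the margin and the split point, if that makes sense. I first tried without normalization, and only added it when that did not work. I also tried varying alpha (the margin) from 0.2 to 0.6.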

import tensorflow as tf

# Alias for the L2 normalization op, accessed through the tf namespace.
norm_l2 = tf.nn.l2_normalize

def triplet_loss(y_true,y_pred,alpha = 0.3):
    """
    Arguments:
    y_true -- true labels,required when you define a loss in Keras,you don't need it in this function.
    y_pred -- python list containing three objects:
            anchor -- the encodings for the anchor images,of shape (None,128)
            positive -- the encodings for the positive images, of shape (None, 128)
            negative -- the encodings for the negative images, of shape (None, 128)
    
    Returns:
    loss -- real number,value of the loss
    """
    
    anchor,positive,negative = y_pred[0],y_pred[1],y_pred[2]
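    # L2-normalize each encoding along the feature axis so every vector has unit length
    anchor, positive, negative = norm_l2(anchor, axis=-1), norm_l2(positive, axis=-1), norm_l2(negative, axis=-1)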

    # Step 1: Compute the (encoding) distance between the anchor and the positive
    pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor,positive)),axis = -1)
    # Step 2: Compute the (encoding) distance between the anchor and the negative
    neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor,negative)),axis = -1)
    # Step 3: subtract the two previous distances and add alpha.
    basic_loss = tf.add(tf.subtract(pos_dist,neg_dist),alpha)
    # Step 4: Take the maximum of basic_loss and 0.0. Sum over the training examples.
    loss = tf.reduce_sum(tf.maximum(basic_loss,0.0))
  
    return loss
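
As a sanity check on the formula, the same per-triplet computation can be worked through by hand in plain Python. This is only an illustrative sketch with made-up 2-dimensional vectors, not the Keras loss itself. It shows two properties of the normalized version: squared distances between unit vectors always lie in [0, 4], and if all three encodings are identical the loss equals alpha exactly.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss_single(anchor, positive, negative, alpha=0.3):
    """max(d(a, p) - d(a, n) + alpha, 0) on L2-normalized encodings."""
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    return max(squared_distance(a, p) - squared_distance(a, n) + alpha, 0.0)


# Easy triplet: positive points the same way as the anchor, negative the opposite way.
easy = triplet_loss_single([1.0, 0.0], [2.0, 0.0], [-1.0, 0.0])

# Collapsed encodings: all three are identical, so both distances are zero.
collapsed = triplet_loss_single([1.0, 1.0], [1.0, 1.0], [1.0, 1.0])

# Largest possible squared distance between two unit vectors.
max_dist = squared_distance(l2_normalize([1.0, 0.0]), l2_normalize([-3.0, 0.0]))

print(easy, collapsed, max_dist)
```

Because the function above sums over the batch, a batch of fully collapsed encodings would give a loss of roughly alpha times the batch size, which can be a useful number to compare a plateau against.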

Here is an example of one of the model architectures I tried. So far I have tried pretrained FaceNet, ResNet, DenseNet, and Xception, freezing a different number of layers in each.

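import tensorflow as tf
# bn is assumed to be an alias for BatchNormalization
from tensorflow.keras.layers import BatchNormalization as bn, Dense, GlobalAveragePooling2D, Input
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.optimizers import Adam

R = tf.keras.applications.ResNet50(include_top=False, weights='imagenet', input_shape=(224, 224, 3))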

lr = 0.0001
optimizer = Adam(learning_rate=lr)
R.compile(optimizer=optimizer,loss = triplet_loss)

for layer in R.layers[0:30]:
    layer.trainable = False

em_Rmodel = Sequential([
    R,
    GlobalAveragePooling2D(),
    # tf.keras.layers.GlobalMaxPooling2D(),
    Dense(512, activation='relu'),
    bn(),
    Dense(256, activation='sigmoid'),
    Dense(128, activation='sigmoid'),
])

def make_tripletModel(model):

    #I was manually changing the input shape to fit the default shape of pretrained networks
    A = Input(shape=(224, 224, 3), name='anchor')
    P = Input(shape=(224, 224, 3), name='anchorPositive')
    N = Input(shape=(224, 224, 3), name='anchorNegative')

    enc_A = model(A)
    enc_P = model(P)
    enc_N = model(N)

    tripletModel = Model(inputs=[A,P,N],outputs=[enc_A,enc_P,enc_N])
    return tripletModel

tripletModel = make_tripletModel(em_Rmodel)

I have been training with semi-hard triplets, and I have also been augmenting the data appropriately to generate more training images.
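
For readers unfamiliar with the term, a negative is usually called semi-hard when it is farther from the anchor than the positive but still inside the margin, i.e. d(a, p) < d(a, n) < d(a, p) + alpha. The sketch below illustrates that filter in plain Python. It is not the `get_triplets()` used in this question (that code is not shown); `semi_hard_negatives` and the toy vectors are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def semi_hard_negatives(anchor, positive, candidates, alpha=0.3):
    """Indices of candidates with d(a, p) < d(a, n) < d(a, p) + alpha."""
    pos_dist = squared_distance(anchor, positive)
    return [
        i for i, neg in enumerate(candidates)
        if pos_dist < squared_distance(anchor, neg) < pos_dist + alpha
    ]


anchor = [0.0, 0.0]
positive = [0.5, 0.0]      # squared distance to the anchor: 0.25
candidates = [
    [0.3, 0.0],            # closer than the positive: a hard negative
    [0.7, 0.0],            # between 0.25 and 0.55: semi-hard
    [2.0, 0.0],            # far outside the margin: an easy negative
]

selected = semi_hard_negatives(anchor, positive, candidates)
print(selected)  # [1]
```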

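Here is the batch generator I use for training. crop_batch is a function that crops each image to show only the whale's tail, which is what identifies the whale. It uses a DenseNet that was trained on 1,000+ images annotated with a bounding box around the whale's tail. It seems to do a good job.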

import numpy as np

def batch_generator_RN(batch_size=batch_size, ishape=(256, 256, 3), model_input_shape=(224, 224, 3)):
    triplet_generator = get_triplets()
    y_val = np.zeros((batch_size,2,1))
    # Buffers for the raw (uncropped) images of each triplet member
    anchors = np.zeros((batch_size, ishape[0], ishape[1], ishape[2]))
    positives = np.zeros((batch_size, ishape[0], ishape[1], ishape[2]))
    negatives = np.zeros((batch_size, ishape[0], ishape[1], ishape[2]))

    while True:        
        for i in range(batch_size):
            anchors[i],positives[i],negatives[i] = next(triplet_generator)
        
        anc = crop_batch(anchors, batch_size=batch_size, img_shape=model_input_shape)
        pos = crop_batch(positives, batch_size=batch_size, img_shape=model_input_shape)
        neg = crop_batch(negatives, batch_size=batch_size, img_shape=model_input_shape)

        x_data = {'anchor': anc,'anchorPositive': pos,'anchorNegative': neg
                  }

        yield (x_data,[y_val,y_val,y_val])

Finally, this is roughly how I have been trying to train these models. I have tried both lowering and raising the learning rate, with batch_size = 16.

from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

lr = 0.0001
optimizer = Adam(learning_rate=lr)
tripletModel.compile(optimizer = optimizer,loss = triplet_loss)


es = EarlyStopping(monitor='loss',patience=20,min_delta=0.05,restore_best_weights=True)
#mc = ModelCheckpoint('Rmodel.h5',monitor='loss',save_best_only=True,save_weights_only=True)
rlr = ReduceLROnPlateau(monitor='loss',factor = 0.1,patience = 5,verbose = 1,min_lr = 0)

gen = batch_generator_RN(batch_size)
tripletModel.fit(gen,steps_per_epoch=64,epochs = 40,callbacks=[es,rlr])

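After training all of these models: in some of them the triplet loss does go down for a while, but then it plateaus and the model has essentially learned nothing meaningful (meaning I cannot tell whether two embeddings come from the same whale just by looking at the distance between them). In other models, the weights converge right after the first or second epoch and then do not change at all, so nothing is learned.

I have tried a wide range of learning rates, and I am fairly sure that is not the problem.

Please tell me if I should add all the code files to help you understand the problem better. I have not done so because I have not cleaned them up yet, but I am happy to if needed. Thanks.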

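Solution

When you say it is not learning anything, do you mean that the loss reaches a plateau and stops decreasing, or that the loss does decrease significantly but the predicted embeddings for the same whale and for different whales still come out similar?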
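
One rough way to tell these two cases apart is to compare distances between same-whale pairs and different-whale pairs on a few labelled embeddings. The sketch below is plain Python on toy data; `distance_gap`, the labels `w1`/`w2`, and the vectors are all hypothetical. A gap near zero would suggest the embeddings are not separating identities, whatever the loss curve looks like.

```python
from statistics import mean


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def distance_gap(embeddings, labels):
    """Mean different-label distance minus mean same-label distance."""
    same, diff = [], []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            d = squared_distance(embeddings[i], embeddings[j])
            (same if labels[i] == labels[j] else diff).append(d)
    return mean(diff) - mean(same)


labels = ["w1", "w1", "w2", "w2"]

# Embeddings that separate the two whales.
separated = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]

# Embeddings that have collapsed to a single point.
collapsed = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]

gap_separated = distance_gap(separated, labels)
gap_collapsed = distance_gap(collapsed, labels)
print(gap_separated, gap_collapsed)
```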

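The tokenizer fn and the triplet_loss() fn are correct, and the problem is not related to data generation.

However, I suspect your learning rate is too high given how many layers you freeze. Freezing that many trainable parameters may be what prevents your network from converging.

My suggestion is to unfreeze all layers, reduce the learning rate to 0.00001, and start training again, whichever architecture you use (Xception, ResNet, etc.).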
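
This suggestion could be applied along the following lines. It is a minimal sketch rather than the asker's actual setup: the tiny `backbone` model is a hypothetical stand-in for the pretrained network (so that nothing has to be downloaded), the partial freeze only mimics the loop from the question, and `LEARNING_RATE` is the value proposed here.

```python
import tensorflow as tf

# Hypothetical stand-in for the pretrained backbone (ResNet50, Xception, ...).
backbone = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(16),
])

# Mimic the partial freeze from the question.
for layer in backbone.layers[:2]:
    layer.trainable = False

# Unfreeze every layer again.
for layer in backbone.layers:
    layer.trainable = True

# Use the lower learning rate suggested in this answer.
LEARNING_RATE = 1e-5
optimizer = tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE)

# Names of layers that are still frozen (expected to be empty).
still_frozen = [layer.name for layer in backbone.layers if not layer.trainable]
print(still_frozen)
```

Changes to `trainable` take effect when the model is compiled, so the flags should be set before calling `compile` on the triplet model with the new optimizer.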
