Train multiple neural networks simultaneously in Keras and have them share losses while training?

Problem description

Suppose I want to train three models simultaneously (model 1, model 2, and model 3) such that, during training, models 2 and 3 share a loss with the main network (model 1). The main model can then learn representations from the other two models in between its layers.

total_loss = (weight1) * loss_m1 + (weight2) * (loss_m1 - loss_m2) + (weight3) * (loss_m1 - loss_m3)

So far I have the following:

from tensorflow.keras.layers import Dense, Dropout, Input
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Model

def threemodel(num_nodes, num_class, w1, w2, w3):
    # w1, w2, w3 are loss weights

    in1 = Input((6373,))
    enc1 = Dense(num_nodes)(in1)
    enc1 = Dropout(0.3)(enc1)
    enc1 = Dense(num_nodes, activation='relu')(enc1)
    enc1 = Dropout(0.3)(enc1)
    enc1 = Dense(num_nodes, activation='relu')(enc1)
    out1 = Dense(units=num_class, activation='softmax')(enc1)

    in2 = Input((512,))
    enc2 = Dense(num_nodes, activation='relu')(in2)
    enc2 = Dense(num_nodes, activation='relu')(enc2)
    out2 = Dense(units=num_class, activation='softmax')(enc2)

    in3 = Input((768,))
    enc3 = Dense(num_nodes, activation='relu')(in3)
    enc3 = Dense(num_nodes, activation='relu')(enc3)
    out3 = Dense(units=num_class, activation='softmax')(enc3)

    adam = Adam(learning_rate=0.0001)

    model = Model(inputs=[in1, in2, in3], outputs=[out1, out2, out3])

    # not sure what changes need to be made here to tie the losses together
    model.compile(loss='categorical_crossentropy',
                  optimizer=adam, metrics=['accuracy'])
    return model


I am confused about how to formulate the shared loss equation here to share the losses of out2 and out3 with out1.

After a bit of searching, it seems I could do something like the following:

loss_1 = tf.keras.losses.categorical_crossentropy(y_true_1, out1)
loss_2 = tf.keras.losses.categorical_crossentropy(y_true_2, out2)
loss_3 = tf.keras.losses.categorical_crossentropy(y_true_3, out3)

model.add_loss((w1) * loss_1 + (w2) * (loss_1 - loss_2) + (w3) * (loss_1 - loss_3))

Would this work? I feel that doing what is suggested above does not really achieve what I want, which is to have the main model (model 1) learn representations from the other two models (models 2 and 3) in between its layers. Any suggestions?
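For reference: for the add_loss snippet above to even be buildable, y_true_1, y_true_2 and y_true_3 would have to be symbolic tensors the model knows about, e.g. extra Input layers, and the model would then be compiled with no separate loss. A rough, untested sketch of that wiring (the label inputs y1_in/y2_in/y3_in and the coefficient values are illustrative, and it assumes the in*/out* tensors are built at top level as above):

import tensorflow as tf
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model

w1, w2, w3 = 1.0, 1.0, 1.0  # example coefficient values

# hypothetical label inputs so the loss can be expressed symbolically
y1_in = Input((num_class,))
y2_in = Input((num_class,))
y3_in = Input((num_class,))

loss_1 = tf.keras.losses.categorical_crossentropy(y1_in, out1)
loss_2 = tf.keras.losses.categorical_crossentropy(y2_in, out2)
loss_3 = tf.keras.losses.categorical_crossentropy(y3_in, out3)

train_model = Model(inputs=[in1, in2, in3, y1_in, y2_in, y3_in],
                    outputs=[out1, out2, out3])
train_model.add_loss(tf.reduce_mean(
    w1 * loss_1 + w2 * (loss_1 - loss_2) + w3 * (loss_1 - loss_3)))
# the loss comes entirely from add_loss, so compile without a per-output loss
train_model.compile(optimizer='adam')

The labels would then be fed as extra inputs during fit. The solution below takes a simpler route by concatenating the outputs instead.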

Solution

Since you are not interested in making the weights trainable (I will call them coefficients to distinguish them from trainable weights), you can concatenate the outputs and pass them as a single output to a custom loss function. The coefficients will then be available when training starts.

You should provide a custom loss function as described above. A Keras loss function only accepts 2 arguments (y_true and y_pred), but ours also needs to know the extra parameters you are interested in, such as coeffs and num_class. So I instantiate a wrapper function with the required parameters and pass the inner, actual loss function as the main loss.

from tensorflow.keras.layers import Dense, Dropout, Input, Concatenate
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Model
from tensorflow.keras import backend as K


def categorical_crossentropy_base(coeffs, num_class):

    def categorical_crossentropy(y_true, y_pred, from_logits=False, label_smoothing=0):
        """Computes the categorical crossentropy loss over the concatenated outputs.

        Args:
          y_true: tensor of true targets (all 3 ground truths, concatenated).
          y_pred: tensor of predicted targets (all 3 predictions, concatenated).
          from_logits: whether `y_pred` is expected to be a logits tensor. By
            default, we assume that `y_pred` encodes a probability distribution.
          label_smoothing: float in [0, 1]. If > 0 then smooth the labels.
        Returns:
          Categorical crossentropy loss value.
          https://github.com/tensorflow/tensorflow/blob/v1.15.0/tensorflow/python/keras/losses.py#L938-L966
        """
        y_pred1 = y_pred[:, :num_class]               # the 1st prediction
        y_pred2 = y_pred[:, num_class:2 * num_class]  # the 2nd prediction
        y_pred3 = y_pred[:, 2 * num_class:]           # the 3rd prediction

        # you should adapt the ground truth to contain all 3 ground truths, of course
        y_true1 = y_true[:, :num_class]               # the 1st ground truth
        y_true2 = y_true[:, num_class:2 * num_class]  # the 2nd ground truth
        y_true3 = y_true[:, 2 * num_class:]           # the 3rd ground truth

        loss1 = K.categorical_crossentropy(y_true1, y_pred1, from_logits=from_logits)
        loss2 = K.categorical_crossentropy(y_true2, y_pred2, from_logits=from_logits)
        loss3 = K.categorical_crossentropy(y_true3, y_pred3, from_logits=from_logits)

        # combine the losses the way you like; here, the formula from the question
        total_loss = coeffs[0] * loss1 + coeffs[1] * (loss1 - loss2) + coeffs[2] * (loss1 - loss3)
        return total_loss

    return categorical_crossentropy

num_nodes = 64   # example values; replace with your own
num_class = 10

in1 = Input((6373,))
enc1 = Dense(num_nodes)(in1)
enc1 = Dropout(0.3)(enc1)
enc1 = Dense(num_nodes, activation='relu')(enc1)
enc1 = Dropout(0.3)(enc1)
enc1 = Dense(num_nodes, activation='relu')(enc1)
out1 = Dense(units=num_class, activation='softmax')(enc1)

in2 = Input((512,))
enc2 = Dense(num_nodes, activation='relu')(in2)
enc2 = Dense(num_nodes, activation='relu')(enc2)
out2 = Dense(units=num_class, activation='softmax')(enc2)

in3 = Input((768,))
enc3 = Dense(num_nodes, activation='relu')(in3)
enc3 = Dense(num_nodes, activation='relu')(enc3)
out3 = Dense(units=num_class, activation='softmax')(enc3)

adam = Adam(learning_rate=0.0001)

total_out = Concatenate(axis=1)([out1, out2, out3])
model = Model(inputs=[in1, in2, in3], outputs=[total_out])

coeffs = [1, 1, 1]
model.compile(loss=categorical_crossentropy_base(coeffs=coeffs, num_class=num_class),
              optimizer=adam, metrics=['accuracy'])
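To train, the three sets of one-hot labels then have to be concatenated along axis 1 in the same order as the outputs. A minimal sketch with random placeholder data (x1/x2/x3 and y1/y2/y3 are illustrative names, not from the original answer):

import numpy as np

# hypothetical data; shapes must match the three inputs and num_class
n = 100
x1 = np.random.rand(n, 6373).astype('float32')
x2 = np.random.rand(n, 512).astype('float32')
x3 = np.random.rand(n, 768).astype('float32')
y1 = np.eye(num_class)[np.random.randint(num_class, size=n)]
y2 = np.eye(num_class)[np.random.randint(num_class, size=n)]
y3 = np.eye(num_class)[np.random.randint(num_class, size=n)]

# ground truth concatenated in the same order as the outputs
y_total = np.concatenate([y1, y2, y3], axis=1)

model.fit([x1, x2, x3], y_total, batch_size=32, epochs=5)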

I am not sure about the accuracy metric, but I think it will work without further changes. I am also using K.categorical_crossentropy, but of course you are free to swap in another implementation.
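If the built-in accuracy turns out to be meaningless on the concatenated output, one possible workaround (my own sketch, not part of the original answer) is a custom metric that scores only the main head:

def main_head_accuracy(num_class):
    # categorical accuracy computed on the first num_class columns only
    def acc(y_true, y_pred):
        y_true1 = y_true[:, :num_class]
        y_pred1 = y_pred[:, :num_class]
        return K.cast(K.equal(K.argmax(y_true1, axis=-1),
                              K.argmax(y_pred1, axis=-1)), K.floatx())
    return acc

model.compile(loss=categorical_crossentropy_base(coeffs=coeffs, num_class=num_class),
              optimizer=adam, metrics=[main_head_accuracy(num_class)])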