No gradients provided for any variable - custom loss function with random weights based on the softmax output

Problem description

I am struggling to write a custom loss function that uses random weights generated according to the class/state predicted from the softmax output. The desired properties are:

  • The model is a simple feedforward neural network with input dimension 1 and output dimension 6.
  • The activation of the output layer is softmax, and the actual class/state number is meant to be estimated with argmax.
  • Note that the training data contains only X (there is no Y).
  • The loss function is defined in terms of random weights (drawn from a Weibull distribution) chosen according to the predicted state of each input sample X (a short sampling sketch follows this list).
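
For illustration only, here is a minimal sketch of the sampling step in isolation, i.e. drawing one Weibull-distributed weight per input sample with scipy; the shape/scale values simply mirror the placeholder values used in the full example further down:

import numpy as np
from scipy.stats import weibull_min

n_data = 50
# draw one random weight per sample from a Weibull distribution
# (c=2, scale=4 are just the placeholder values reused in the full example)
wt1 = weibull_min.rvs(c=2, loc=0, scale=4, size=(n_data, 1)).astype(np.float32)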

A minimal example is provided below for illustration. To keep things simple, I define the loss function based only on the random weights of state/class-1. I get: "ValueError: No gradients provided for any variable: ['dense_41/kernel:0', 'dense_41/bias:0', 'dense_42/kernel:0', 'dense_42/bias:0']."

As discussed in the post below, I found that argmax is not differentiable and that a softargmax function can help (which is what I implemented in the code below). However, I still get the same error: Getting around tf.argmax which is not differentiable
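
For context, here is a minimal standalone check (my own sketch, not taken from the linked answer) of the difference: tf.argmax returns integer indices with no gradient, whereas the softmax-weighted sum of index positions stays differentiable. A moderate beta is used here only so the printed gradient is visibly non-zero; the full code below uses a much larger value.

import tensorflow as tf

def softargmax(x, beta=10.0):
    # differentiable approximation of argmax: softmax-weighted sum of the index positions
    x_range = tf.range(x.shape[-1], dtype=x.dtype)
    return tf.reduce_sum(tf.nn.softmax(x * beta, axis=-1) * x_range, axis=-1)

logits = tf.Variable([[0.1, 0.7, 0.2]])
with tf.GradientTape() as tape:
    idx_soft = softargmax(logits)
print(tape.gradient(idx_soft, logits))   # a real gradient; tf.argmax(logits) would yield none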

import sys
import time
from tqdm import tqdm
import tensorflow as tf
import numpy as np
from tensorflow.keras import layers
from scipy.stats import weibull_min

###############################################################################################
# Generate Dataset
lb  = np.array([2.0])   # Left boundary
ub  = np.array([100.0])  # Right boundary
# Data Points - uniformly distributed
N_r = 50
X_r = np.linspace(lb,ub,N_r)    
###############################################################################################
#Define Model
class DGM:
    # Initialize the class
    def __init__(self,X_r): 
        #normalize training input data
        self.Xmean,self.Xstd = np.mean(X_r),np.std(X_r)
        X_r = (X_r - self.Xmean) / self.Xstd
        self.X_r = X_r
        #Input and output variable dimensions
        self.X_dim = 1; self.Y_dim = 6
        # Define tensors
        self.X_r_tf = tf.convert_to_tensor(X_r,dtype=tf.float32)
        #Learning rate
        self.LEARNING_RATE=1e-4
        #Feedforward neural network model
        self.modelTest = self.test_model()
    ###############################################
    # Initialize network weights and biases 
    def test_model(self):
        input_shape = self.X_dim
        dimensionality = self.Y_dim
        model = tf.keras.Sequential()
        model.add(layers.Input(shape=input_shape))
        model.add(layers.Dense(64,kernel_initializer='glorot_uniform',bias_initializer='zeros'))
        model.add(layers.Activation('tanh'))
        model.add(layers.Dense(dimensionality))
        model.add(layers.Activation('softmax'))
        return model
    ##############################################        
    def compute_loss(self):
        #Define optimizer
        gen_opt = tf.keras.optimizers.Adam(learning_rate=self.LEARNING_RATE,beta_1=0.0,beta_2=0.9)
        with tf.GradientTape() as test_tape:
            ###### calculate loss
            generated_u = self.modelTest(self.X_r_tf,training=True)
            #number of data
            n_data = generated_u.shape[0] 
            #initialize random weights assuming state-1 at all input samples
            wt1 = np.zeros((n_data,1),dtype=np.float32) #initialize weights
            for b in range(n_data):
                wt1[b] = weibull_min.rvs(c=2,loc=0,scale =4,size=1)   
            wt1 =  tf.reshape(tf.convert_to_tensor(wt1,dtype=tf.float32),shape=(n_data,1))
            #print('-----------sampling done-----------')  
            #determine the actual state using softargmax
            idst = self.softargmax(generated_u)
            idst = tf.reshape(tf.cast(idst,tf.float32),shape=(n_data,1))
            #index state-1
            id1 = tf.constant(0.,dtype=tf.float32)
            #assign weights if predicted state is state-1
            wt1_final = tf.cast(tf.equal(idst,id1),dtype=tf.float32)*wt1
            #final loss
            test_loss = tf.reduce_mean(tf.square(wt1_final)) 
            #print('-----------test loss calcuated-----------')

        gradients_of_modelTest = test_tape.gradient(test_loss,[self.modelTest.trainable_variables])

        gen_opt.apply_gradients(zip(gradients_of_modelTest[0],self.modelTest.trainable_variables))

        return test_loss
#reference: Getting around tf.argmax which is not differentiable
#https://stackoverflow.com/questions/46926809/getting-around-tf-argmax-which-is-not-differentiable
    def softargmax(self,x,beta=1e10):
        x = tf.convert_to_tensor(x)
        x_range = tf.range(x.shape.as_list()[-1],dtype=x.dtype)
        return tf.reduce_sum(tf.nn.softmax(x*beta,axis=1) * x_range,axis=-1)

    ##############################################
    def train(self,training_steps=100):
        train_start_time = time.time()
        for step in tqdm(range(training_steps),desc='Training'):
            start = time.time()
            test_loss = self.compute_loss()          

            if (step + 1) % 10 == 0:
                elapsed_time = time.time() - train_start_time
                sec_per_step = elapsed_time / step
                mins_left = ((training_steps - step) * sec_per_step)
                tf.print("\nStep # ",step,"/",training_steps,output_stream=sys.stdout)
                tf.print("Current time:",elapsed_time," time left:",mins_left,output_stream=sys.stdout)
                tf.print("Test Loss: ",test_loss,output_stream=sys.stdout)
###############################################################################################
#Define and train the model
model = DGM(X_r)
model.train(training_steps=100)
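
One way to see the failure directly, before apply_gradients raises, is to inspect the raw gradients returned by the tape. The snippet below is just a sketch of such a check, assuming it is placed inside compute_loss right after the tape block and reuses test_tape, test_loss and self.modelTest from above:

# inside compute_loss, after the 'with tf.GradientTape()' block:
grads = test_tape.gradient(test_loss, self.modelTest.trainable_variables)
print([g is None for g in grads])   # True for every variable the loss is not connected to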

Solution

No working solution for this problem has been found yet.
