Manually setting gradient values in TensorFlow and using them for backpropagation

Problem description

To make sure this is not an XY problem, I will describe the situation:

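I am building a NN with Keras / TensorFlow, and the loss function I want to use does not seem to be differentiable by TF. Wrapping it in tf.py_function does not work, because the gradients are all None. The loss is non-trivial and is written in a completely different framework. I know the most straightforward approach would be to rewrite the loss using tf functions, but that is not feasible (at least for now).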

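The last layer of the network is a fully connected layer with softmax output, tf.keras.layers.Dense(n_labels,activation='softmax'). By other means (not tf), I can obtain the (numerical) gradient of the loss w.r.t. the output of this layer. That gave me an idea: is it possible to set this gradient manually during training and then let TensorFlow propagate it through the rest of the network to update the weights? At least in my mind, this would circumvent the problem of the non-differentiable loss, but it is not clear to me whether the loss would actually be optimized, or how to code it.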

Thanks a lot.

Solution

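You haven't really given many details, but as a general idea, you can use tf.custom_gradient together with tf.numpy_function or tf.py_function to compute both the value and the gradient of an operation outside of TensorFlow (at the cost of some overhead, plus the limitations described in the documentation for those functions). For example, something like this: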

import tensorflow as tf
import numpy as np

# Some operation that you can only compute with NumPy
def my_operation_np(x):
    return np.square(x)

# The gradient of the operation computed with NumPy too
def my_operation_grad_np(x, dy):
    # In this example you could also pass only `x` here and
    # do the `* dy` bit in the TensorFlow gradient function.
    # That might reduce the amount of memory transfer between
    # TensorFlow and NumPy.
    return np.multiply(2,x) * dy

# TensorFlow wrapper for the operation
@tf.custom_gradient
def my_operation(x):
    y = tf.numpy_function(my_operation_np,[x],x.dtype)
    def grad(dy):
        return tf.numpy_function(my_operation_grad_np,[x,dy],x.dtype)
    return y,grad

# Test
with tf.GradientTape() as tape:
    x = tf.constant([1.,2.,3.])
    tape.watch(x)
    y = my_operation(x)
g = tape.gradient(y,x)
tf.print(g)
# [2 4 6]
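
For the specific case in the question, where the gradient of the loss w.r.t. the softmax output is already available as a NumPy array, a more direct route might be the output_gradients argument of tf.GradientTape.gradient. It seeds backpropagation with that array, so by the chain rule the result is the gradient of the external loss w.r.t. the weights. The following is only a sketch that assumes eager execution: external_loss_grad is a hypothetical stand-in for the other framework, and the layer sizes, optimizer and learning rate are made up for illustration.

```python
import numpy as np
import tensorflow as tf

n_features, n_labels = 4, 3  # hypothetical sizes, for illustration only

model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(n_labels, activation="softmax"),
])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

def external_loss_grad(probs):
    # Hypothetical stand-in for the external framework. It returns the
    # gradient of sum((probs - target)**2) with respect to probs.
    target = np.zeros_like(probs)
    target[:, 0] = 1.0
    return (2.0 * (probs - target)).astype(np.float32)

def train_step(x_batch):
    # Record only the forward pass; the loss itself never enters TensorFlow.
    with tf.GradientTape() as tape:
        probs = model(x_batch, training=True)
    # Gradient of the loss w.r.t. the network output, computed outside TF
    # (requires eager execution because of .numpy()).
    dloss_dprobs = external_loss_grad(probs.numpy())
    # Seed backpropagation with the external gradient.
    grads = tape.gradient(
        probs,
        model.trainable_variables,
        output_gradients=tf.constant(dloss_dprobs),
    )
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return probs

probs = train_step(tf.random.normal([5, n_features]))
```

Because train_step calls .numpy(), it cannot be wrapped in tf.function as written; the tf.custom_gradient approach above is the alternative if the step has to run inside a graph.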
