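Computing the Hessian with TensorFlow GradientTape

Question

Thanks for taking a look at this question.

I want to compute the Hessian matrix of a tensorflow.keras.Model.

For higher-order derivatives, I tried nesting GradientTape.

# Example graph and inputs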

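import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense

xs = tf.constant(tf.random.normal([100, 24]))

ex_model = Sequential()
ex_model.add(Input(shape=(24,)))  # shape must be a tuple, hence the trailing comma
ex_model.add(Dense(10))
ex_model.add(Dense(1))

with tf.GradientTape(persistent=True) as tape:
    tape.watch(xs)
    ys = ex_model(xs)
g = tape.gradient(ys, xs)
h = tape.jacobian(g, xs)
print(g.shape)
print(h.shape)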

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-20-dbf443f1ddab> in <module>
      5 h = tape.jacobian(g,xs)
      6 print(g.shape)
----> 7 print(h.shape)

AttributeError: 'NoneType' object has no attribute 'shape'

And another attempt:

with tf.GradientTape() as tape1:
    with tf.GradientTape() as tape2:
        tape2.watch(xs)
        ys = ex_model(xs)
    g = tape2.gradient(ys,xs)
h = tape1.jacobian(g,xs)
    
print(g.shape)
print(h.shape)


(100,24)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-17-c5bbb17404bc> in <module>
      7 
      8 print(g.shape)
----> 9 print(h.shape)

AttributeError: 'NoneType' object has no attribute 'shape'

Why can't I compute the gradient of g with respect to xs?

Solution

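You are already computing the second-order gradient of ys with respect to xs. It is zero, as it should be when you take the gradient of a constant: the model consists only of linear Dense layers, so g does not depend on xs. TensorFlow returns None for such an unconnected gradient, which is why tape1.jacobian(g, xs) returns None.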
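For comparison, here is a minimal sketch of a setup where the per-sample Hessian is well defined and nonzero. It rests on assumptions that are not in the question: a tanh activation on the hidden layer (so the model is no longer linear in xs), both tapes watching xs (a plain tensor is not watched automatically), and the inner gradient being taken while the outer tape is still recording. `batch_jacobian` is used instead of `jacobian` so that the result holds one 24x24 block per sample rather than a (100, 24, 100, 24) tensor.

```python
import tensorflow as tf

tf.random.set_seed(0)
xs = tf.random.normal([100, 24])

# Assumption: a nonlinear hidden layer, so d2y/dx2 is not identically zero.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="tanh"),
    tf.keras.layers.Dense(1),
])

with tf.GradientTape() as outer:
    outer.watch(xs)  # xs is a plain tensor, so each tape must watch it
    with tf.GradientTape() as inner:
        inner.watch(xs)
        ys = model(xs)
    # Taken inside `outer`, so the gradient computation itself is recorded.
    g = inner.gradient(ys, xs)

# Each row of g depends only on the matching row of xs,
# so batch_jacobian yields one Hessian per sample.
h = outer.batch_jacobian(g, xs)

print(g.shape)  # (100, 24)
print(h.shape)  # (100, 24, 24)
```

Each `h[i]` should be symmetric up to floating-point error, which makes for a quick sanity check.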

When the second-order gradient is taken of something that is not constant in x:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input,Dense

x = tf.Variable(1.0)
w = tf.constant(3.0)
with tf.GradientTape() as t2:
  with tf.GradientTape() as t1:
    y = w * x**3
  dy_dx = t1.gradient(y,x)
d2y_dx2 = t2.gradient(dy_dx,x)

print('dy_dx:',dy_dx) # 3 * 3 * x**2 => 9.0
print('d2y_dx2:',d2y_dx2) # 9 * 2 * x => 18.0

Output:

dy_dx: tf.Tensor(9.0, shape=(), dtype=float32)
d2y_dx2: tf.Tensor(18.0, shape=(), dtype=float32)

When the second-order gradient is taken of a constant:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input,Dense

x = tf.Variable(1.0)
w = tf.constant(3.0)
with tf.GradientTape() as t2:
  with tf.GradientTape() as t1:
    y = w * x
  dy_dx = t1.gradient(y, x)
d2y_dx2 = t2.gradient(dy_dx, x)

print('dy_dx:', dy_dx)      # w => 3.0
print('d2y_dx2:', d2y_dx2)  # dy_dx does not depend on x => None

Output:

dy_dx: tf.Tensor(3.0, shape=(), dtype=float32)
d2y_dx2: None

You can, however, compute the second-order gradient of layer parameters with respect to xs, e.g. for input gradient regularization.
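A minimal sketch of what such an input gradient penalty might look like, reusing the two-layer linear model from the question. The name `penalty` and the choice of a mean squared gradient norm are illustrative assumptions, not something the original answer specifies. The bias terms do not influence the input gradient, so `unconnected_gradients=tf.UnconnectedGradients.ZERO` is passed to get zeros for them instead of None.

```python
import tensorflow as tf

tf.random.set_seed(0)
xs = tf.random.normal([100, 24])

model = tf.keras.Sequential([
    tf.keras.layers.Dense(10),
    tf.keras.layers.Dense(1),
])

with tf.GradientTape() as outer:
    with tf.GradientTape() as inner:
        inner.watch(xs)
        ys = model(xs)
    # Input gradient, computed while `outer` is still recording.
    g = inner.gradient(ys, xs)
    # Hypothetical penalty: mean squared norm of the input gradient.
    penalty = tf.reduce_mean(tf.reduce_sum(tf.square(g), axis=1))

# Second-order quantity: gradient of the penalty w.r.t. the layer parameters.
param_grads = outer.gradient(
    penalty,
    model.trainable_variables,
    unconnected_gradients=tf.UnconnectedGradients.ZERO,
)

for v, dv in zip(model.trainable_variables, param_grads):
    print(v.name, dv.shape)
```

Here g depends on the kernels but not on xs, so the gradient with respect to the kernels is nonzero even though the Hessian with respect to xs vanishes.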
