Understanding GradientTape with mini-batches

Problem description

In the following example, taken from the Keras documentation, I would like to understand how grads is computed. Does grads correspond to the average gradient computed over the batch (x_batch_train, y_batch_train)? In other words, does the algorithm compute the gradient with respect to each variable for every sample in the mini-batch, and then average them to obtain grads?

for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))

    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):

        # Open a GradientTape to record the operations run
        # during the forward pass, which enables auto-differentiation.
        with tf.GradientTape() as tape:

            # Run the forward pass of the layer.
            # The operations that the layer applies
            # to its inputs are going to be recorded
            # on the GradientTape.
            logits = model(x_batch_train, training=True)  # Logits for this minibatch

            # Compute the loss value for this minibatch.
            loss_value = loss_fn(y_batch_train, logits)

        # Use the gradient tape to automatically retrieve
        # the gradients of the trainable variables with respect to the loss.
        grads = tape.gradient(loss_value, model.trainable_weights)

        # Run one step of gradient descent by updating
        # the value of the variables to minimize the loss.
        optimizer.apply_gradients(zip(grads, model.trainable_weights))

Solution

By default, Keras losses use the SUM_OVER_BATCH_SIZE reduction, i.e. the per-sample losses are averaged over the batch.

After reading this, your assumption is correct.

The documentation linked by DachuanZhao also shows that the sum over the elements of the batch is averaged.
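
As a quick sanity check, here is a minimal, self-contained sketch (the toy model, data, and variable names are hypothetical, not from the original post) comparing the gradient of the batch-averaged loss with the hand-averaged per-sample gradients:

import numpy as np
import tensorflow as tf

# Hypothetical toy setup: a single Dense layer and a batch of 8 samples.
tf.random.set_seed(0)
model = tf.keras.Sequential([tf.keras.layers.Dense(3)])
model.build(input_shape=(None, 4))
# The default reduction (SUM_OVER_BATCH_SIZE) averages the per-sample losses.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

x_batch = tf.random.normal((8, 4))
y_batch = tf.constant([0, 1, 2, 0, 1, 2, 0, 1])

# Gradient of the single batch-averaged loss scalar.
with tf.GradientTape() as tape:
    loss_value = loss_fn(y_batch, model(x_batch, training=True))
batch_grads = tape.gradient(loss_value, model.trainable_weights)

# Per-sample gradients, averaged by hand.
per_sample_grads = []
for i in range(x_batch.shape[0]):
    with tf.GradientTape() as tape:
        loss_i = loss_fn(y_batch[i:i + 1], model(x_batch[i:i + 1], training=True))
    per_sample_grads.append(tape.gradient(loss_i, model.trainable_weights))
mean_grads = [tf.reduce_mean(tf.stack(grads_per_var), axis=0)
              for grads_per_var in zip(*per_sample_grads)]

# The two should agree up to floating-point error.
for g_batch, g_mean in zip(batch_grads, mean_grads):
    print(np.allclose(g_batch.numpy(), g_mean.numpy(), atol=1e-5))  # True

If the loss were built with reduction=tf.keras.losses.Reduction.SUM instead, batch_grads would equal the sum, rather than the mean, of the per-sample gradients.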