About the softmax function as an output layer in prediction

Question

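I know the softmax activation function: the outputs of an output layer with softmax activation always sum to 1, i.e. the output vector is normalized. This is necessary because the maximum cumulative probability cannot exceed 1. OK, that is clear.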

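But my question is this: when softmax is used as a classifier, the argmax function is used to get the index of the class. So, if the important thing is getting the index of the correct class, what difference does it make whether the cumulative probability is 1 or higher?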

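Here is an example in Python, where I made another softmax (really, it is not a softmax function), but the classifier works the same way as a classifier with the true softmax function: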

import numpy as np

classes = 10
classes_list = ['dog','cat','monkey','butterfly','donkey','horse','human','car','table','bottle']

# This simulates a NN with its weights, where the previous
# layer has a ReLU activation
a = np.random.normal(0, 0.5, (classes, 512))  # Output from previous layer
w = np.random.normal(0, 0.5, (512, 1))        # weights (std 0.5 assumed, as for a)
b = np.random.normal(0, 0.5, (classes, 1))    # bias, one per class (std 0.5 assumed)

# correct solution:
def softmax(a,w,b):
    a = np.maximum(a,0) # ReLU simulation
    x = np.matmul(a,w) + b
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0),np.argsort(e_x.flatten())[::-1]

# approx solution (the "probabilities" can sum to more than one):
def softmax_app(a, w, b):
    a = np.maximum(a, 0) # ReLU simulation
    w_exp = np.exp(w)
    coef = np.sum(w_exp)
    matmul = np.exp(np.matmul(a,w) + b)
    res = matmul / coef
    return res,np.argsort(res.flatten())[::-1]

teor = softmax(a, w, b)
approx = softmax_app(a, w, b)
class_teor = classes_list[teor[-1][0]]
class_approx = classes_list[approx[-1][0]]
print(np.array_equal(teor[-1],approx[-1]))
print(class_teor == class_approx)

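The class obtained by the two methods is always the same (I am talking about prediction, not training). I ask because I am implementing softmax in an FPGA device, and with the second method I do not need two passes to compute the softmax function: a first pass to compute the exponentials and their sum, and a second pass to perform the division.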

Answer

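Let's go over the uses of softmax:

You should use softmax if:
1. You are training a NN and want to limit the range of output values during training (you could use other activation functions instead). This can marginally help with clipping the gradient.
2. You are performing inference on a NN and want a "degree of confidence" metric for your classification result (in the range 0-1).
3. You are performing inference on a NN and want the top K results. In this case softmax is recommended as a "degree of confidence" metric for comparing them.
4. You are performing inference on several NNs (ensemble methods) and want to average their outputs (otherwise their results would not be easily comparable).
You should not use (or should remove) softmax if:
1. You are performing inference on a NN and only care about the top class. Note that the NN can still have been trained with softmax (for better accuracy, faster convergence, etc.).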
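
Point 4 of the first list can be illustrated with a small sketch. The two logit vectors below are made-up numbers standing in for the raw outputs of two hypothetical models, and the helper name `softmax` is only for this illustration. Raw logits live on whatever scale each network happens to produce, while softmax outputs always lie in the range 0-1 and sum to 1, which is what makes them comparable across models:

```python
import numpy as np

def softmax(logits):
    # Subtracting the max is a standard stability trick; it does not change the result.
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

# Hypothetical raw outputs of two models for the same input (3 classes).
logits_a = np.array([5.0, 0.0, 0.0])   # model A strongly prefers class 0
logits_b = np.array([0.0, 6.0, 5.9])   # model B is torn between classes 1 and 2

# Averaging raw logits mixes values that are not on a common scale.
mean_logits = (logits_a + logits_b) / 2

# Averaging probabilities: each model contributes a vector that sums to 1.
mean_probs = (softmax(logits_a) + softmax(logits_b)) / 2

print("class from averaged logits:", int(mean_logits.argmax()))
print("class from averaged probabilities:", int(mean_probs.argmax()))
print("averaged probabilities sum to:", mean_probs.sum())
```

With these particular numbers the two averaging schemes pick different classes, which shows that the choice matters for ensembles. It does not matter for a single model where only the top class is needed.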

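In your case, your insight is right: softmax as the activation function of the last layer is meaningless if your problem only requires the index of the maximum value during the inference phase. Besides, since you are targeting an FPGA implementation, it would only give you extra headaches.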
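
The reason is that softmax divides every exp(x_i) by the same positive sum, and exp is strictly increasing, so the ordering of the classes is the same before and after softmax. Below is a minimal sketch of inference without softmax, assuming random logits as a stand-in for the last layer's output. The name `predict_class` is hypothetical and not taken from the question:

```python
import numpy as np

def softmax(x):
    # Reference implementation, used here only for comparison.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def predict_class(logits):
    # Inference-only shortcut: no exponentials, no sum, no division.
    return int(np.argmax(logits))

rng = np.random.default_rng(0)
logits = rng.normal(0.0, 1.0, size=10)  # stand-in for the last layer's raw output
probs = softmax(logits)

# The top class is the same with or without softmax.
same_top1 = predict_class(logits) == int(np.argmax(probs))

# The full ranking (and therefore any top-K selection) is preserved as well.
same_order = np.array_equal(np.argsort(logits)[::-1], np.argsort(probs)[::-1])

print(same_top1, same_order)
```

On an FPGA this reduces the last stage to a comparator tree over the raw outputs. The values themselves are then no longer probabilities, so they should not be reported as confidences.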
