Python - How to output a numpy array of probabilities at a given precision while preserving their sum

Problem description

I have a numpy array of 6 probabilities that comes from a PyTorch softmax function.

[0.055709425,0.04365404,0.008613999,0.0022386343,0.0037478858,0.88603604]

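I want to convert all 6 floats to strings for the score output, and all of them need to be rounded to a certain precision, say 4 decimal places. I use the following code to get the output text: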

','.join(f'{x:.4f}' for x in scores)  # scores is the array above

The output is

0.0557,0.0437,0.0086,0.0022,0.0037,0.8860

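The sum is 0.9999 instead of 1.0. I have a bunch of arrays like this, and their rounded values sum to either 0.9999 or 1.0001. So my question is: how can I get output that sums to 1.0? I know this is a floating-point issue. What am I missing, some rounding operation or some adjustment?

Thanks a lot.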

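Solution

You can round to two decimal places to reduce the error. This works for the array in the question, but it is not guaranteed to work for arbitrary inputs:

For example: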

import numpy as np
a = np.array([0.055709425,0.04365404,0.008613999,0.0022386343,0.0037478858,0.88603604])
print(sum(a))

Output:

1.0000000241

Now:

new_array = [round(x,2) for x in a]
print(sum(new_array))

Output:

1.0
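Whether coarser rounding fixes the sum depends on the data. The sketch below is only an illustration: the helper name rounded_sum and the counterexample array are made up for this purpose. It sums the formatted values with exact decimal arithmetic, so the result is not blurred by binary floating-point noise.

```python
from decimal import Decimal

def rounded_sum(values, places):
    """Format each value to `places` decimals and add the results exactly."""
    return sum(Decimal(f"{v:.{places}f}") for v in values)

# The array from the question
question_scores = [0.055709425, 0.04365404, 0.008613999,
                   0.0022386343, 0.0037478858, 0.88603604]
print(rounded_sum(question_scores, 4))  # 0.9999
print(rounded_sum(question_scores, 2))  # 1.00

# Hypothetical counterexample: every entry rounds down to 0.33
counterexample = [0.333, 0.333, 0.334]
print(rounded_sum(counterexample, 2))   # 0.99
```

So two-decimal rounding happens to work for the question's array, but the same trick loses a unit on other inputs.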

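To optimize how each number is rounded so that the sum equals 1, you can do the following:

Multiply all numbers by a power of ten so that the desired decimals move in front of the decimal point, then truncate them all to integers (int).
See how many units you are missing to reach the desired sum. Call this the carry.
Sort the numbers in descending order of the amount that was dropped by truncation.
Pick the first carry entries from that sorted list and add 1 to each, so that the sum is now as desired.
Restore the original order and divide the numbers to move the digits back into the fractional part.

Here is an implementation of that idea: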

def roundall(scores,decimalplaces):
    # Coefficient to multiply with in order to keep the desired number of decimal digits
    coeff = 10**decimalplaces
    # Convert to integers and keep track of the original index
    #    and the (negated) amount that was dropped by truncating
    lst = sorted((int(score * coeff) - score * coeff,i,int(score * coeff))
                  for i,score in enumerate(scores))
    # How many units have we lost by truncating?
    carry = -round(sum(tup[0] for tup in lst))
    # Distribute the carry over the numbers having the greatest truncation costs,
    #    then sort by original index to restore the input order
    return [value / coeff
            for i,value in sorted((i,value + int(carry > j))
                                   for j,(overflow,i,value) in enumerate(lst))]

For your example, you would call it like this:

scores = [0.055709425,0.04365404,0.008613999,0.0022386343,0.0037478858,0.88603604]
result = roundall(scores,4)
print(result)

Output:

[0.0557,0.0437,0.0086,0.0022,0.0038,0.886]
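Since the question asks for strings, here is a minimal sketch of the same largest-remainder idea that works on integer units and formats them directly. It is a variant written for illustration, not the code above: the name format_scores is hypothetical, and it assumes the scores are non-negative and sum to approximately 1.

```python
def format_scores(scores, places=4):
    """Return comma-separated fixed-point strings whose values sum to exactly 1.

    Assumes non-negative scores that sum to approximately 1.
    """
    coeff = 10 ** places
    scaled = [s * coeff for s in scores]
    units = [int(x) for x in scaled]          # truncate every value
    carry = max(coeff - sum(units), 0)        # units still missing from the total
    # Indices ordered by the fraction lost to truncation, largest first
    order = sorted(range(len(scores)),
                   key=lambda i: scaled[i] - units[i], reverse=True)
    for i in order[:carry]:
        units[i] += 1
    # Integer formatting keeps trailing zeros (0.8860 rather than 0.886)
    return ",".join(f"{u // coeff}.{u % coeff:0{places}d}" for u in units)

scores = [0.055709425, 0.04365404, 0.008613999,
          0.0022386343, 0.0037478858, 0.88603604]
print(format_scores(scores, 4))
# 0.0557,0.0437,0.0086,0.0022,0.0038,0.8860
```

Formatting from integers sidesteps a second round of float-to-string rounding, so the printed digits are exactly the units that were distributed.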

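Alternatively, you can run a linear optimization that minimizes the errors e1, e2, e3, ..., e6 subject to (p1+e1) + (p2+e2) + ... + (p6+e6) = 1

Here each of e1, e2, ... can take the value 0.0001, -0.0001, or 0 (no adjustment). Zero has to be allowed: six nonzero adjustments of ±0.0001 always add up to an even multiple of 0.0001, so they could never supply a correction of a single 0.0001.

e1, e2, e3, ... are the errors; p1, p2, ..., p6 are your rounded probabilities; (p1+e1), (p2+e2), ... are your new probabilities.

You can use the PuLP linear programming package in Python to do this.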
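For six values the search space is small enough to check without a solver. The sketch below is not PuLP code; it is a brute-force stand-in that tries every combination of adjustments of -1, 0, or +1 unit in the last decimal place. The objective used here (smallest total absolute deviation from the original scores) is an assumption, since the answer does not say which error measure to minimize, and the name adjust_by_search is hypothetical.

```python
from itertools import product

def adjust_by_search(scores, places=4):
    """Try every -1/0/+1 unit adjustment and keep the closest one that sums to 1."""
    coeff = 10 ** places
    targets = [s * coeff for s in scores]   # exact scaled values
    rounded = [round(t) for t in targets]   # p_i, in integer units
    best = None
    for steps in product((-1, 0, 1), repeat=len(scores)):
        candidate = [r + e for r, e in zip(rounded, steps)]
        # Constraint: the new values must sum to 1 and stay non-negative
        if sum(candidate) != coeff or min(candidate) < 0:
            continue
        # Assumed objective: total absolute deviation from the original scores
        cost = sum(abs(c - t) for c, t in zip(candidate, targets))
        if best is None or cost < best[0]:
            best = (cost, candidate)
    if best is None:
        raise ValueError("no adjustment of at most one unit per entry sums to 1")
    return [c / coeff for c in best[1]]

scores = [0.055709425, 0.04365404, 0.008613999,
          0.0022386343, 0.0037478858, 0.88603604]
print(adjust_by_search(scores, 4))
# [0.0557, 0.0437, 0.0086, 0.0022, 0.0038, 0.886]
```

The loop visits 3**n combinations, so this is only practical for short arrays; for longer ones a real integer programming solver, or the truncate-and-carry approach, is the better fit.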
