scipy.stats: the `entropy` method of continuous distributions does not work when reproduced manually

Problem description

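Every continuous distribution in scipy.stats comes with a method that computes its differential entropy: .entropy. Some distributions, such as the normal distribution (norm), have a closed-form expression for the entropy. The others have to rely on numerical integration.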

While trying to find out which function .entropy calls in those cases, I found a function named _entropy in scipy.stats._distn_infrastructure.py. It uses integrate.quad on the pdf (numerical integration).
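For reference, the differential entropy is h = -∫ f(x) log f(x) dx, and scipy.special.entr(y) computes -y log y elementwise (with entr(0) = 0), so integrating entr(pdf) over the support gives h. A minimal sketch of that idea follows. It uses the public pdf rather than the private _pdf, and `numeric_entropy` is a hypothetical helper name, not part of SciPy. It checks the result against the closed-form entropy of the standard normal, 0.5*log(2*pi*e):

```python
import numpy as np
from scipy import integrate
from scipy.special import entr
from scipy.stats import norm


def numeric_entropy(pdf, lower=-np.inf, upper=np.inf):
    """Differential entropy h = -integral of f(x) log f(x) dx via quad.

    `pdf` is any callable returning the density at x. entr(y) = -y*log(y)
    and entr(0) = 0, so tails where the density underflows add nothing.
    """
    value, _abserr = integrate.quad(lambda x: entr(pdf(x)), lower, upper)
    return value


h_numeric = numeric_entropy(norm.pdf)        # numerical integration
h_closed = 0.5 * np.log(2 * np.pi * np.e)    # closed form for N(0, 1)
h_scipy = norm.entropy()                     # SciPy's own method
print(h_numeric, h_closed, h_scipy)
```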

But when I tried to compare the two approaches (the .entropy method versus numerical integration with the _entropy function), the function raised an error:

AttributeError: 'rv_frozen' object has no attribute '_pdf'

Why does the distribution's .entropy method compute the entropy without problems, while the _entropy function raises an error?

import numpy as np
from scipy import integrate 
from scipy.stats import norm,johnsonsu
from scipy.special import entr

def _entropy(self,*args): #from _distn_infrastructure.py
    def integ(x):
        val = self._pdf(x,*args)
        return entr(val)

    # upper limit is often inf,so suppress warnings when integrating
    # _a,_b = self._get_support(*args)
    _a,_b = -np.inf,np.inf   
    with np.errstate(over='ignore'):
        h = integrate.quad(integ,_a,_b)[0]

    if not np.isnan(h):
        return h
    else:
        # try with different limits if integration problems
        low,upp = self.ppf([1e-10,1. - 1e-10],*args)
        if np.isinf(_b):
            upper = upp
        else:
            upper = _b
        if np.isinf(_a):
            lower = low
        else:
            lower = _a
    return integrate.quad(integ,lower,upper)[0]

Using the method works fine:

print(johnsonsu(a=2.55,b=2.55).entropy())

returns 0.9503703091220894

But the function does not:

print(_entropy(johnsonsu(a=2.55,b=2.55)))

raises AttributeError: 'rv_frozen' object has no attribute '_pdf', even though johnsonsu does have this attribute:

def _pdf(self,x,a,b):
    # johnsonsu.pdf(x,b) = b / sqrt(x**2 + 1) *
    #                          phi(a + b * log(x + sqrt(x**2 + 1)))
    x2 = x*x
    trm = _norm_pdf(a + b * np.log(x + np.sqrt(x2+1)))
    return b*1.0/np.sqrt(x2+1.0)*trm

So which function does johnsonsu's .entropy method actually call?

Solution

If you use a frozen distribution, you need johnsonsu(a=2.55,b=2.55).entropy(); otherwise you need johnsonsu.entropy(a=2.55,b=2.55).
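A quick check that the two calling styles describe the same distribution and therefore give the same entropy. This is a sketch; the printed values depend on the installed SciPy version, so no expected output is shown:

```python
import numpy as np
from scipy.stats import johnsonsu

# Frozen: the shape parameters are fixed when the object is created.
rv = johnsonsu(a=2.55, b=2.55)
h_frozen = rv.entropy()

# Unfrozen: the shape parameters are passed to the method on each call.
h_unfrozen = johnsonsu.entropy(a=2.55, b=2.55)

print(h_frozen, h_unfrozen)
```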

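The "why" part of the question is basically that the leading underscore in _entropy means "implementation detail, don't call directly". The longer answer is that a frozen distribution wraps a distribution instance (self.dist) and delegates calls to _pdf, _pmf, etc. to it. The private methods therefore live on self.dist, not on the frozen object itself.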
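A short sketch of that delegation, assuming the frozen object exposes the wrapped instance as `.dist` as described above. The helper `entropy_quad` is hypothetical: it follows the asker's approach, but swaps the private `_pdf` for the public `pdf`, which both frozen and unfrozen objects provide. The exact class name of the frozen object varies between SciPy versions, so it is not relied on here:

```python
import numpy as np
from scipy import integrate
from scipy.special import entr
from scipy.stats import johnsonsu

rv = johnsonsu(a=2.55, b=2.55)

# The frozen wrapper has no private _pdf of its own...
print(hasattr(rv, "_pdf"))
# ...but the distribution instance it wraps (rv.dist) does.
print(hasattr(rv.dist, "_pdf"))


def entropy_quad(dist, *args):
    """Hypothetical variant of the asker's _entropy that calls the public
    pdf, so it works for frozen objects and for unfrozen ones plus shapes."""
    value, _abserr = integrate.quad(lambda x: entr(dist.pdf(x, *args)),
                                    -np.inf, np.inf)
    return value


h_frozen = entropy_quad(rv)                       # shapes already bound
h_unfrozen = entropy_quad(johnsonsu, 2.55, 2.55)  # shapes passed explicitly
print(h_frozen, h_unfrozen)
```

The same reasoning explains why the asker's own _entropy works if it receives the unfrozen johnsonsu object as `self` and the shape parameters positionally, for example `_entropy(johnsonsu, 2.55, 2.55)`.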

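Edit: calling johnsonsu(a=2.55,b=2.55) creates a frozen distribution, an rv_frozen instance. Don't do this unless you want to reuse the instance several times; otherwise just pass the shape parameters a, b as arguments to the entropy method.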
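A sketch contrasting the two use cases. The dictionary `summary` is just an illustrative container, not a SciPy structure. For a one-off value, pass a and b directly. When several quantities are needed for the same parameters, freezing once avoids repeating them:

```python
import numpy as np
from scipy.stats import johnsonsu

# One-off: pass a, b straight to the method; no frozen object is created.
h_once = johnsonsu.entropy(2.55, 2.55)

# Reuse: freeze once, then call several methods without repeating a, b.
rv = johnsonsu(2.55, 2.55)
summary = {
    "entropy": rv.entropy(),
    "mean": rv.mean(),
    "median": rv.median(),
}
print(h_once, summary)
```

For Johnson's SU, X = sinh((Z - a) / b) with Z standard normal, so the median is sinh(-a/b), which for a = b = 2.55 is sinh(-1).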
