蟒蛇：瑞利飞度直方图更新

问题描述

我仍在使用python进行编程。这是我第一次尝试使用直方图和拟合！

特别是，我有一个数据集，并对其做了直方图。在这一点上，我应该进行瑞利拟合，但是我无法找出正确设置参数的正确方法。我读到了loc和scale，应该将fit的参数通常设置为0和1。显然，这种方式的拟合效果不好！！！有没有人可以帮助我？为了清楚起见，我要附加我正在使用的代码。

谢谢。

pipeline.set_params(hyperparameter=1,...)

我的数据（fondi）是：[13 15 13 14 12 13 12 14 15 12 11 10 11 15 18 11 11 11 13 15 15 15 15 12 12 13 12 15 15 15 12 12 11 14 16 11 13 14 16 17 24 21 16 20 18 18 19 21 22 19 15 16 15 13 14 16 18 21 19 22 14 13 14 15 14 17 19 17 16 18 12 15 17 17 16 17 16 19 17 14 13 16 16 13 15 17 17 20 18 17 12 19 14 15 15 14 13 17 16 14 12 11 12 20 19 16 24 19 20 19 17 16 17 16 19 22 17 16 20 22 21 22 20 14 18 16 19 20 17 20 22 20 22 19 17 13 16 18 14 16 20 20 18 19 19 16 19 12 12 14 14 13 15 16 16 19 16 17 12 11 11 10 12 11 11 13 14 13 17 8 8 8 10 10 10 14 16 11 9 9 11 10 17 13 15 19 15 13 16 17 14 12 13 14 11 10 15 13 12 12 11 10 9 9 9 9 8 15 16 12 9 11 9 10 10 7 7 7 21 19 13 10 15 12 10 10 9 8 10 20 14 13 11 13 15 14 10 11 12 16 17 15 12 13 16 15 13 14 17 14 13 15 13 11 14 15 17 18 22 22 16 16 17 22 17 17 18 26 17 19 21 16 15 19 19 22 19 18 17 18 18 12 17 17 17 18 18 14 16 20 17 16 16 18 16 19 18 18 20 20]

输出：loc = 6.783540954380711 scale = 6.430045149216335

解决方法

调整MCVE

下面是一个简单的过程，可从Rayleigh distribution提取试验数据集，然后使用Maximum Likelihood Estimation方法提供的scipy.stats.rv_continuous.fit查找其参数：

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Create a Continuous Variable: 
X = stats.rayleigh(loc=10,scale=5)

# Sample from this Random variable
x0 = X.rvs(size=10000,random_state=123)

# Adjust Distribution parameters
loc,scale = stats.rayleigh.fit(x0) # (9.990726961181025,4.9743913760956335)

# Tabulate over sample range (PDF display):
xl = np.linspace(x0.min(),x0.max(),100)

# Display Results:
fig,axe = plt.subplots()
axe.hist(x0,density=1,label="Sample")
axe.plot(xl,X.pdf(xl),label="Exact Distribution")
axe.plot(xl,stats.rayleigh(scale=scale,loc=loc).pdf(xl),label="Adjusted Distribution")
axe.set_title("Distribution Fit")
axe.set_xlabel("Variable,$x$ $[\mathrm{AU}]$")
axe.set_ylabel("Density,$f(x)$ $[\mathrm{AU}^{-1}]$")
axe.legend()
axe.grid()

它呈现如下：

注释

我想提请您注意以下要点：

300对于直方图分类箱来说是一个巨大的数字，因为您将拥有空的或低填充的分类箱，这会降低表示的质量。由于代表下的垃圾箱，它也可能使统计检验（例如“卡方拟合度”）失败。您当然可以让matplotlib估算垃圾箱的数量；
分布通常采用位置和比例参数，在scipy.stats中，它们会尽一切可能以这种方式对每个可用分布进行规范化。要找出与usual parametric distribution definition的对应关系，您需要解决以下问题：pdf(x) = pdf(y)/scale其中y = (x-loc)/scale。在这种情况下，您将看到scale参数等效于sigma，并且这对于原点偏移是不变的（不依赖于loc值）；
要调整分布，您需要在某些时候执行一些分析/统计过程以从采样数据中估计参数。您的代码中缺少此部分（请参见上面的MCVE中的stats.rayleigh.fit(x0)）。这部分独立于matplotlib绘制的任何图形，由scipy处理，该图形在完整数据集上执行MLE（这就是为什么更改bin只会影响直方图显示而没有其他影响的原因。） / li>

更新

根据您的帖子更新，我完成了我的回答。使用您提供的数据集：

x0 = np.array([13,15,13,14,12,11,10,18,16,17,24,21,20,19,22,8,9,7,26,18])

我们可以尝试调整瑞利分布：

p = stats.rayleigh.fit(x0)
X = stats.rayleigh(*p)

从视觉上看，合身度不是很好：

让我们通过统计测试进行确认。首先，我们可以使用ECDF Kolmogorov-Smirnov检查Test是否与调整后的分布的CDF兼容：

kst = stats.kstest(x0,X.cdf)
# KstestResult(statistic=0.12701044409231593,pvalue=0.0001232197856051324)

我们还可以评估调整后的分布的预期计数，并使用Chi Square Test将它们与有礼貌的计数进行比较：

c,b = np.histogram(x0)
ct = np.diff(X.cdf(b))*np.sum(c)
c2t = stats.chisquare(c,ct,ddof=2)
# Power_divergenceResult(statistic=31.874916914227434,pvalue=4.284273564311872e-05)

自由度的差等于2，因为除了卡方统计量外，我们还必须估计瑞利分布的loc和scale参数（因此在测试中为ddof=2致电）。

这两个测试的p-value都非常低且相似，这意味着很难满足零假设（因此告诉我们应该拒绝它们）：

Kolmogorov： H0 =样本是从参考分布中提取的；
卡方： H0 =类别在观察分布和预期分布之间没有差异；

那么很难相信您的数据集来自调整后的瑞利分布。

您可以将这些结果与MCVE中绘制的综合数据进行比较，测试返回的p值会超过10％：

# KstestResult(statistic=0.0097140857969642,pvalue=0.3019167138216704)
# Power_divergenceResult(statistic=11.170065854104491,pvalue=0.13137094282775724)

在这种情况下我们无法拒绝H0，我们相信采样数据可能来自调整后的瑞利分布。

curve-fitting histogram numpy python scipy

蟒蛇：瑞利飞度直方图 更新

问题描述

解决方法

调整MCVE

注释

更新

相关问答

蟒蛇：瑞利飞度直方图更新