Scipy stats KS-test: best-fit distribution does not match the histogram of the dataset

Problem description

I have a dataset like this:

import numpy as np
y = np.array([25., 20., 10., 31., 30., 66., 13., 5., 9., 2., 4., 6., 26., 72., 7., 18., 8., 12., 114., 17., 39., 42., 63., 3., 16., 27., 48., 24., 21., 106., 120., 34., 52., 22., 35., 1., 56., 195., 60., 77., 59., 67., 46., 40., 53., 1.])

Here is the histogram of this data:

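import matplotlib.pyplot as plt
number_of_bins = len(y)
# Bin edges run from the minimum to the 99th percentile, so the largest value (195) is not drawn
bin_cutoffs = np.linspace(np.percentile(y, 0), np.percentile(y, 99), number_of_bins)
h = plt.hist(y, bins=bin_cutoffs, color='red')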

I used the following code to run the scipy.stats KS test on the dataset and obtain the fitted parameters (the code is taken from "How to find probability distribution and parameters for real data? (Python 3)"):

import scipy.stats as st

def get_best_distribution(data):
    dist_names = ["norm", "exponweib", "weibull_max", "weibull_min", "expon", "pareto", "genextreme", "gamma", "beta"]
    dist_results = []
    params = {}
    for dist_name in dist_names:
        dist = getattr(st, dist_name)
        # Fit the candidate distribution to the data
        param = dist.fit(data)

        params[dist_name] = param
        # Apply the Kolmogorov-Smirnov test against the fitted distribution
        D, p = st.kstest(data, dist_name, args=param)
        print("p value for " + dist_name + " = " + str(p))
        dist_results.append((dist_name, p))

    # Select the distribution with the highest p value
    best_dist, best_p = max(dist_results, key=lambda item: item[1])

    print("Best fitting distribution: " + str(best_dist))
    print("Best p value: " + str(best_p))
    print("Parameters for the best fit: " + str(params[best_dist]))
    return best_dist, best_p, params[best_dist]

The result says the data follows a generalized extreme value distribution (genextreme). The output is:

('genextreme',0.1823402997669471,(-1.119997717132149,5.036499415233003,6.2122664378291175))

The curve fitted with these parameters is as follows:
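For reference, one way such a curve could be drawn is sketched below. It assumes the three reported numbers are scipy's (c, loc, scale) for genextreme, which is the order dist.fit returns them in. The helper name plot_fit and the bin count are hypothetical and not part of the original code. Note that the histogram has to be density-normalized before a PDF can be compared with it on the same axis.

```python
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt

y = np.array([25., 20., 10., 31., 30., 66., 13., 5., 9., 2., 4., 6., 26., 72.,
              7., 18., 8., 12., 114., 17., 39., 42., 63., 3., 16., 27., 48.,
              24., 21., 106., 120., 34., 52., 22., 35., 1., 56., 195., 60.,
              77., 59., 67., 46., 40., 53., 1.])

# Parameters reported for the best fit, assumed to be (shape c, loc, scale)
c, loc, scale = -1.119997717132149, 5.036499415233003, 6.2122664378291175
fitted = st.genextreme(c, loc=loc, scale=scale)  # frozen distribution


def plot_fit(data, dist, n_bins=30):
    """Overlay a fitted PDF on a density-normalized histogram (hypothetical helper)."""
    x = np.linspace(data.min(), data.max(), 500)
    fig, ax = plt.subplots()
    # density=True rescales the bars so the histogram integrates to 1, like a PDF
    ax.hist(data, bins=n_bins, density=True, color="red", alpha=0.5)
    ax.plot(x, dist.pdf(x), color="black")
    ax.set_xlabel("y")
    ax.set_ylabel("density")
    return fig
```

Calling plot_fit(y, fitted) followed by plt.show() displays the overlay. Passing st.expon(*st.expon.fit(y)) instead draws the exponential fit for comparison.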

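As I understand it, the histogram suggests an exponential distribution, but the KS test selects a different distribution. Can anyone explain why this happens, or what is going wrong?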
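A small sketch that may help when looking into this: it fits only the two distributions in question and prints the KS statistic and p value next to the number of fitted parameters. Two caveats seem relevant, offered as pointers rather than a definitive answer. The KS test compares cumulative distribution functions, not the shape of a histogram. And the p value from kstest assumes the reference distribution was fixed in advance, so when the parameters are estimated from the same sample it tends to be too large, and a family with more free parameters can follow the empirical CDF more closely. The results dictionary is only an illustration.

```python
import numpy as np
import scipy.stats as st

y = np.array([25., 20., 10., 31., 30., 66., 13., 5., 9., 2., 4., 6., 26., 72.,
              7., 18., 8., 12., 114., 17., 39., 42., 63., 3., 16., 27., 48.,
              24., 21., 106., 120., 34., 52., 22., 35., 1., 56., 195., 60.,
              77., 59., 67., 46., 40., 53., 1.])

results = {}
for name in ("expon", "genextreme"):
    dist = getattr(st, name)
    params = dist.fit(y)                  # maximum-likelihood fit on the same data
    ks = st.kstest(y, name, args=params)  # KS test against the fitted distribution
    results[name] = {"params": params, "D": ks.statistic, "p": ks.pvalue}

for name, r in results.items():
    print(f"{name}: {len(r['params'])} fitted parameters, "
          f"D = {r['D']:.3f}, p = {r['p']:.3f}")
```

Ranking candidates by this p value alone therefore favors flexible families. The gap between the two D values is a more direct measure of how differently the two fits track the data.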
