Discrepancy in frequency counts: stats.relfreq vs. Seaborn

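Problem description

I am plotting a relative frequency histogram with Seaborn. Because I have not found a way to extract the value associated with the highest peak from the plot, I use stats.relfreq for that instead. However, the relative frequencies do not seem to match.

I am using Python in a Jupyter Notebook.

My data: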

my_data = [0.9995,0.9995,-0.0803,-0.7736,0.9418,0.3612,0.5023,0.9686,0.5574,0.8629,0.5226,0.9947,-0.8391,-0.4767,0.4215,0.8176,0.5106,-0.0772,0.0865,-0.6739,-0.5574,-0.6776,0.4588,-0.2263,0.8224,0.3804,-0.0516,-0.3818,0.0325,0.6341,0.0516,-0.5859,-0.5106,-0.0258,0.128,0.8126,-0.4201,-0.2449,-0.4215,-0.3506,-0.872,0.7506,-0.5719,0.7003,-0.235,0.1747,0.5994,0.5423,-0.25,0.8834,0.1761,-0.7691,0.6249,0.7819,-0.34700000000000003,-0.6486,0.2955,0.6486,0.1734,-0.2732,-0.6049,-0.8622,0.4404,0.25,0.5519,0.5583,-0.1027,0.4939,-0.2144,0.2247,0.9079,-0.7273,-0.4329,0.2263,-0.5423,-0.7362,0.34,-0.6115,-0.5994,-0.6697,0.9201,0.1027,0.5922,0.3822,0.5667,0.8316,0.9679,0.29600000000000004,0.3169,-0.9413,0.6478,0.29600000000000004]

My code

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# `points` and `date_result['score']` are defined elsewhere in the notebook;
# both are assumed to hold the scores shown above.

# Calculate the relative frequency of the values, using 10 bins.
res = stats.relfreq(points, numbins=10)
relative_frequency = res.frequency
print(relative_frequency)

# Find the index of the highest relative frequency.
highest_index = int(np.argmax(relative_frequency))
print(highest_index)

# Ordered list of the scores (bin centers) associated with each frequency bin.
possible_scores = [-0.9, -0.7, -0.5, -0.3, -0.1, 0.1, 0.3, 0.5, 0.7, 0.9]
averaged_relative_frequency_score = possible_scores[highest_index]
print(averaged_relative_frequency_score)

# Plot the histogram with Seaborn.
ax = sns.histplot(data=date_result['score'], stat='probability', bins=10, binwidth=0.2, binrange=[-1, 1])

plt.xlim(-1, 1)
plt.show()

Here are the outputs I get:

print(relative_frequency)
#relative_frequency [0.0610687  0.06870229 0.09923664 0.07633588 0.04580153 0.08396947
 0.16793893 0.17557252 0.07633588 0.14503817]

print(highest_index)
# highest index = 7

print(averaged_relative_frequency_score)
# averaged_relative_frequency_score = 0.5

And the Seaborn plot:

[Image: Histogram]

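As you can see, if the two methods agreed, the peak in the Seaborn plot would correspond to index 9 in the frequencies calculated with the stats module, not 7. Are the bins in stats.relfreq sized differently from the ones Seaborn uses?

Am I misunderstanding something obvious? I cannot figure out why the two methods give different peaks.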


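Solution

Right after writing this up, I figured out what was wrong.

The bins in stats.relfreq are slightly oversized by default. When no range is given, SciPy uses a range slightly wider than the data, (a.min() - s, a.max() + s) with s = (a.max() - a.min()) / (2 * (numbins - 1)), so its bin edges do not line up with the fixed [-1, 1] range passed to Seaborn.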
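To see the oversized default range concretely, here is a minimal sketch that rebuilds the bin edges from the `lowerlimit` and `binsize` fields of the relfreq result. The array `sample` is only an illustrative subset (the first eight scores from the question), not the full data set, and the padding formula is the one described in the SciPy documentation for `defaultreallimits`.

```python
import numpy as np
from scipy import stats

# Illustrative subset: the first eight scores from the question's data.
sample = np.array([0.9995, 0.9995, -0.0803, -0.7736,
                   0.9418, 0.3612, 0.5023, 0.9686])
numbins = 10

# No defaultreallimits given, so SciPy picks the range itself.
res = stats.relfreq(sample, numbins=numbins)

# Padding that SciPy documents for the default range.
s = 0.5 * (sample.max() - sample.min()) / (numbins - 1)
expected_lower = sample.min() - s
expected_upper = sample.max() + s

# Rebuild the bin edges that relfreq actually used.
default_edges = res.lowerlimit + res.binsize * np.arange(numbins + 1)

print("default range:", default_edges[0], "to", default_edges[-1])
print("bin width:", res.binsize)
```

The printed range depends on the data (it extends a little beyond the minimum and maximum), and the bin width is generally not 0.2, which is why the bin indices can disagree with a plot drawn over [-1, 1].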

To get the same result, you have to specify the range of the histogram with the defaultreallimits parameter.

Implemented in code:

res = stats.relfreq(points, numbins=10, defaultreallimits=(-1, 1))
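As a quick check of the fix, the sketch below compares relfreq with explicit limits against a plain `np.histogram` over the same range, normalized to probabilities. It assumes Seaborn's `stat='probability'` bars over `binrange=[-1, 1]` are equivalent to such fixed-edge counts divided by the sample size. `sample` is again just an illustrative subset of the question's data, and the bin centers are derived from the edges rather than from a hard-coded list.

```python
import numpy as np
from scipy import stats

# Illustrative subset: the first eight scores from the question's data.
sample = np.array([0.9995, 0.9995, -0.0803, -0.7736,
                   0.9418, 0.3612, 0.5023, 0.9686])

# relfreq with the range pinned to [-1, 1].
res = stats.relfreq(sample, numbins=10, defaultreallimits=(-1, 1))

# Reference: fixed-edge histogram over the same range, as probabilities.
counts, edges = np.histogram(sample, bins=10, range=(-1, 1))
probability = counts / sample.size

# Bin centers computed from the edges (-0.9, -0.7, ..., 0.9).
centers = (edges[:-1] + edges[1:]) / 2

# Index and center of the tallest bin.
peak_index = int(np.argmax(res.frequency))
peak_center = centers[peak_index]

print(np.allclose(res.frequency, probability))
print(peak_index, peak_center)
```

Deriving the centers from `edges` keeps the score lookup consistent with whatever range and bin count are actually used, instead of relying on a separately maintained list.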