How do I include the upper limit of the bins in a Matplotlib hist?

Question

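When creating a histogram with hist() from Matplotlib, the data falls into bins like this:

lb ≤ x < ub. How can I force it to behave like this instead: lb < x ≤ ub?

In addition, compared to Excel the frequency table is shifted down by one bin, which gives me inaccurate measurements. Using the table below as a reference, how can I force hist() to put the values between 28 and 30 into bin 30 rather than bin 25? Likewise, the value 23.5 falls into bin 20 in Python, because bin 20 = 20 ≤ x < 25.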

data = np.array([23.5,28,29,29.5,30,30])
bins = np.array([20,25,30])
# Excel               1,8
# Python          1,5

Answer

Perhaps numpy.digitize is of interest to you (from the documentation):

Return the indices of the bins to which each value in input array belongs.

`right`    order of bins  returned index `i` satisfies
=========  =============  ============================
``False``  increasing     ``bins[i-1] <= x < bins[i]``
``True``   increasing     ``bins[i-1] < x <= bins[i]``
``False``  decreasing     ``bins[i-1] > x >= bins[i]``
``True``   decreasing     ``bins[i-1] >= x > bins[i]``
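To make the table concrete, here is a minimal sketch that applies np.digitize to the data and bins from the question, once with right=False and once with right=True. The variable names idx_left and idx_right are only illustrative.

```python
import numpy as np

data = np.array([23.5, 28, 29, 29.5, 30, 30])
bins = np.array([20, 25, 30])

# right=False: bins[i-1] <= x < bins[i], so 30 lands past the last edge (index 3)
idx_left = np.digitize(data, bins, right=False)

# right=True: bins[i-1] < x <= bins[i], so 30 stays in the bin ending at 30 (index 2)
idx_right = np.digitize(data, bins, right=True)

print(idx_left)   # [1 2 2 2 3 3]
print(idx_right)  # [1 2 2 2 2 2]
```

With right=True the two values equal to 30 share an index with 28, 29 and 29.5, which is the lb < x ≤ ub behaviour asked for.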

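Hopefully this also clears up a common misconception about bins. The bins array holds the vertices (edges) of a grid, and a data point lies between two vertices, that is, in one bin. A data point therefore corresponds not to a single entry of the bins array but to two. Another thing this notation shows is that with bins=[20,25,30], bin 1 runs from 20 to 25 and bin 2 from 25 to 30; maybe the notation in Excel is different?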
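As a small illustration of the edges-versus-bins point, the sketch below pairs up neighbouring entries of the bins array. The name intervals is purely illustrative.

```python
import numpy as np

bins = np.array([20, 25, 30])

# Each bin is delimited by two neighbouring edges
intervals = list(zip(bins[:-1].tolist(), bins[1:].tolist()))

print(intervals)                         # [(20, 25), (25, 30)]
print(len(intervals) == len(bins) - 1)   # True
```

Three edges give two bins, so there is no separate "bin 30" unless a further edge is added after 30.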

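Using the keyword right in a custom histogram function gives the following code, which draws both results on one plot.

import numpy as np
import matplotlib.pyplot as plt

data = np.array([15, 17, 18, 20, 23.5, 24, 28, 29, 30, 30])
bins = np.array([15, 20, 25, 30])

def custom_hist(x, bins, right=False):
    # Index of the bin each value belongs to (see the table above)
    x_dig = np.digitize(x, bins=bins, right=right)
    u, c = np.unique(x_dig, return_counts=True)
    # One slot per edge; slot 0 collects values at or below bins[0] when right=True.
    # Assumes no digitize index reaches len(bins), i.e. nothing beyond the last edge.
    h = np.zeros(len(bins), dtype=int)
    h[u] = c
    return h

plt.hist(data, bins=bins, color='b', alpha=0.7, label='plt.hist')
# bin counts: [3, 3, 4]
height = custom_hist(x=data, bins=bins, right=True)
width = np.diff(bins)
width = np.concatenate((width, width[-1:]))
# Shift each bar left by its width so that bar i ends at bins[i]
plt.bar(bins - width, height=height, width=width, align='edge', color='r', label='np.digitize')
plt.legend()
plt.show()
# This function also allows different sized bins

Note that in the case of right=True, the value 15 falls into the bin ?-15. If this is not desired, the edge case has to be treated separately, perhaps by adding the value to the first valid bin.

I guess this is also the reason for the unexpected behaviour with your example data. Matplotlib applies lb ≤ x < ub to the bins, yet 30 is still associated with the bin 25-30. If we add an extra bin 30-35, we can see that 30 is now put into that bin instead. I guess they apply the rule lb ≤ x < ub everywhere except at the last edge, where they use lb ≤ x ≤ ub. That is also reasonable, but one has to be aware of it.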
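The edge behaviour can be checked without plotting. The sketch below uses np.histogram, which plt.hist relies on for its binning, on the data from the question. The names counts_closed and counts_extra are only illustrative.

```python
import numpy as np

data = np.array([23.5, 28, 29, 29.5, 30, 30])

# Three edges -> two bins; the last bin is closed on both sides: [25, 30]
counts_closed, _ = np.histogram(data, bins=[20, 25, 30])

# An extra edge at 35 makes [25, 30) half-open, so both 30s move to [30, 35]
counts_extra, _ = np.histogram(data, bins=[20, 25, 30, 35])

print(counts_closed)  # [1 5]
print(counts_extra)   # [1 3 2]
```

Only the last bin includes its upper edge, so whether 30 counts toward 25-30 depends on whether 30 is the final edge.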

bins histogram matplotlib python