Is there a way to create bins in Python without listing every bin edge as in the code below, and perhaps without using np.digitize?

Problem description

In my code I create 10 bins (the specific bin ranges are listed below):

4100000-4155304
4155304-4210608
4210608-4321216
4321216-4542432
4542432-4984865
4984865-5327533
5327533-5670201
5670201-5746217
5746217-5873109
5873109-6000000

bins = [4100000,4155304,4210608,4321216,4542432,4984865,5327533,5670201,5746217,5873109,6000000]
bin_indices = np.digitize(bins_array,bins)

Is there a way to do this without having to list all the bin edges (bins = [bin numbers]), and perhaps without using np.digitize either? Thanks a lot!

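Solution

Just use numpy.arange. Note that it produces evenly spaced edges (here with a step of 55304) and excludes the stop value, so the result is not the same as the unevenly spaced edges listed in the question: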

bins = np.arange(4100000,6000000,55304)
bins

Output

array([4100000,4155304,4210608,4265912,4321216,4376520,4431824,4487128,4542432,4597736,4653040,4708344,4763648,4818952,4874256,4929560,4984864,5040168,5095472,5150776,5206080,5261384,5316688,5371992,5427296,5482600,5537904,5593208,5648512,5703816,5759120,5814424,5869728,5925032,5980336])
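If evenly spaced bins are acceptable, a variant that also reaches the upper bound and avoids np.digitize might look like the sketch below. It assumes 10 equal-width bins over the range from the question; the sample values are made up for illustration. np.linspace includes both endpoints, and np.searchsorted with side="right" returns the same indices as np.digitize for increasing edges.

```python
import numpy as np

# Assumed range and bin count, taken from the question (10 bins over 4100000-6000000).
low, high, n_bins = 4_100_000, 6_000_000, 10

# linspace includes both endpoints, so n_bins bins need n_bins + 1 edges.
edges = np.linspace(low, high, n_bins + 1)

# Made-up sample values, for illustration only.
values = np.array([4_100_000, 4_289_999, 4_290_000, 5_999_999])

# side="right" matches np.digitize(values, edges) for increasing edges:
# index i means edges[i-1] <= value < edges[i].
idx = np.searchsorted(edges, values, side="right")
print(idx.tolist())  # [1, 1, 2, 10]
```

The unevenly spaced edges from the question do not follow a fixed step, so they would still have to be listed (or derived from whatever rule produced them).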

Cheers

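I got this from another SO post that uses Pandas, though I can't find its original author. Maybe try something like the code below; I put it together very quickly to test the idea. The data frame is just random numbers from numpy, used to generate fake data within the range you are looking at.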

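import pandas as pd
import numpy as np

# Labels for the 10 bins (one label per interval)
cats = ['4100000_4155303','4155304_4210608','4210608_4321215','4321216_4542431','4542432_4984864','4984865_5327532','5327533_5670200','5670201_5746216','5746217_5873108','5873109_6000000']

# pd.cut needs 11 edges for 10 labels. Intervals are right-inclusive by default,
# so each edge is the upper bound of its bin (the first edge sits just below 4100000).
bins = [4099999,4155303,4210608,4321215,4542431,4984864,5327532,5670200,5746216,5873108,6000000]


def binn(df):
    # Count, per row, which bin column A falls into; one output column per label
    df = (df.groupby([df.index,pd.cut(df['A'],bins,labels=cats)],observed=False)
                .size()
                .unstack(fill_value=0)
                .reindex(columns=cats,fill_value=0))
    return df


rng = np.random.default_rng()
# Fake data drawn from [4155304, 6000000), which is why the first bin stays empty
df = pd.DataFrame(rng.integers(4155304,6000000,size=(1000,1)),columns=list('A'))

dfBinned = binn(df)

print('All data binned in column A of the df')
print(dfBinned.sum(axis = 0))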

This prints (the exact counts vary between runs because the data is random):

All data binned in column A of the df
A
4100000_4155303      0
4155304_4210608     35
4210608_4321215     42
4321216_4542431    130
4542432_4984864    239
4984865_5327532    174
5327533_5670200    205
5670201_5746216     37
5746217_5873108     63
5873109_6000000     75
dtype: int64
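If only the per-bin counts are needed, a shorter route is to let pd.cut build interval labels itself and count them. The following is a minimal sketch rather than a drop-in replacement: it assumes the edge list from the question, uses right=False so each bin is closed on the left, and the seed and the Series name are arbitrary choices.

```python
import numpy as np
import pandas as pd

# Bin edges copied from the question.
edges = [4100000, 4155304, 4210608, 4321216, 4542432, 4984865,
         5327533, 5670201, 5746217, 5873109, 6000000]

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
s = pd.Series(rng.integers(4155304, 6000000, size=1000), name="A")

# right=False makes every bin half-open on the right: [edge_i, edge_i+1).
# sort=False keeps the counts in bin order instead of sorting by frequency.
counts = pd.cut(s, bins=edges, right=False).value_counts(sort=False)
print(counts)
```

Because the labels are generated from the edges, there is no separate list of category names to keep in sync with the bins.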
