Problem description
In my code, I create 10 bins (the specific bin ranges are listed below):
- 4100000-4155304
- 4155304-4210608
- 4210608-4321216
- 4321216-4542432
- 4542432-4984865
- 4984865-5327533
- 5327533-5670201
- 5670201-5746217
- 5746217-5873109
- 5873109-6000000
bins = [4100000, 4155304, 4210608, 4321216, 4542432, 4984865, 5327533, 5670201, 5746217, 5873109, 6000000]
bin_indices = np.digitize(bins_array, bins)
Is there a way to do this without listing all the bin edges (bins = [bin numbers]), and perhaps without using np.digitize either? Thanks a lot!
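Solution
Just use numpy.arange. Note that it produces equally spaced edges (a step of 55304 here) and excludes the stop value, so it only fits equal-width bins.
Approach: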
import numpy as np
bins = np.arange(4100000, 6000000, 55304)
bins
Output:
array([4100000,4155304,4210608,4265912,4321216,4376520,4431824,4487128,4542432,4597736,4653040,4708344,4763648,4818952,4874256,4929560,4984864,5040168,5095472,5150776,5206080,5261384,5316688,5371992,5427296,5482600,5537904,5593208,5648512,5703816,5759120,5814424,5869728,5925032,5980336])
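When the bins really are equal-width, the bin index can also be computed arithmetically, which avoids np.digitize altogether. The following is a minimal sketch under that assumption, reusing the start, stop, and step from the arange call above; the `values` array is made up purely for illustration.

```python
import numpy as np

start, stop, width = 4100000, 6000000, 55304

# Evenly spaced left edges, as produced by np.arange (stop is excluded).
edges = np.arange(start, stop, width)

# Hypothetical sample values, chosen only for illustration.
values = np.array([4100000, 4155303, 4155304, 4984864, 5999999])

# Zero-based bin index via integer division; no explicit edge list is needed.
idx_arith = (values - start) // width

# np.digitize returns one-based indices for the same edges (right=False),
# so subtracting 1 gives the same result for values >= start.
idx_digitize = np.digitize(values, edges) - 1

print(idx_arith)     # [ 0  0  1 16 34]
print(idx_digitize)  # [ 0  0  1 16 34]
```

This shortcut does not apply to the unequal bins in the question; for those, an explicit list of edges (or widths) is still required.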
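Cheers
I couldn't find the original author of the other SO post I got this from. It uses Pandas, but maybe try something like the code below; I put it together very quickly to try out an idea. The DataFrame is just NumPy random integers used to generate fake data in the range you are looking for.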
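import pandas as pd
import numpy as np

# Create bin edges and category labels for the data ranges.
# pd.cut needs exactly one more edge than labels: 10 labels -> 11 edges.
cats = ['4100000_4155303','4155304_4210608','4210608_4321215','4321216_4542431','4542432_4984864','4984865_5327532','5327533_5670200','5670201_5746216','5746217_5873108','5873109_6000000']
# Intervals are right-closed, e.g. (4099999, 4155303] for the first label.
bins = [4099999, 4155303, 4210608, 4321215, 4542431, 4984864, 5327532, 5670200, 5746216, 5873108, 6000000]

def binn(df):
    # Count rows per (index, bin), then pivot the bins into columns.
    df = (df.groupby([df.index, pd.cut(df['A'], bins, labels=cats)])
            .size()
            .unstack(fill_value=0)
            .reindex(columns=cats, fill_value=0))
    return df

rng = np.random.default_rng()
# Fake data; the upper bound 6000000 is assumed so that values fall inside the bins.
df = pd.DataFrame(rng.integers(4155304, 6000000, size=(1000, 1)), columns=list('A'))
dfBinned = binn(df)
print('All data binned in column A of the df')
print(dfBinned.sum(axis=0))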
Prints:
All data binned in column A of the df
A
4100000_4155303 0
4155304_4210608 35
4210608_4321215 42
4321216_4542431 130
4542432_4984864 239
4984865_5327532 174
5327533_5670200 205
5670201_5746216 37
5746217_5873108 63
5873109_6000000 75
dtype: int64
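For comparison, per-bin counts for unequal bins can also be obtained with NumPy alone through np.histogram. This is only a sketch: it assumes the bin edges from the question, uses random integers as stand-in data, and fixes an arbitrary seed so the run is repeatable.

```python
import numpy as np

# Bin edges taken from the question (11 edges define 10 bins).
edges = np.array([4100000, 4155304, 4210608, 4321216, 4542432, 4984865,
                  5327533, 5670201, 5746217, 5873109, 6000000])

# Fake data in the same range; the seed value is arbitrary.
rng = np.random.default_rng(0)
data = rng.integers(4100000, 6000000, size=1000)

# counts[i] is the number of values in [edges[i], edges[i+1]);
# only the last bin also includes its right edge.
counts, _ = np.histogram(data, bins=edges)

for lo, hi, n in zip(edges[:-1], edges[1:], counts):
    print(f"{lo}-{hi}: {n}")
```

Unlike pd.cut, whose intervals are right-closed by default, np.histogram treats every bin except the last as left-closed, so values that fall exactly on an edge are assigned differently by the two approaches.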