How to dynamically define bins for grouped data with Pandas?

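Problem description

I have a dataset that I need to group in several ways (using a MultiIndex), run some aggregations on, and export the results. One of these operations is binning (bucketing) a price column to get the number of elements in each bucket. I need 3 buckets, where:

the first bucket holds all elements whose price is at least the group's lowest price and no higher than 110% of it
the second bucket holds all elements whose price is higher than 110% of the lowest price but no higher than 150% of it
the third bucket holds the rest.

Example: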

Product	Country	Sell	col	price	colb
First	DE	A	b	100	x
Second	DE	A	g	105	z
First	FR	A	b	111	x
Second	FR	A	g	100	z
First	DE	B	b	109	x
Second	DE	B	g	120	z
First	FR	B	b	100	x
Second	FR	B	g	200	z

What I expect:

Product	Country	Sell	1x	1.1x	>1.5x
First	DE	A	1	0	0
		B	0	1	0
	FR	A	0	1	0
		B	1	0	0
Second	DE	A	1	0	0
		B	0	1	0
	FR	A	1	0	0
		B	0	0	1

What I am doing right now:

import numpy as np
import pandas as pd

# some code

# lowest price per group (a SeriesGroupBy has no sort_values, so take the minimum directly)
df_low_price = df.groupby(["Product", "Country", "Sell"])["price"].min()
df_low_price_1_1x = df_low_price.map(lambda n: n * 1.1)
df_low_price_1_5x = df_low_price.map(lambda n: n * 1.5)

# axis=1 is an argument of pd.concat, not an element of the list
df_main = pd.concat(
    [df_low_price, df_low_price_1_1x, df_low_price_1_5x],
    axis=1,
)

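This gives me the boundaries, but not the size of each bucket. I know I should probably rely on pd.cut, but I don't know how to do this in a pythonic/pandas way. Thanks in advance for any suggestions.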

Solution

Just following what you describe:
to bin, use cut()
for the multi-index and columns, use groupby()/agg() and unstack()
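As a small illustration of how cut() assigns values to buckets, here is a minimal sketch. The prices are made up for the example and the names prices, low, and binned are hypothetical. It relies on pd.cut using right-closed intervals by default, so a price exactly at a threshold lands in the lower bucket, which matches the "no higher than" wording in the question.

```python
import pandas as pd

# Made-up prices chosen to sit on and around the thresholds (illustrative only).
prices = pd.Series([100, 110, 111, 150, 151])
low = prices.min()

# Intervals are right-closed by default: (low-1, 1.1*low], (1.1*low, 1.5*low], (1.5*low, inf]
binned = pd.cut(
    prices,
    bins=[low - 1, low * 1.1, low * 1.5, float("inf")],
    labels=["1x", "1.1x", ">1.5x"],
)

print(binned.tolist())  # ['1x', '1x', '1.1x', '1.1x', '>1.5x']
```

The lower edge is set to low - 1 so that the minimum itself falls inside the first interval rather than on its open left boundary.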

Code

# sample data rebuilt from the table in the question (8 rows)
df = pd.DataFrame({
    "Product": ["First", "Second"] * 4, "Country": ["DE", "DE", "FR", "FR"] * 2,
    "Sell": ["A"] * 4 + ["B"] * 4, "col": ["b", "g"] * 4,
    "price": [100, 105, 111, 100, 109, 120, 100, 200], "colb": ["x", "z"] * 4,
})

# bin edges are derived from the overall minimum price
df["bin"] = pd.cut(
    df["price"],
    bins=[df["price"].min() - 1, df["price"].min() * 1.1, df["price"].min() * 1.5, df["price"].max()],
    labels=["1x", "1.1x", ">1.5x"],
)

# count rows per bin, then move the bin level into the columns
df.groupby(["Product", "Country", "Sell", "bin"], observed=False).agg({"col": "count"}).unstack().droplevel(0, axis=1)
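The snippet above derives the bin edges from the overall minimum price. If the thresholds should instead follow each group's own minimum, one possible sketch is to broadcast the group minimum with groupby(...).transform("min") and bin the ratio of price to that minimum. Using ["Product"] as the grouping key is an assumption made for illustration (the question does not pin down which columns define a group), and the names group_keys, low, and counts are hypothetical.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Product": ["First", "Second"] * 4,
    "Country": ["DE", "DE", "FR", "FR"] * 2,
    "Sell": ["A"] * 4 + ["B"] * 4,
    "price": [100, 105, 111, 100, 109, 120, 100, 200],
})

# ASSUMPTION: the "group" whose minimum defines the thresholds is the product.
group_keys = ["Product"]

# Broadcast each group's minimum price back onto its rows.
low = df.groupby(group_keys)["price"].transform("min")

# Bin the price relative to the group minimum: <=1.1x, <=1.5x, and the rest.
df["bin"] = pd.cut(
    df["price"] / low,
    bins=[0, 1.1, 1.5, np.inf],
    labels=["1x", "1.1x", ">1.5x"],
)

# Count rows per bucket; observed=False keeps empty buckets as zero counts.
counts = (
    df.groupby(["Product", "Country", "Sell", "bin"], observed=False)
    .size()
    .unstack(fill_value=0)
)
print(counts)
```

Because the ratio is a floating-point value, prices that sit exactly on a threshold may be sensitive to rounding; comparing against precomputed integer edges is an alternative if that matters for the data at hand.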

