Problem description
1. Create a column Usage_Per_Year from Miles_Driven_Per_Year by discretizing the values into three equally sized categories. The names of the categories should be Low, Medium, and High.
2. Group by Usage_Per_Year and print the group sizes as well as the ranges of each.
3. Do the same as in #1, but instead of equally sized categories, create categories that have the same number of points per category.
4. Group by Usage_Per_Year and print the group sizes as well as the ranges of each.
My code is as follows:
df["Usage_Per_Year "],bins = pd.cut(df["Miles_Driven_Per_Year"],3,precision=2,retbins=True)
group_label = pd.Series(["Low","Medium","High"])
#3.3.2
group_size = df.groupby("Usage_Per_Year").size()
#print(group_size)
print(group_size.reset_index().set_index(group_label))
#3.3.3
Year2 = pd.cut(df["Miles_Driven_Per_Year"],precision=2)
group_label = pd.Series(["Low","High"])
#3.3.4
group_size = df.groupby("Usage_Per_Year").size()
#print(group_size)
print(group_size.reset_index().set_index(group_label))
The output is as follows:
                   Usage_Per_Year     0
Low       (-1925.883,663476.235]  6018
Medium  (663476.235,1326888.118]     0
High     (1326888.118,1990300.0]     1

             Usage_Per_Year  0
Low  (-1925.883,1990300.0]  1
But -1925 is wrong...
What should I do?
Solution
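There may be a typo on line 1: df["Usage_Per_Year "] has a trailing space in the column name, while the later groupby calls use "Usage_Per_Year" without it.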
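To illustrate why the trailing space matters, here is a minimal sketch on made-up data (the three mileage values are hypothetical, not from the question). It shows that the key with the space creates a differently named column, and one possible way to clean the names afterwards:

```python
import pandas as pd

# Hypothetical toy data; only the column names matter here.
df = pd.DataFrame({"Miles_Driven_Per_Year": [100, 5000, 20000]})

# Note the trailing space in the key: this creates a column named "Usage_Per_Year ".
df["Usage_Per_Year "] = pd.cut(df["Miles_Driven_Per_Year"], 3)

has_exact_name = "Usage_Per_Year" in df.columns    # the name without the space is absent
has_spaced_name = "Usage_Per_Year " in df.columns  # the name with the space is present

# One way to repair the names after the fact: strip surrounding whitespace.
df.columns = df.columns.str.strip()
fixed = "Usage_Per_Year" in df.columns
print(has_exact_name, has_spaced_name, fixed)
```

Fixing the key at the assignment itself is simpler; stripping the column names is only a fallback when the frame already exists.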
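pd.cut splits the values into bins of equal width. That is why all of your bins span the same range, no matter how many points fall into each one. The edges it reports are bin boundaries, not observed values: when bins is an integer, pandas extends the range by 0.1% so that the minimum falls inside the first bin, which is where -1925.883 comes from. To report the real range of each group, compute the min and max of each group after binning.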
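The point about bin edges versus observed ranges can be sketched as follows. The mileage values are made up for illustration; retbins=True is used only to inspect the edges that pd.cut computed:

```python
import pandas as pd

# Made-up values, chosen only for illustration.
miles = pd.Series([100, 2500, 9000, 30000], name="Miles_Driven_Per_Year")

binned, edges = pd.cut(miles, bins=3, labels=["Low", "Medium", "High"], retbins=True)

# Interior edges are evenly spaced between the minimum and the maximum ...
width = (miles.max() - miles.min()) / 3
# ... but the lowest edge sits slightly below the minimum, so it is not a data value.
lowest_edge_below_min = edges[0] < miles.min()

# The ranges actually covered by the data come from the data itself.
# observed=False keeps empty categories (here: Medium) in the result.
summary = miles.groupby(binned, observed=False).agg(["count", "min", "max"])
print(edges)
print(summary)
```

An empty bin shows a count of 0 and missing min/max, which is a hint that equal-width bins fit skewed data poorly.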
Also, to bin the values into groups of equal frequency (the same number of points per category), use pd.qcut.
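As a small side-by-side sketch of the difference, assuming a deliberately skewed, made-up series: pd.cut places edges by value range, while pd.qcut places them at quantiles, so the group counts come out balanced.

```python
import pandas as pd

# Made-up, skewed values: five small numbers and one large outlier.
miles = pd.Series([100, 200, 300, 400, 500, 60000], name="Miles_Driven_Per_Year")
labels = ["Low", "Medium", "High"]

equal_width = pd.cut(miles, bins=3, labels=labels)  # edges spaced evenly by value
equal_freq = pd.qcut(miles, q=3, labels=labels)     # edges placed at the terciles

width_counts = equal_width.value_counts(sort=False)
freq_counts = equal_freq.value_counts(sort=False)
print(width_counts)
print(freq_counts)
```

With heavily tied data, pd.qcut may not be able to produce exactly equal groups, so the counts are best checked rather than assumed.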
Sample input:
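import numpy as np
import pandas as pd

# Reproducible sample data: 1000 skewed, gamma-distributed mileage values
rng = np.random.default_rng(20210514)
df = pd.DataFrame({
    'Miles_Driven_Per_Year': rng.gamma(1.05, 10000, (1000,)).astype(int)
})

# 1: three bins of equal width
group_label = ['Low', 'Medium', 'High']
df['Usage_Per_Year'] = pd.cut(df['Miles_Driven_Per_Year'], bins=3, labels=group_label)

# 2: group sizes and the observed range of each group
print(df.groupby('Usage_Per_Year').agg(['count', 'min', 'max']))

# 3: three bins with (roughly) the same number of points each
df['Usage_Per_Year'] = pd.qcut(df['Miles_Driven_Per_Year'], q=3, labels=group_label)

# 4: group sizes and the observed range of each group
print(df.groupby('Usage_Per_Year').agg(['count', 'min', 'max']))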
Sample output:
               Miles_Driven_Per_Year
                               count    min    max
Usage_Per_Year
Low                              878     31  20905
Medium                           107  20955  41196
High                              15  41991  62668

               Miles_Driven_Per_Year
                               count    min    max
Usage_Per_Year
Low                              334     31   4378
Medium                           333   4449  11424
High                             333  11442  62668