问题描述
我们如何拆分数据,如下所述?
样本数据:
EmployeeId City join_month Year
0 001 Mumbai 1 2018
1 001 Bangalore 3 2018
2 002 Pune 2 2019
3 002 Mumbai 6 2017
4 003 Delhi 9 2018
5 003 Mumbai 12 2019
6 004 Bangalore 11 2017
7 004 Pune 10 2018
8 005 Mumbai 5 2017
必需的输出应该是 -
EmployeeId City join_month Year 2018_jan_count 2018_feb_count 2018_march_count
0 001 Mumbai 1 2018
1 001 Bangalore 3 2018
2 002 Pune 2 2019
3 002 Mumbai 6 2017
4 003 Delhi 9 2018
5 003 Mumbai 12 2019
6 004 Bangalore 11 2017
7 004 Pune 10 2018
8 005 Mumbai 5 2017
解决方法
您可以使用df.apply
df.apply(pd.value_counts)
这将应用基于列的聚合函数(在本例中为日期)
,我构建了 yearmonth 值,然后将其用作数据透视表上的一列。我统计了按城市和年月聚合的员工id
months=[(1,'Jan'),(2,'Feb'),(3,'Mar'),(4,'Apr'),(5,'May'),(6,'Jun'),(7,'Jul'),(8,'Aug'),(9,'Sept'),(10,'Oct'),(11,'Nov'),(12,'Dec')]
employeeId=['001','001','002','003','004','005']
city=['Mumbai','Bangalore','Pune','Mumbai','Delhi','Mumbai']
join_month=[1,3,2,6,9,12,11,10,1]
char_month=[b for item in join_month for a,b in months if item==a ]
year=[2018,2018,2019,2017,2018]
char_yearmonth=[]
[char_yearmonth.append(str(year[i])+"_"+char_month[i]) for i in range(len(year))]
df=pd.DataFrame({'EmployeeId': employeeId,'City':city,'YearMonth':char_yearmonth})
fp=df.pivot_table(index=['City'],columns=['YearMonth'],aggfunc='count').fillna(0)
print(fp)
EmployeeId \
YearMonth 2017_Dec 2017_Jun 2017_Nov 2018_Jan 2018_Mar 2018_Oct 2018_Sept
City
Bangalore 0.0 0.0 1.0 0.0 1.0 0.0 0.0
Delhi 0.0 0.0 0.0 0.0 0.0 0.0 1.0
Mumbai 1.0 1.0 0.0 2.0 0.0 0.0 0.0
Pune 0.0 0.0 0.0 0.0 0.0 1.0 0.0
YearMonth 2019_Feb
City
Bangalore 0.0
Delhi 0.0
Mumbai 0.0
Pune 1.0