Problem description
              School  Grade Class
Date
2019-01-01  School A      2  Math
2019-02-01  School A      3  Math
2019-06-01  School A      1  Math
2019-01-01  School B      4  Math
2019-02-01  School B      5  Math
2019-06-01  School B      2  Math
2019-01-01  School C      6  Math
2019-02-01  School C      5  Math
2019-06-01  School C      6  Math
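I want to compute the ratio between schools for the same date and append it to the same DataFrame, as shown below.
For example, for the date 2019-01-01 the ratio is School A's grade / School B's grade = 2/4 = 0.5, and so on for the other dates.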
Date        Type               Value  Class
2019-01-01  School A           2      Math
2019-02-01  School A           3      Math
2019-06-01  School A           1      Math
2019-01-01  School B           4      Math
2019-02-01  School B           5      Math
2019-06-01  School B           2      Math
2019-01-01  School C           6      Math
2019-02-01  School C           5      Math
2019-06-01  School C           6      Math
2019-01-01  School A/School B  0.5    Math
2019-02-01  School A/School B  0.6    Math
2019-06-01  School A/School B  0.5    Math
The code is as follows:
import pandas as pd
# Six rows (School A and School B only), one entry per row in every list.
Input = {
    'Date': ['2019-01-01', '2019-02-01', '2019-06-01', '2019-01-01', '2019-02-01', '2019-06-01'],
    'School': ['School A', 'School A', 'School A', 'School B', 'School B', 'School B'],
    'Grade': [2, 3, 1, 4, 5, 2],
    'Class': ['Math', 'Math', 'Math', 'Math', 'Math', 'Math'],
}
df = pd.DataFrame(Input, columns=['Date', 'School', 'Grade', 'Class'])
df['Date'] = pd.to_datetime(df.Date)
df = df.set_index('Date')
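I am not sure how to iterate over the rows (or whether that is even necessary) and divide the specific numbers based on a condition.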
Solution
The following approach should work:
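# Copy the School A rows so the assignments below do not modify a view of df.
df2 = df[df['School'] == 'School A'].copy()
df2['School'] = 'School A/School B'
# Division aligns on the Date index, so each date is divided by School B's grade for that date.
df2['Grade'] = df2['Grade'] / df[df['School'] == 'School B']['Grade']
result = pd.concat([df, df2])
print(result)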
Output:
                       School  Grade Class
Date
2019-01-01           School A    2.0  Math
2019-02-01           School A    3.0  Math
2019-06-01           School A    1.0  Math
2019-01-01           School B    4.0  Math
2019-02-01           School B    5.0  Math
2019-06-01           School B    2.0  Math
2019-01-01  School A/School B    0.5  Math
2019-02-01  School A/School B    0.6  Math
2019-06-01  School A/School B    0.5  Math
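The division above works because pandas matches Series values by index label (here, the date) rather than by position. The following is a minimal sketch of that behavior; the Series names `a` and `b` and the extra 2018 date are illustrative assumptions, not part of the original data.

```python
import pandas as pd

# Hypothetical grades for two schools; b is deliberately stored in a different order,
# and a has one date (2018-01-01) that b lacks.
a = pd.Series(
    [1, 2, 3, 1],
    index=pd.to_datetime(["2018-01-01", "2019-01-01", "2019-02-01", "2019-06-01"]),
)
b = pd.Series(
    [2, 4, 5],
    index=pd.to_datetime(["2019-06-01", "2019-01-01", "2019-02-01"]),
)

# Values are paired by date label, not by position in the Series.
ratio = a / b

# Dates present in only one Series yield NaN.
print(ratio)
```

A date that exists for only one school ends up as NaN, which is why the ratio rows may need a `dropna()` when the two schools do not cover the same dates.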
Alternatively, try using groupby to avoid problems when the dates are not sorted.
from functools import reduce
from operator import truediv
schools = ["School A", "School B"]
# Keep only the two schools being compared.
df1 = df.loc[df.School.isin(schools)]
# Within each date, divide the grades in row order (School A's row must come before School B's).
grades = pd.DataFrame(df1.groupby(df1.index)["Grade"].agg(lambda s: reduce(truediv, s)))
grades["School"] = "School A / School B"
grades["Class"] = "Math"
print(pd.concat([df1, grades]))
                         School  Grade Class
Date
2019-01-01             School A    2.0  Math
2019-02-01             School A    3.0  Math
2019-06-01             School A    1.0  Math
2019-01-01             School B    4.0  Math
2019-02-01             School B    5.0  Math
2019-06-01             School B    2.0  Math
2019-01-01  School A / School B    0.5  Math
2019-02-01  School A / School B    0.6  Math
2019-06-01  School A / School B    0.5  Math
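Note that `reduce(truediv, s)` divides in the order the rows appear within each date, so it gives A/B only when School A's row precedes School B's. A possible variant that does not depend on row order is to pivot the schools into columns and divide the columns by name. This is a sketch using a different technique (`DataFrame.pivot`) from the answer above; the shuffled sample data and the hard-coded `"Math"` class are assumptions for illustration.

```python
import pandas as pd

# Same grades as in the question, but with the rows deliberately shuffled.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2019-01-01", "2019-01-01", "2019-02-01",
                            "2019-02-01", "2019-06-01", "2019-06-01"]),
    "School": ["School B", "School A", "School A", "School B", "School B", "School A"],
    "Grade": [4, 2, 3, 5, 2, 1],
    "Class": ["Math"] * 6,
}).set_index("Date")

# One row per date, one column per school.
wide = df.reset_index().pivot(index="Date", columns="School", values="Grade")

# Divide by column name, so the original row order no longer matters.
ratio = (wide["School A"] / wide["School B"]).dropna()

# Build the ratio rows (Class is assumed to be "Math" for every row) and append them.
ratio_rows = pd.DataFrame({"School": "School A/School B", "Grade": ratio, "Class": "Math"})
result = pd.concat([df, ratio_rows])
print(result)
```

Pivoting requires at most one row per school per date; duplicate (Date, School) pairs would raise an error and need to be aggregated first.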
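My solution takes deep copies of the DataFrame and selects the rows for each school. The two resulting DataFrames can then be divided.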
import pandas as pd
# Eleven rows: School A has two extra 2018 dates that School B and School C lack.
Input = {
    'Date': ['2018-01-01', '2018-02-01', '2019-01-01', '2019-02-01', '2019-06-01',
             '2019-01-01', '2019-02-01', '2019-06-01',
             '2019-01-01', '2019-02-01', '2019-06-01'],
    'School': ['School A'] * 5 + ['School B'] * 3 + ['School C'] * 3,
    'Grade': [1, 6, 2, 3, 1, 4, 5, 2, 6, 5, 6],
    'Class': ['Math'] * 11,
}
df = pd.DataFrame(Input, columns=['Date', 'School', 'Grade', 'Class'])
df['Date'] = pd.to_datetime(df.Date)
df = df.set_index('Date')
# Deep-copy the rows for each school so later assignments cannot affect df.
df_copy_A = df[df['School'] == 'School A'].copy(deep=True)
df_copy_B = df[df['School'] == 'School B'].copy(deep=True)
df_copy_B['School'] = 'School A / School B'
# rdiv computes df_copy_A['Grade'] / df_copy_B['Grade'], aligned on the Date index.
df_copy_B['Grade'] = df_copy_B['Grade'].rdiv(df_copy_A['Grade'])
df = pd.concat([df, df_copy_B])
print(df)
This produces the expected output:
                         School  Grade Class
Date
2018-01-01             School A    1.0  Math
2018-02-01             School A    6.0  Math
2019-01-01             School A    2.0  Math
2019-02-01             School A    3.0  Math
2019-06-01             School A    1.0  Math
2019-01-01             School B    4.0  Math
2019-02-01             School B    5.0  Math
2019-06-01             School B    2.0  Math
2019-01-01             School C    6.0  Math
2019-02-01             School C    5.0  Math
2019-06-01             School C    6.0  Math
2019-01-01  School A / School B    0.5  Math
2019-02-01  School A / School B    0.6  Math
2019-06-01  School A / School B    0.5  Math
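If ratios are needed for more than one pair of schools, the same idea could be wrapped in a small function. The sketch below is one way to do it; `add_ratio_rows` is a hypothetical helper name, and it assumes each school has at most one row per date.

```python
import pandas as pd

def add_ratio_rows(df, numerator, denominator):
    """Hypothetical helper: append numerator/denominator Grade ratios for shared dates."""
    num = df[df["School"] == numerator].copy()
    den = df.loc[df["School"] == denominator, "Grade"]
    # Aligned on the Date index; dates missing from the denominator become NaN.
    num["Grade"] = num["Grade"] / den
    num = num.dropna(subset=["Grade"])
    num["School"] = f"{numerator} / {denominator}"
    return pd.concat([df, num])

# Same eleven rows as in the answer above.
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2018-01-01", "2018-02-01", "2019-01-01", "2019-02-01", "2019-06-01"]
        + ["2019-01-01", "2019-02-01", "2019-06-01"] * 2
    ),
    "School": ["School A"] * 5 + ["School B"] * 3 + ["School C"] * 3,
    "Grade": [1, 6, 2, 3, 1, 4, 5, 2, 6, 5, 6],
    "Class": ["Math"] * 11,
}).set_index("Date")

result = add_ratio_rows(df, "School A", "School B")
print(result)
```

The 2018 dates exist only for School A, so they produce no ratio rows; the Class value of each ratio row is carried over from the numerator school's row.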