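Is there a way to create new columns, one for each month covered by the span between two datetimes? The output could be a binary value in each new month column. I was thinking of something like this (it does not work):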
for i in [1, 2, 3, 4, 5]:
    i_name = str(i)
    values = example['end'] - example['start']  # example line: need to derive the per-month values here
    example[i_name] = values
From this:
          end         name       start
0  28/02/2012   joe bloggs  01/01/2012
1  15/03/2012  jane bloggs  01/02/2012
2  17/05/2012   jim bloggs  01/04/2012
3  18/04/2012  john bloggs  01/02/2012
To this:
          end  1  2  3  4  5         name       start
0  28/02/2012  1  1  0  0  0   joe bloggs  01/01/2012
1  15/03/2012  0  1  1  0  0  jane bloggs  01/02/2012
2  17/05/2012  0  0  0  1  1   jim bloggs  01/04/2012
3  18/04/2012  0  1  1  1  0  john bloggs  01/02/2012
Solution:
I think you can use get_dummies:
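import pandas as pd

# df is the question's DataFrame (called `example` there)
# convert the date columns to datetime (day comes first in the input)
df['end'] = pd.to_datetime(df['end'], dayfirst=True)
df['start'] = pd.to_datetime(df['start'], dayfirst=True)
# extract the month numbers as Series
end = df['end'].dt.month
start = df['start'].dt.month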
# build a DataFrame holding every month number from start to end, one row per record
df1 = (pd.DataFrame({'end': end, 'start': start})
         .apply(lambda x: pd.Series(range(x['start'], x['end'] + 1)), axis=1))
print(df1)
     0    1    2
0  1.0  2.0  NaN
1  2.0  3.0  NaN
2  4.0  5.0  NaN
3  2.0  3.0  4.0
# create indicator variables, then sum them per original row index
df1 = (pd.get_dummies(df1.stack().reset_index(level=1, drop=True))
         .groupby(level=0).sum().astype(int))
# convert the float column names to int
df1.columns = df1.columns.astype(int)
print(df1)
   1  2  3  4  5
0  1  1  0  0  0
1  0  1  1  0  0
2  0  0  0  1  1
3  0  1  1  1  0
# append the indicator columns to the original DataFrame
print(pd.concat([df, df1], axis=1))
         end         name      start  1  2  3  4  5
0 2012-02-28   joe bloggs 2012-01-01  1  1  0  0  0
1 2012-03-15  jane bloggs 2012-02-01  0  1  1  0  0
2 2012-05-17   jim bloggs 2012-04-01  0  0  0  1  1
3 2012-04-18  john bloggs 2012-02-01  0  1  1  1  0
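
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same get_dummies idea. It rebuilds the sample frame from the question. As an assumption on my part, it swaps the apply/stack step for Series.explode, which needs pandas 0.25 or newer. The names months, indicators and result are only illustrative. It also assumes every start month is less than or equal to its end month within the same year. Because only month numbers are used, ranges that cross a year boundary would need a different key.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "end": ["28/02/2012", "15/03/2012", "17/05/2012", "18/04/2012"],
    "name": ["joe bloggs", "jane bloggs", "jim bloggs", "john bloggs"],
    "start": ["01/01/2012", "01/02/2012", "01/04/2012", "01/02/2012"],
})

# Parse the day-first date strings.
df["end"] = pd.to_datetime(df["end"], dayfirst=True)
df["start"] = pd.to_datetime(df["start"], dayfirst=True)

# One list of month numbers per row, then one row per (index, month) pair.
# Assumption: start month <= end month, both in the same year.
month_lists = [
    list(range(s, e + 1))
    for s, e in zip(df["start"].dt.month, df["end"].dt.month)
]
months = pd.Series(month_lists, index=df.index).explode().astype(int)

# Indicator column per month, collapsed back to one row per original index.
indicators = pd.get_dummies(months).groupby(level=0).sum().astype(int)

# Attach the indicator columns to the original frame.
result = pd.concat([df, indicators], axis=1)
print(result)
```

Months that no row covers (6 to 12 here) get no column at all. If a fixed set of columns is needed, something like indicators.reindex(columns=range(1, 13), fill_value=0) before the concat should pad them with zeros.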