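Problem description
I have per-second traffic data showing the number of cars going in and out. I want to aggregate it into 2-minute bins by In / Out and show the total for each, for example: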
import pandas as pd
data = {'time': ["13:34:16","13:34:19","13:34:52","13:34:55","13:34:58","13:35:01","13:35:04","13:35:37","13:35:40","13:35:43","13:36:37","13:36:39","13:36:43","13:36:46","13:36:49","13:36:52","13:36:58","13:37:04","13:37:07","13:37:13","13:37:46","13:37:49","13:37:58",],'cars' : [15,22,12,1,331,32,14,5,51,13,3,2,4,89,105,63,'flow': ["In","Out","In","UnkNown",]}
What I tried:
df = pd.DataFrame(data)
df.time = '2020-01-23 ' + df.time # data date
df.time = pd.to_datetime(df.time,unit='s')
print (df.groupby('flow').resample('2T')['cars'].sum())
But it raises this error:
ValueError: non convertible value 2020-01-23 13:34:16 with the unit 's'
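Any help would be appreciated. What is the correct way to do this? Thanks.
Solution
I believe you need to resample on the index. Also, unit='s' is meant for numeric epoch values, so drop it when parsing date strings. You can try: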
df.time = pd.to_datetime(df.time)
# '2T' means 2 minutes; newer pandas versions prefer the alias '2min'
df_new = df.set_index("time").groupby('flow').resample('2T')['cars'].sum()
print(df_new)
flow     time
In       2020-01-23 13:34:00     59
         2020-01-23 13:36:00    298
Out      2020-01-23 13:34:00    431
         2020-01-23 13:36:00      1
Unknown  2020-01-23 13:34:00      6
         2020-01-23 13:36:00      5
Name: cars, dtype: int64
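Because the data in the question is cut off, here is a minimal self-contained sketch of the same approach. The `sample` frame and its values are made up for illustration and are not the asker's data; the name `per_flow` is also just an assumption. It uses the '2min' alias instead of '2T'.

```python
import pandas as pd

# Hypothetical sample data (NOT the asker's full dataset).
sample = pd.DataFrame({
    "time": ["13:34:16", "13:34:52", "13:35:40",
             "13:36:37", "13:37:07", "13:37:58"],
    "cars": [15, 22, 12, 1, 30, 5],
    "flow": ["In", "Out", "In", "In", "Out", "Out"],
})

# Prepend the date and parse the strings directly. No unit= argument:
# unit='s' tells pandas to read numbers as seconds since the epoch.
sample["time"] = pd.to_datetime("2020-01-23 " + sample["time"])

# resample works on a DatetimeIndex, so move 'time' into the index
# before grouping by flow and summing each 2-minute bin.
per_flow = (
    sample.set_index("time")
    .groupby("flow")
    .resample("2min")["cars"]
    .sum()
)
print(per_flow)
```

The result is a Series indexed by (flow, time), the same shape as the output above.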
If you want to reproduce the layout of your Excel sheet, go one step further:
df_new = df_new.unstack().T
df_new["Total"] = df_new.sum(axis=1)
print(df_new)
flow                  In  Out  Unknown  Total
time
2020-01-23 13:34:00   59  431        6    496
2020-01-23 13:36:00  298    1        5    304
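As an alternative to set_index plus resample, the wide table can also be built in one pass with pd.Grouper. This is a different technique from the answer above, shown only as a sketch; the `sample` data and the name `wide` are hypothetical.

```python
import pandas as pd

# Hypothetical sample data (NOT the asker's full dataset).
sample = pd.DataFrame({
    "time": ["13:34:16", "13:34:52", "13:35:01", "13:35:40",
             "13:36:37", "13:37:07", "13:37:58"],
    "cars": [15, 22, 4, 12, 1, 30, 5],
    "flow": ["In", "Out", "Unknown", "In", "In", "Out", "Out"],
})
sample["time"] = pd.to_datetime("2020-01-23 " + sample["time"])

# Group by 2-minute time bins and by flow at once, then pivot flow
# into columns. fill_value=0 covers bins where a flow has no rows.
wide = (
    sample.groupby([pd.Grouper(key="time", freq="2min"), "flow"])["cars"]
    .sum()
    .unstack(fill_value=0)
)

# Row-wise total across all flow columns.
wide["Total"] = wide.sum(axis=1)
print(wide)
```

With pd.Grouper the 'time' column does not have to be the index, and unstack(fill_value=0) avoids NaN in the Total when a flow is missing from a bin.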