Resampling a DataFrame by time with GroupBy

Problem description

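Per-second traffic data records the number of cars going in and out. I want to aggregate them by In / Out into 2-minute intervals and show the totals, for example: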

import pandas as pd

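# NOTE: the 'cars' and 'flow' lists appear to be truncated in the post; all three
# lists must have the same length (one entry per timestamp) for pd.DataFrame(data) to work
data = {
    'time': ["13:34:16", "13:34:19", "13:34:52", "13:34:55", "13:34:58", "13:35:01",
             "13:35:04", "13:35:37", "13:35:40", "13:35:43", "13:36:37", "13:36:39",
             "13:36:43", "13:36:46", "13:36:49", "13:36:52", "13:36:58", "13:37:04",
             "13:37:07", "13:37:13", "13:37:46", "13:37:49", "13:37:58"],
    'cars': [15, 22, 12, 1, 331, 32, 14, 5, 51, 13, 3, 2, 4, 89, 105, 63],
    'flow': ["In", "Out", "In", "Unknown"],
}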

我尝试过：

df = pd.DataFrame(data)
df.time = '2020-01-23 ' + df.time     # data date

df.time = pd.to_datetime(df.time,unit='s')

print (df.groupby('flow').resample('2T')['cars'].sum())

But it gives this error:

ValueError: non convertible value 2020-01-23 13:34:16 with the unit 's'

Can anyone help? What is the correct way to do this? Thanks.
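For context, a minimal sketch of why the attempt fails: `unit='s'` tells `pd.to_datetime` to interpret numeric values as seconds since the Unix epoch, so it cannot convert date-time strings such as `'2020-01-23 13:34:16'`, which are parsed without any `unit`. The sample values below are illustrative only.

```python
import pandas as pd

# unit="s": numeric values are read as seconds since the Unix epoch (1970-01-01)
from_epoch = pd.to_datetime(pd.Series([0, 90]), unit="s")

# Date-time strings are parsed directly; no unit argument is needed
from_strings = pd.to_datetime(pd.Series(["2020-01-23 13:34:16", "2020-01-23 13:35:40"]))

# Combining unit="s" with such strings is what raises the ValueError in the question
try:
    pd.to_datetime(pd.Series(["2020-01-23 13:34:16"]), unit="s")
    error = None
except ValueError as exc:
    error = exc

print(from_epoch)
print(from_strings)
print("ValueError:", error)
```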

Solution

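`resample` works on a DatetimeIndex, so set `time` as the index before grouping. Also drop `unit='s'`: it makes `pd.to_datetime` treat the values as numbers of seconds since the epoch, which is why it cannot convert strings like `2020-01-23 13:34:16`. You can try: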

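df.time = pd.to_datetime(df.time)   # parse the date-time strings; no unit='s'
# '2min' is the same frequency as '2T' (the 'T' alias is deprecated since pandas 2.2)
df_new = df.set_index("time").groupby('flow').resample('2min')['cars'].sum()
print(df_new)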
flow     time               
In       2020-01-23 13:34:00     59
         2020-01-23 13:36:00    298
Out      2020-01-23 13:34:00    431
         2020-01-23 13:36:00      1
Unknown  2020-01-23 13:34:00      6
         2020-01-23 13:36:00      5
Name: cars, dtype: int64
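As an alternative sketch (a swapped-in technique, not the method used above), `pd.Grouper(key="time", freq="2min")` can bin the `time` column inside `groupby` directly, so no `set_index` or `resample` call is needed. The DataFrame below is made-up sample data with the same three columns, not the data from the question.

```python
import pandas as pd

# Hypothetical sample data (NOT the question's data), same columns as in the post
df = pd.DataFrame({
    "time": pd.to_datetime(["2020-01-23 13:34:16", "2020-01-23 13:34:19",
                            "2020-01-23 13:35:01", "2020-01-23 13:36:37",
                            "2020-01-23 13:36:46", "2020-01-23 13:37:13"]),
    "cars": [15, 22, 12, 1, 89, 5],
    "flow": ["In", "Out", "In", "Unknown", "In", "Out"],
})

# Group by flow and by 2-minute windows of the "time" column; sum the cars per group
totals = df.groupby(["flow", pd.Grouper(key="time", freq="2min")])["cars"].sum()
print(totals)
```

One difference to keep in mind: `groupby(...).resample(...)` emits every 2-minute window between a flow's first and last timestamp (empty windows sum to 0), while the `Grouper` version lists only the windows that actually contain rows for that flow.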

If you want to reproduce the layout of your Excel sheet, go one step further:

df_new = df_new.unstack().T            # rows: 2-minute windows, columns: flow values
df_new["Total"] = df_new.sum(axis=1)   # row-wise total across In / Out / Unknown
print(df_new)
flow                  In  Out  Unknown  Total
time                                         
2020-01-23 13:34:00   59  431        6    496
2020-01-23 13:36:00  298    1        5    304
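A minimal sketch of the same wide table built in one chain, again on made-up sample data rather than the question's. `unstack("flow", fill_value=0)` moves the `flow` level into columns (equivalent to `unstack().T` here), and `fill_value=0` fills windows in which a flow has no rows with 0 instead of NaN, which keeps the columns as integers.

```python
import pandas as pd

# Hypothetical sample data (NOT the question's data)
df = pd.DataFrame({
    "time": pd.to_datetime(["2020-01-23 13:34:16", "2020-01-23 13:34:19",
                            "2020-01-23 13:35:01", "2020-01-23 13:36:37",
                            "2020-01-23 13:36:46", "2020-01-23 13:37:13"]),
    "cars": [15, 22, 12, 1, 89, 5],
    "flow": ["In", "Out", "In", "Unknown", "In", "Out"],
})

wide = (
    df.groupby(["flow", pd.Grouper(key="time", freq="2min")])["cars"].sum()
      .unstack("flow", fill_value=0)   # rows: 2-minute windows, columns: In/Out/Unknown
)
# Total is computed before the column is added, so it sums only the flow columns
wide["Total"] = wide.sum(axis=1)
print(wide)
```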
