Extracting Python Pandas records whose timestamp falls within a specific range

Question

I have a DataFrame df in which one column stores processing times (Timestamp objects).

Example DataFrame:

from datetime import datetime,date
import pandas as pd

ids = ['WO_EW-1_10AUR-15-0031_00','IW-12_0400-15-0012_00','E-8_10AUR-18-0037_00']
dates = [date(2015,9,14),date(2015,9,17),date(2018,8,16)]
datetimes = [datetime(2015,9,14,13,23,40),datetime(2015,9,17,9,6,7),datetime(2018,8,16,7,32,6)]
datalist = list(zip(ids,dates,datetimes))

df = pd.DataFrame(datalist,columns=['ID','ProcessDate','ProcessingTime'])


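What I want to do is extract all records that satisfy a condition (or several conditions). In one case, I want to find all records whose 'ProcessingTime' value has a time of day later than 13:10. For the example DataFrame above, the desired output would be the first record.

What is the correct way to apply this kind of condition to DataFrame records?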


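P.S. I tried the following approaches, but neither worked:

df.loc[ (df['ProcessingTime'].time().hour > 14) ]

This raises an AttributeError, because a 'Series' object has no attribute 'time'.

df.loc[ (df['ProcessingTime'] > datetime.time(14,0)) ]

This raises a TypeError, because a comparison between dtype=datetime64[ns] and time is invalid.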

Answer

import pandas as pd
from datetime import date,datetime,time

ids = ['WO_EW-1_10AUR-15-0031_00','IW-12_0400-15-0012_00','E-8_10AUR-18-0037_00']
dates = [date(2015,9,14),date(2015,9,17),date(2018,8,16)]
datetimes = [datetime(2015,9,14,13,23,40),datetime(2015,9,17,9,6,7),datetime(2018,8,16,7,32,6)]
datalist = list(zip(ids,dates,datetimes))

df = pd.DataFrame(datalist,columns=['ID','ProcessDate','ProcessingTime'])

# display(df)
                         ID ProcessDate      ProcessingTime
0  WO_EW-1_10AUR-15-0031_00  2015-09-14 2015-09-14 13:23:40
1     IW-12_0400-15-0012_00  2015-09-17 2015-09-17 09:06:07
2      E-8_10AUR-18-0037_00  2018-08-16 2018-08-16 07:32:06

# single condition
df[df.ProcessingTime.dt.hour > 7]

[out]:
                         ID ProcessDate      ProcessingTime
0  WO_EW-1_10AUR-15-0031_00  2015-09-14 2015-09-14 13:23:40
1     IW-12_0400-15-0012_00  2015-09-17 2015-09-17 09:06:07

# multiple conditions
df[(df.ProcessingTime.dt.hour > 7) & (df.ProcessingTime.dt.minute > 10)]

[out]:
                         ID ProcessDate      ProcessingTime
0  WO_EW-1_10AUR-15-0031_00  2015-09-14 2015-09-14 13:23:40

# an entire datetime
df[df.ProcessingTime < '2015-09-17 09:06:07']

[out]:
                         ID ProcessDate      ProcessingTime
0  WO_EW-1_10AUR-15-0031_00  2015-09-14 2015-09-14 13:23:40

# using .time
df[df.ProcessingTime.dt.time > time.fromisoformat('07:32:06')]

[out]:
                         ID ProcessDate      ProcessingTime
0  WO_EW-1_10AUR-15-0031_00  2015-09-14 2015-09-14 13:23:40
1     IW-12_0400-15-0012_00  2015-09-17 2015-09-17 09:06:07
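
The filters above can be adapted to the cutoff from the question (a time of day later than 13:10). The following is a minimal sketch under that assumption, not the only way to do it; the names cutoff, later_than_cutoff, and later_by_seconds are illustrative and not part of the original answer. Note that combining separate hour and minute conditions, such as (hour > 13) & (minute > 10), does not express "later than 13:10", because it would drop a time like 14:05; comparing whole time-of-day values avoids that.

```python
from datetime import datetime, time

import pandas as pd

# Same sample data as above, reduced to the columns needed here.
df = pd.DataFrame({
    'ID': ['WO_EW-1_10AUR-15-0031_00', 'IW-12_0400-15-0012_00', 'E-8_10AUR-18-0037_00'],
    'ProcessingTime': [
        datetime(2015, 9, 14, 13, 23, 40),
        datetime(2015, 9, 17, 9, 6, 7),
        datetime(2018, 8, 16, 7, 32, 6),
    ],
})

# Hypothetical cutoff taken from the question: 13:10.
cutoff = time(13, 10)

# Option 1: compare datetime.time objects extracted with .dt.time.
later_than_cutoff = df[df['ProcessingTime'].dt.time > cutoff]

# Option 2: compare seconds since midnight, which stays numeric.
t = df['ProcessingTime'].dt
seconds = t.hour * 3600 + t.minute * 60 + t.second
cutoff_seconds = cutoff.hour * 3600 + cutoff.minute * 60 + cutoff.second
later_by_seconds = df[seconds > cutoff_seconds]

print(later_than_cutoff)
print(later_by_seconds)
```

Both options select only the first record (13:23:40) for this sample data. The .dt.time variant is shorter to read, while the seconds-based variant avoids building an object-dtype column, which may matter on large frames.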