Splitting rows into time slots based on some conditions

Problem description

I have a DataFrame with 100k+ rows. I need to go through it and count the rows that fall into each time slot. A sample of the DataFrame looks like this:

   Call Sign    Entry_Time                 Exit_Time        Sector
0   EA213    2020-10-01 22:24:00      2020-10-01 22:50:55   north
1   NGF23    2020-10-01 22:32:00      2020-10-01 22:53:00   West
2   USR24    2020-10-01 22:44:00      2020-10-01 23:01:53   Central
3   EF36D    2020-10-01 22:50:55      2020-10-01 23:04:07   north
4   NGF23    2020-10-01 22:53:00      2020-10-01 23:03:54   north
5   USR24    2020-10-01 23:01:53      2020-10-01 23:13:44   West
6   EF36D    2020-10-01 23:04:07      2020-10-01 23:26:48   Central
7   USR24    2020-10-01 23:13:44      2020-10-01 23:28:00   Central
8   OSA26    2020-10-02 15:02:00      2020-10-02 15:09:31   West
9   OSA26    2020-10-02 15:09:31      2020-10-02 15:25:47   north
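
For anyone who wants to reproduce this, the sample can be rebuilt as below. This is only a sketch: the rows are copied by hand from the table, and it assumes both time columns are meant to be pandas datetimes rather than strings.

```python
import pandas as pd

# The ten sample rows from the table, typed in by hand.
rows = [
    ("EA213", "2020-10-01 22:24:00", "2020-10-01 22:50:55", "north"),
    ("NGF23", "2020-10-01 22:32:00", "2020-10-01 22:53:00", "West"),
    ("USR24", "2020-10-01 22:44:00", "2020-10-01 23:01:53", "Central"),
    ("EF36D", "2020-10-01 22:50:55", "2020-10-01 23:04:07", "north"),
    ("NGF23", "2020-10-01 22:53:00", "2020-10-01 23:03:54", "north"),
    ("USR24", "2020-10-01 23:01:53", "2020-10-01 23:13:44", "West"),
    ("EF36D", "2020-10-01 23:04:07", "2020-10-01 23:26:48", "Central"),
    ("USR24", "2020-10-01 23:13:44", "2020-10-01 23:28:00", "Central"),
    ("OSA26", "2020-10-02 15:02:00", "2020-10-02 15:09:31", "West"),
    ("OSA26", "2020-10-02 15:09:31", "2020-10-02 15:25:47", "north"),
]
df = pd.DataFrame(rows, columns=["Call Sign", "Entry_Time", "Exit_Time", "Sector"])

# Comparisons against slot boundaries need real timestamps, not strings.
df["Entry_Time"] = pd.to_datetime(df["Entry_Time"])
df["Exit_Time"] = pd.to_datetime(df["Exit_Time"])
```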

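A row should be counted in a time slot if its entry or exit time falls between the slot's start and end. To do that I use the following code: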

from datetime import datetime, timedelta

import pandas as pd

# Analysis window and slot length (minutes)
startDay   = 1
startMonth = 10
startYear  = 2020
endDay     = 5
endMonth   = 10
endYear    = 2020
interval   = 30
startDate = str(datetime(startYear,startMonth,startDay).date())
endDate   = str(datetime(endYear,endMonth,endDay).date())

timeInterval = pd.DataFrame()
sectors = ['West','north','Central']
# One second before the end date, so the last slot starts `interval` minutes before it
endDateMinus1 = str(datetime(endYear,endMonth,endDay)-timedelta(seconds=1))
timeInterval['Start'] = pd.date_range(start=startDate+' 00:00:00',end=endDateMinus1,freq=str(interval)+'min')
timeInterval['End']   = pd.date_range(start=startDate+' 00:'+str(interval)+':00',end=endDate+' 00:00:00',freq=str(interval)+'min')

for index,row in timeInterval.iterrows():
    # Rows that overlap the current slot
    startMask  = (df['Entry_Time'] >= row.Start) | (df['Exit_Time'] >= row.Start)
    endMask    = (df['Entry_Time'] <  row.End) | (df['Exit_Time'] <  row.End)
    timeInterval.loc[index,'Total Count'] = df[startMask & endMask].count()['Call Sign']

    for sector in sectors:
        # Copy so that the column assignments below do not write into a view of df
        filteredDF = df[startMask & endMask & (df['Sector']==sector)].copy()
        filteredDF[sector+' Time'] = 0.0

        # Four ways a row can overlap the slot
        filter1 = (filteredDF['Entry_Time']<row.Start) & (filteredDF['Exit_Time']<=row.End)
        filter2 = (filteredDF['Entry_Time']<row.Start) & (filteredDF['Exit_Time']>row.End)
        filter3 = (filteredDF['Entry_Time']>=row.Start) & (filteredDF['Exit_Time']<=row.End)
        filter4 = (filteredDF['Entry_Time']>=row.Start) & (filteredDF['Exit_Time']>row.End)

        # Minutes spent inside the slot for each case
        filteredDF.loc[filter1,sector+' Time'] = (filteredDF.loc[filter1,'Exit_Time']-row.Start).dt.seconds/60
        filteredDF.loc[filter2,sector+' Time'] = interval
        filteredDF.loc[filter3,sector+' Time'] = (filteredDF.loc[filter3,'Exit_Time']-filteredDF.loc[filter3,'Entry_Time']).dt.seconds/60
        filteredDF.loc[filter4,sector+' Time'] = (row.End-filteredDF.loc[filter4,'Entry_Time']).dt.seconds/60

        timeInterval.loc[index,sector+' Total Count'] = filteredDF.count()['Call Sign']
        timeInterval.loc[index,sector+' Total Time (min)'] = float("{:.2f}".format(filteredDF[sector+' Time'].sum()))

        # Avoid dividing by zero for slots with no rows in this sector
        timeInterval.loc[index,sector+' Average Time (min)'] = 0 if timeInterval.loc[index,sector+' Total Count']==0 else timeInterval.loc[index,sector+' Total Time (min)']/timeInterval.loc[index,sector+' Total Count']
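
A side note on the four filters in the loop: they all compute the same quantity, the overlap between a row's [Entry_Time, Exit_Time] span and the slot, which is min(Exit_Time, slot end) minus max(Entry_Time, slot start). The sketch below shows that as one expression. The function name `overlap_minutes` and the demo rows are made up for illustration and are not part of the code above.

```python
import pandas as pd


def overlap_minutes(entry, exit_, slot_start, slot_end):
    """Minutes each [entry, exit_] span spends inside [slot_start, slot_end]."""
    lo = entry.where(entry >= slot_start, slot_start)  # max(entry, slot_start)
    hi = exit_.where(exit_ <= slot_end, slot_end)      # min(exit_, slot_end)
    minutes = (hi - lo).dt.total_seconds() / 60
    return minutes.clip(lower=0)                       # spans outside the slot give 0


# One demo row per filter case, against the 22:30-23:00 slot.
demo = pd.DataFrame({
    "Entry_Time": pd.to_datetime(["2020-10-01 22:24:00", "2020-10-01 22:00:00",
                                  "2020-10-01 22:32:00", "2020-10-01 22:44:00"]),
    "Exit_Time": pd.to_datetime(["2020-10-01 22:50:55", "2020-10-01 23:30:00",
                                 "2020-10-01 22:53:00", "2020-10-01 23:01:53"]),
})
minutes = overlap_minutes(demo["Entry_Time"], demo["Exit_Time"],
                          pd.Timestamp("2020-10-01 22:30:00"),
                          pd.Timestamp("2020-10-01 23:00:00"))
print(minutes)
```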

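The loop produces a result like this:

[Image: Sector Analysis Result]

Here is a closer look at some of the rows that are non-zero for the DataFrame shared above.

[Image: closer look]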

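The problem is that as the number of time slots or the number of rows grows, the program takes a very long time to finish. I need to replace the for loop with something else, but I'm not sure how to do it.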
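
One possible way to drop the loop over slots, offered as a sketch rather than a tested replacement: work out the first and last slot each row touches, repeat the row once per touched slot, and sum per slot with `np.bincount`. The function name `slot_summary` and its parameters are made up here. It assumes slots of equal length starting at `start`, that Exit_Time is never before Entry_Time, and that there are no missing timestamps. The output column names mirror the ones used in the question.

```python
import numpy as np
import pandas as pd


def slot_summary(df, start, end, interval=30, sectors=("West", "north", "Central")):
    """Per-slot counts and minutes per sector, without iterating over slots."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    n_slots = int((end - start) / pd.Timedelta(minutes=interval))

    # Minutes elapsed since the start of the analysis window.
    entry = ((df["Entry_Time"] - start).dt.total_seconds() / 60).to_numpy()
    exit_ = ((df["Exit_Time"] - start).dt.total_seconds() / 60).to_numpy()

    # First and last slot each row touches. Same test as the loop:
    # Exit_Time >= slot start and Entry_Time < slot end.
    first = np.floor(entry / interval).astype(int)
    last = np.floor(exit_ / interval).astype(int)
    keep = (last >= 0) & (first < n_slots)  # drop rows entirely outside the window
    first = np.clip(first[keep], 0, n_slots - 1)
    last = np.clip(last[keep], 0, n_slots - 1)
    entry, exit_ = entry[keep], exit_[keep]
    sector = df["Sector"].to_numpy()[keep]

    # Repeat each row once per slot it touches.
    reps = last - first + 1
    row = np.repeat(np.arange(len(reps)), reps)
    offset = np.arange(reps.sum()) - np.repeat(np.cumsum(reps) - reps, reps)
    slot = first[row] + offset

    # Overlap with the slot: min(exit, slot end) - max(entry, slot start).
    lo = np.maximum(entry[row], slot * interval)
    hi = np.minimum(exit_[row], (slot + 1) * interval)
    minutes = hi - lo

    out = pd.DataFrame(
        {"Start": start + pd.to_timedelta(np.arange(n_slots) * interval, unit="min")}
    )
    out["End"] = out["Start"] + pd.Timedelta(minutes=interval)
    out["Total Count"] = np.bincount(slot, minlength=n_slots)
    for s in sectors:
        mask = sector[row] == s
        cnt = np.bincount(slot[mask], minlength=n_slots)
        tot = np.bincount(slot[mask], weights=minutes[mask], minlength=n_slots).round(2)
        out[f"{s} Total Count"] = cnt
        out[f"{s} Total Time (min)"] = tot
        # Leave the average at 0 where the slot has no rows for this sector.
        out[f"{s} Average Time (min)"] = np.divide(
            tot, cnt, out=np.zeros(n_slots), where=cnt > 0
        )
    return out


# Small demo using the first three sample rows.
flights = pd.DataFrame({
    "Call Sign": ["EA213", "NGF23", "USR24"],
    "Entry_Time": pd.to_datetime(["2020-10-01 22:24:00", "2020-10-01 22:32:00",
                                  "2020-10-01 22:44:00"]),
    "Exit_Time": pd.to_datetime(["2020-10-01 22:50:55", "2020-10-01 22:53:00",
                                 "2020-10-01 23:01:53"]),
    "Sector": ["north", "West", "Central"],
})
result = slot_summary(flights, "2020-10-01", "2020-10-02")
print(result.loc[44:46, ["Start", "End", "Total Count"]])
```

The work here grows with the number of (row, slot) pairs that actually overlap, not with rows times slots, so short stays in long analysis windows should stay cheap. It is worth checking the output against the loop on a small sample before relying on it.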
