Problem description
I have been trying to solve this problem, but so far without success. I am using Oracle.
I have a set of data that looks like this:
| USER | ACTIVITY | START_TIME | END_TIME | DURATION |
|--------|------------|-----------------|-----------------|----------|
| jsmith | Front Desk | 2020-08-24 8:00 | 2020-08-24 9:30 | 90 |
| jsmith | Phones | 2020-08-24 8:15 | 2020-08-24 8:45 | 30 |
| jsmith | Phones | 2020-08-24 9:45 | 2020-08-24 9:50 | 5 |
| bjones | Phones | 2020-08-24 9:00 | 2020-08-24 9:10 | 10 |
| bjones | Front Desk | 2020-08-24 9:05 | 2020-08-24 9:15 | 10 |
| bjones | Phones | 2020-08-24 9:15 | 2020-08-24 9:45 | 30 |
The output above can be generated with the following query:
SELECT
USER,ACTIVITY,START_TIME,END_TIME,DURATION
FROM USER_ACTIVITIES
WHERE USER IN ('jsmith','bjones')
AND START_TIME BETWEEN '2020-08-24 00:00:00' AND '2020-08-25 00:00:00'
ORDER BY USER,END_TIME
;
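I need to calculate the total "busy" time for each user, taking into account that some activities overlap one another. With the existing query, the total duration is 125 for jsmith and 50 for bjones, but because some activities overlap, this does not reflect the total time each user was actually busy.
The output I am looking for is each user's total busy time per day: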
| USER | DATE | DURATION |
|--------|------------|----------|
| jsmith | 2020-08-24 | 95 |
| bjones | 2020-08-24 | 45 |
Any help would be greatly appreciated.
Solutions
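To make the target numbers concrete, here is a small Python sketch (not Oracle code) that counts each minute once no matter how many activities cover it. It uses a sweep over start and end events; the names `busy_minutes` and `at` are made up for this illustration, and the rows are the sample data from the table above.

```python
from datetime import datetime

def at(hour, minute):
    # Hypothetical helper: a time on the sample day 2020-08-24.
    return datetime(2020, 8, 24, hour, minute)

def busy_minutes(intervals):
    """Minutes covered by at least one interval (overlaps counted once)."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # an activity begins
        events.append((end, -1))    # an activity ends
    # Sort by time; at equal times handle starts before ends.
    events.sort(key=lambda e: (e[0], -e[1]))
    total, active, prev = 0.0, 0, None
    for when, delta in events:
        if active > 0:              # someone was busy since the last event
            total += (when - prev).total_seconds() / 60
        active += delta
        prev = when
    return int(total)

jsmith = [(at(8, 0), at(9, 30)), (at(8, 15), at(8, 45)), (at(9, 45), at(9, 50))]
bjones = [(at(9, 0), at(9, 10)), (at(9, 5), at(9, 15)), (at(9, 15), at(9, 45))]

print(busy_minutes(jsmith))  # 95
print(busy_minutes(bjones))  # 45
```

Simply adding up the DURATION column gives 125 and 50 instead, because the overlapping minutes are counted twice.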
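You can first unpivot the intervals into individual minutes, and then eliminate the non-overlapping intervals by using NOT EXISTS. (For this case I did not take date intervals into account; you can add the following expression if other calculation cases need it.)
EXTRACT( hour FROM max_end_time - min_start_time )*3600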
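The query that belonged to this answer is not preserved above, so the following Python sketch is only one reading of the "expand into minutes" idea, not the answerer's SQL. It assumes all times fall on whole minutes, and it uses a set to drop duplicate minutes where the answer mentions NOT EXISTS. The names `busy_minutes_by_expansion` and `at` are hypothetical.

```python
from datetime import datetime, timedelta

def at(hour, minute):
    # Hypothetical helper: a time on the sample day 2020-08-24.
    return datetime(2020, 8, 24, hour, minute)

def busy_minutes_by_expansion(intervals):
    """Expand every interval into its minutes and count the distinct ones."""
    minutes = set()
    for start, end in intervals:
        t = start
        while t < end:                  # end itself is not a busy minute
            minutes.add(t)              # the set ignores repeated minutes
            t += timedelta(minutes=1)
    return len(minutes)

jsmith = [(at(8, 0), at(9, 30)), (at(8, 15), at(8, 45)), (at(9, 45), at(9, 50))]
bjones = [(at(9, 0), at(9, 10)), (at(9, 5), at(9, 15)), (at(9, 15), at(9, 45))]

print(busy_minutes_by_expansion(jsmith))  # 95
print(busy_minutes_by_expansion(bjones))  # 45
```

This is simple to reason about, but it produces one row per minute, so it is likely to be slower than the interval-based approaches on large data.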
I would approach this with a gaps-and-islands technique rather than recursion:
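select usr, sum(duration) * 24 * 60 duration
from (
    -- one row per island; with DATE columns the difference is in days
    select usr, max(end_time) - min(start_time) duration
    from (
        -- a running count of island starts gives each row an island id
        select
            ua.*,
            sum(case when start_time <= prev_max_end_time then 0 else 1 end)
                over(partition by usr order by start_time, end_time) grp
        from (
            -- latest end_time over all earlier rows of the same user, so a
            -- short activity nested in a longer one does not end the island
            select
                ua.*,
                max(end_time) over(
                    partition by usr order by start_time, end_time
                    rows between unbounded preceding and 1 preceding
                ) prev_max_end_time
            from user_activities ua
        ) ua
    ) ua
    group by usr, grp
) ua
group by usr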
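The idea is to use a window sum to build groups of records that share the same user and have overlapping periods. You can then compute the difference between the end and the start of each "island", and finally aggregate per user.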
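The grouping step can be sketched outside the database as follows. This is Python, not Oracle code, and it is an illustration under one assumption of mine: a row starts a new island only when its start time is later than the largest end time seen so far for that user, so that an activity nested inside a longer one stays in the same island. The names `island_minutes` and `at` are hypothetical.

```python
from datetime import datetime
from itertools import groupby

def at(hour, minute):
    # Hypothetical helper: a time on the sample day 2020-08-24.
    return datetime(2020, 8, 24, hour, minute)

def island_minutes(intervals):
    """Label rows with an island id, then sum (max end - min start) per island."""
    labelled = []
    grp, prev_max_end = 0, None
    for start, end in sorted(intervals):
        # A gap before this row opens a new island (the "else 1" flag).
        if prev_max_end is None or start > prev_max_end:
            grp += 1
        labelled.append((grp, start, end))
        prev_max_end = end if prev_max_end is None else max(prev_max_end, end)
    total = 0.0
    for _, members in groupby(labelled, key=lambda row: row[0]):
        members = list(members)
        island_start = min(m[1] for m in members)
        island_end = max(m[2] for m in members)
        total += (island_end - island_start).total_seconds() / 60
    return int(total)

jsmith = [(at(8, 0), at(9, 30)), (at(8, 15), at(8, 45)), (at(9, 45), at(9, 50))]
bjones = [(at(9, 0), at(9, 10)), (at(9, 5), at(9, 15)), (at(9, 15), at(9, 45))]

print(island_minutes(jsmith))  # 95
print(island_minutes(bjones))  # 45
```

Comparing against only the immediately preceding row's end time gives the same result on this sample, but it would split an island for data such as 8:00-9:30, 8:15-8:45, 9:00-9:10, where the third row starts after the second row ends yet still lies inside the first.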
The following code requires at least Oracle 12c:
WITH user_activities( "user",activity,start_time,end_time ) AS
(
SELECT 'jsmith','Front Desk',timestamp'2020-08-24 08:00:00',timestamp'2020-08-24 09:30:00' FROM dual UNION ALL
SELECT 'jsmith','Phones',timestamp'2020-08-24 08:15:00',timestamp'2020-08-24 08:45:00' FROM dual UNION ALL
SELECT 'jsmith','Phones',timestamp'2020-08-24 09:45:00',timestamp'2020-08-24 09:50:00' FROM dual UNION ALL
SELECT 'bjones','Phones',timestamp'2020-08-24 09:00:00',timestamp'2020-08-24 09:10:00' FROM dual UNION ALL
SELECT 'bjones','Front Desk',timestamp'2020-08-24 09:05:00',timestamp'2020-08-24 09:15:00' FROM dual UNION ALL
SELECT 'bjones','Phones',timestamp'2020-08-24 09:15:00',timestamp'2020-08-24 09:45:00' FROM dual
)
select "user",sum(durations) as durations
from
(
select "user",extract(hour from (end_time - start_time)) * 60 + extract(minute from (end_time - start_time)) as durations
from user_activities
match_recognize
(
partition by "user"
order by start_time,end_time
measures first(start_time) start_time,max(end_time) as end_time
pattern (a* b)
define a as max(end_time) >= next(start_time)
)
)
group by "user";
If you are interested in match_recognize, this should solve your problem.
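There are many possible solutions. Here is another approach: using CTEs, first compute a clean end time with the LEAD function (if the following start time is earlier than the end time, take the following start time instead). Then sum the minutes and group by user: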
WITH sampledata (username, start_time, end_time)
AS
(
    SELECT 'jsmith','2020-08-24 8:00','2020-08-24 9:30' FROM DUAL UNION ALL
    SELECT 'jsmith','2020-08-24 8:15','2020-08-24 8:45' FROM DUAL UNION ALL
    SELECT 'jsmith','2020-08-24 9:45','2020-08-24 9:50' FROM DUAL UNION ALL
    SELECT 'bjones','2020-08-24 9:00','2020-08-24 9:10' FROM DUAL UNION ALL
    SELECT 'bjones','2020-08-24 9:05','2020-08-24 9:15' FROM DUAL UNION ALL
    SELECT 'bjones','2020-08-24 9:15','2020-08-24 9:45' FROM DUAL
), clean_sampledata (username, start_time, end_time)
AS
(
    -- convert the text values to DATE
    SELECT
        username, TO_DATE(start_time,'YYYY-MM-DD HH24:MI'), TO_DATE(end_time,'YYYY-MM-DD HH24:MI')
    FROM sampledata
), clear_overlapped (username, start_time, clean_end_time)
AS
(
    -- clip each end time at the next start time of the same user
    SELECT
        username, start_time,
        NVL(LEAST(LEAD(start_time) OVER (PARTITION BY username ORDER BY start_time), end_time), end_time)
    FROM clean_sampledata
), cleaned_minutes_per_username (username, mins)
AS
(
    -- a DATE difference is in days; 1440 minutes per day
    SELECT
        username, ROUND((clean_end_time - start_time) * 1440)
    FROM clear_overlapped
)
SELECT
    username, SUM(mins)
FROM cleaned_minutes_per_username
GROUP BY username ;
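Output:
bjones 45
jsmith 50
Note that jsmith comes out as 50 rather than the 95 asked for in the question, because this approach does not handle an activity that is nested entirely inside another one.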
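The following Python sketch (not Oracle code; `clipped_minutes` and `at` are names made up for illustration) mirrors the LEAD/LEAST clipping step of the query above so that the per-row minutes can be inspected. It is only a rough model of that query, run on the same sample rows.

```python
from datetime import datetime

def at(hour, minute):
    # Hypothetical helper: a time on the sample day 2020-08-24.
    return datetime(2020, 8, 24, hour, minute)

def clipped_minutes(intervals):
    """Clip each end time at the next row's start time, then sum the minutes."""
    rows = sorted(intervals)
    total = 0
    for i, (start, end) in enumerate(rows):
        # LEAD(start_time): the next row's start, or None on the last row
        next_start = rows[i + 1][0] if i + 1 < len(rows) else None
        # NVL(LEAST(next_start, end), end)
        clean_end = end if next_start is None else min(next_start, end)
        total += round((clean_end - start).total_seconds() / 60)
    return total

jsmith = [(at(8, 0), at(9, 30)), (at(8, 15), at(8, 45)), (at(9, 45), at(9, 50))]
bjones = [(at(9, 0), at(9, 10)), (at(9, 5), at(9, 15)), (at(9, 15), at(9, 45))]

print(clipped_minutes(jsmith))  # 50
print(clipped_minutes(bjones))  # 45
```

For jsmith the three rows contribute 15, 30 and 5 minutes: the 8:00-9:30 row is cut off at 8:15, and the 45 minutes between 8:45 and 9:30 are never counted. For bjones no activity is nested inside another, so the result matches the expected 45.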