LAG function and SUM

Problem description

I need to get a list of users who are offline for at least 20 minutes each day. Here is my data:

device_data

I have this query as a starting point, but I'm stuck on how to sum the differences in offline_mins; that is, I need to add "and sum(offline_mins) >= 20" to the WHERE clause.

SELECT
   userid, connected,
   LAG(recordeddt) OVER(PARTITION BY userid ORDER BY userid, recordeddt) AS offline_period,
   DATEDIFF(minute, LAG(recordeddt) OVER(PARTITION BY userid ORDER BY userid, recordeddt), recordeddt) offline_mins
FROM device_data WHERE connected = 0;
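A minimal sketch of the summing step, run against SQLite through Python's built-in `sqlite3` module (assumes SQLite 3.25+ for window functions). The rows are hypothetical sample data, since `device_data` itself is only shown as an image, and `offline_minutes_per_day` is an illustrative name. It makes two changes to the query above: `LAG` is computed over all rows rather than after `WHERE connected = 0` (otherwise the previous row is always another offline row), and the `SUM(...) >= 20` condition moves to `HAVING`, because aggregates are not allowed in `WHERE`. It assumes each gap between consecutive readings is charged to the state and day of the earlier reading; SQLite has no `DATEDIFF`, so minutes come from `strftime('%s', ...)`.

```python
import sqlite3

# Hypothetical sample data: (userid, recordeddt, connected).
# ISO-8601 strings sort chronologically and work with SQLite date functions.
ROWS = [
    (1, "2020-11-02 10:00:00", 1),
    (1, "2020-11-02 10:05:00", 0),
    (1, "2020-11-02 10:10:00", 0),
    (1, "2020-11-02 10:15:00", 0),
    (1, "2020-11-02 10:20:00", 0),
    (1, "2020-11-02 10:25:00", 1),
    (2, "2020-11-02 10:00:00", 1),
    (2, "2020-11-02 10:05:00", 0),
    (2, "2020-11-02 10:10:00", 1),
    (2, "2020-11-02 10:15:00", 0),
    (2, "2020-11-02 10:20:00", 1),
]

QUERY = """
WITH lagged AS (
    -- LAG sees every reading, including the connected ones.
    SELECT userid,
           recordeddt,
           LAG(recordeddt) OVER (PARTITION BY userid ORDER BY recordeddt) AS prev_dt,
           LAG(connected)  OVER (PARTITION BY userid ORDER BY recordeddt) AS prev_connected
    FROM device_data
),
gaps AS (
    -- Minutes between consecutive readings, charged to the earlier reading's day.
    SELECT userid,
           date(prev_dt) AS offline_day,
           (CAST(strftime('%s', recordeddt) AS INTEGER)
            - CAST(strftime('%s', prev_dt) AS INTEGER)) / 60 AS mins
    FROM lagged
    WHERE prev_connected = 0          -- row filter: the interval started offline
)
SELECT userid, offline_day, SUM(mins) AS offline_mins
FROM gaps
GROUP BY userid, offline_day
HAVING SUM(mins) >= 20                -- aggregate filter belongs in HAVING
ORDER BY userid, offline_day
"""


def offline_minutes_per_day(rows):
    """Return (userid, day, offline_mins) for user-days with >= 20 offline minutes."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE device_data (userid INTEGER, recordeddt TEXT, connected INTEGER)"
        )
        conn.executemany("INSERT INTO device_data VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(offline_minutes_per_day(ROWS))  # [(1, '2020-11-02', 20)]
```

With this data user 1 is offline from 10:05 until the 10:25 reading (20 minutes) and is returned, while user 2 accumulates only 10 offline minutes.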

My expected result:

Expected results

Thanks.

Solution

This sounds like a gaps-and-islands problem: you want to group together adjacent rows that have the same userid and status.

First, here is a query that computes the islands:

select userid, connected, min(recordeddt) startdt, max(lead_recordeddt) enddt,
    datediff(minute, min(recordeddt), max(lead_recordeddt)) duration
from (
    select dd.*,
        row_number()     over(partition by userid order by recordeddt) rn1,
        row_number()     over(partition by userid, connected order by recordeddt) rn2,
        lead(recordeddt) over(partition by userid order by recordeddt) lead_recordeddt
    from device_data dd
) dd
group by userid, connected, rn1 - rn2
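To check the islands logic, here is the same query ported to SQLite through Python's `sqlite3` (assumes SQLite 3.25+; `strftime('%s', ...)` stands in for `DATEDIFF`). The sample rows and the `islands` function name are hypothetical. It also shows why `connected` has to be in the `GROUP BY`: for user 2, `rn1 - rn2` is 1 both for the offline row at 10:05 and for the connected row at 10:10, so grouping on the difference alone would merge an offline island with an online one.

```python
import sqlite3

# Hypothetical sample data: (userid, recordeddt, connected).
ROWS = [
    (1, "2020-11-02 10:00:00", 1),
    (1, "2020-11-02 10:05:00", 0),
    (1, "2020-11-02 10:10:00", 0),
    (1, "2020-11-02 10:15:00", 0),
    (1, "2020-11-02 10:20:00", 0),
    (1, "2020-11-02 10:25:00", 1),
    (2, "2020-11-02 10:00:00", 1),
    (2, "2020-11-02 10:05:00", 0),
    (2, "2020-11-02 10:10:00", 1),
    (2, "2020-11-02 10:15:00", 0),
    (2, "2020-11-02 10:20:00", 1),
]

QUERY = """
SELECT userid, connected,
       MIN(recordeddt)      AS startdt,
       MAX(lead_recordeddt) AS enddt,   -- NULL for a user's last island
       (CAST(strftime('%s', MAX(lead_recordeddt)) AS INTEGER)
        - CAST(strftime('%s', MIN(recordeddt)) AS INTEGER)) / 60 AS duration
FROM (
    SELECT dd.*,
           -- rn1 - rn2 stays constant across a run of equal 'connected' values
           ROW_NUMBER() OVER (PARTITION BY userid ORDER BY recordeddt) AS rn1,
           ROW_NUMBER() OVER (PARTITION BY userid, connected ORDER BY recordeddt) AS rn2,
           LEAD(recordeddt) OVER (PARTITION BY userid ORDER BY recordeddt) AS lead_recordeddt
    FROM device_data dd
) t
GROUP BY userid, connected, rn1 - rn2
ORDER BY userid, startdt
"""


def islands(rows):
    """Return (userid, connected, startdt, enddt, duration_minutes) per island."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE device_data (userid INTEGER, recordeddt TEXT, connected INTEGER)"
        )
        conn.executemany("INSERT INTO device_data VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for island in islands(ROWS):
    print(island)
```

Each island ends at the first reading of the next island (`lead`), so user 1's offline island runs from 10:05 to 10:25 (20 minutes), and the last island of each user has no end yet.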

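Now, assuming you want users who are offline for at least 20 minutes every day, you can break the islands down by day and filter with a having clause: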

select userid
from (
    select recordedday, userid, connected, datediff(minute, min(recordeddt), max(lead_recordeddt)) duration
    from (
        select dd.*, v.*,
            row_number()     over(partition by v.recordedday, userid order by recordeddt) rn1,
            row_number()     over(partition by v.recordedday, userid, connected order by recordeddt) rn2,
            lead(recordeddt) over(partition by v.recordedday, userid order by recordeddt) lead_recordeddt
        from device_data dd
        cross apply (values (convert(date, recordeddt))) v(recordedday)
    ) dd
    group by recordedday, userid, connected, rn1 - rn2
) dd
group by userid
having count(distinct case when connected = 0 and duration >= 20 then recordedday end) = count(distinct recordedday)
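A sketch of the per-day version in SQLite via Python's `sqlite3` (assumes SQLite 3.25+), with `date(recordeddt)` playing the role of the `cross apply ... convert(date, recordeddt)` column. The two-day sample data and the `users_offline_every_day` name are hypothetical. The `HAVING` clause keeps a user only if the number of days with an offline island of at least 20 minutes equals the number of days the user has any readings.

```python
import sqlite3

# Hypothetical two-day sample data: (userid, recordeddt, connected).
ROWS = [
    # Day 1: user 1 offline 10:05-10:30 (25 min), user 2 offline 10:00-10:25 (25 min)
    (1, "2020-11-02 10:00:00", 1),
    (1, "2020-11-02 10:05:00", 0),
    (1, "2020-11-02 10:30:00", 1),
    (2, "2020-11-02 10:00:00", 0),
    (2, "2020-11-02 10:25:00", 1),
    # Day 2: user 1 offline 20 min, user 2 offline only 10 min
    (1, "2020-11-03 09:00:00", 0),
    (1, "2020-11-03 09:20:00", 1),
    (2, "2020-11-03 09:00:00", 0),
    (2, "2020-11-03 09:10:00", 1),
]

QUERY = """
SELECT userid
FROM (
    SELECT recordedday, userid, connected,
           (CAST(strftime('%s', MAX(lead_dt)) AS INTEGER)
            - CAST(strftime('%s', MIN(recordeddt)) AS INTEGER)) / 60 AS duration
    FROM (
        SELECT t0.*,
               date(recordeddt) AS recordedday,
               -- every window is partitioned by day, so islands never cross midnight
               ROW_NUMBER() OVER (PARTITION BY date(recordeddt), userid
                                  ORDER BY recordeddt) AS rn1,
               ROW_NUMBER() OVER (PARTITION BY date(recordeddt), userid, connected
                                  ORDER BY recordeddt) AS rn2,
               LEAD(recordeddt) OVER (PARTITION BY date(recordeddt), userid
                                      ORDER BY recordeddt) AS lead_dt
        FROM device_data t0
    ) t
    GROUP BY recordedday, userid, connected, rn1 - rn2
) islands
GROUP BY userid
HAVING COUNT(DISTINCT CASE WHEN connected = 0 AND duration >= 20 THEN recordedday END)
     = COUNT(DISTINCT recordedday)
ORDER BY userid
"""


def users_offline_every_day(rows):
    """Return userids with a >= 20-minute offline island on every day they appear."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE device_data (userid INTEGER, recordeddt TEXT, connected INTEGER)"
        )
        conn.executemany("INSERT INTO device_data VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(users_offline_every_day(ROWS))  # [(1,)]
```

User 2 qualifies on the first day only, so the day counts differ (1 vs 2) and only user 1 is returned.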

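As mentioned above, this is a gaps-and-islands problem. Here is my take: create groups with a simple LAG function, filter out the connected rows, and then work with the date ranges.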

CREATE TABLE #tmp(ID int,UserID int,dt datetime,connected int)
INSERT INTO #tmp VALUES
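(1,1,'11/2/20 10:00:00',1),
(2,1,'11/2/20 10:05:00',0),
(3,1,'11/2/20 10:10:00',...),
(4,1,'11/2/20 10:15:00',...),
(5,1,'11/2/20 10:20:00',...),
(6,2,'11/2/20 10:00:00',...),
(7,2,'11/2/20 10:05:00',...),
(8,2,'11/2/20 10:10:00',...),
(9,2,'11/2/20 10:15:00',...),
(10,2,'11/2/20 10:20:00',...),
(11,2,'11/2/20 10:25:00',...),
(12,2,'11/2/20 10:30:00',0)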


SELECT UserID,DATEDIFF(minute,MIN(DT),MAX(DT)) OFFLINE_MINUTES 
FROM
(
    SELECT *,SUM(CASE WHEN connected <> LG THEN 1 ELSE 0 END) OVER (ORDER BY UserID,dt) grp
    FROM
    (
        select *,LAG(connected,1,connected) OVER(PARTITION BY UserID ORDER BY dt) LG
        from #tmp
    ) x
) y
WHERE connected <> 1
GROUP BY UserID,grp,connected
HAVING DATEDIFF(minute,MIN(DT),MAX(DT)) >= 20
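The LAG-plus-running-SUM grouping can be tried in SQLite through Python's `sqlite3` (assumes SQLite 3.25+). A few assumptions in this port: the table is named `tmp` because `#tmp` is T-SQL temp-table syntax, the rows are hypothetical, `offline_runs` is an illustrative name, and `COALESCE(LAG(connected) OVER (...), connected)` is used in place of `LAG(connected, 1, connected)`, which behaves the same as long as `connected` is never NULL.

```python
import sqlite3

# Hypothetical sample data: (UserID, dt, connected).
ROWS = [
    (1, "2020-11-02 10:00:00", 1),
    (1, "2020-11-02 10:05:00", 0),
    (1, "2020-11-02 10:10:00", 0),
    (1, "2020-11-02 10:15:00", 0),
    (1, "2020-11-02 10:20:00", 0),
    (1, "2020-11-02 10:25:00", 0),
    (1, "2020-11-02 10:30:00", 1),
    (2, "2020-11-02 10:00:00", 0),
    (2, "2020-11-02 10:15:00", 0),
    (2, "2020-11-02 10:20:00", 1),
]

QUERY = """
SELECT UserID,
       (CAST(strftime('%s', MAX(dt)) AS INTEGER)
        - CAST(strftime('%s', MIN(dt)) AS INTEGER)) / 60 AS offline_minutes
FROM (
    -- Running count of state changes: each run of equal 'connected' values
    -- gets its own grp number.
    SELECT x.*,
           SUM(CASE WHEN connected <> LG THEN 1 ELSE 0 END)
               OVER (ORDER BY UserID, dt) AS grp
    FROM (
        -- Previous state; the first row of each user falls back to its own value.
        SELECT t.*,
               COALESCE(LAG(connected) OVER (PARTITION BY UserID ORDER BY dt),
                        connected) AS LG
        FROM tmp t
    ) x
) y
WHERE connected <> 1                   -- keep only offline runs
GROUP BY UserID, grp, connected
HAVING (CAST(strftime('%s', MAX(dt)) AS INTEGER)
        - CAST(strftime('%s', MIN(dt)) AS INTEGER)) / 60 >= 20
ORDER BY UserID
"""


def offline_runs(rows):
    """Return (UserID, offline_minutes) for offline runs lasting >= 20 minutes."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE tmp (UserID INTEGER, dt TEXT, connected INTEGER)")
        conn.executemany("INSERT INTO tmp VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(offline_runs(ROWS))  # [(1, 20)]
```

This variant measures from the first to the last offline reading of each run rather than up to the reconnect reading, so user 1 gets 20 minutes (10:05 to 10:25) and user 2's run of 15 minutes is filtered out; the `lead`-based islands query would count up to the next reading instead.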