SQL - How to select x number of rows before a specific row

Question

I have this table:

   ts  |  user_id  |   event   |  
-------------------------------
 1500        a         eat 
 1501        a         walk 
 1502        a         sleep 
 1500        b         eat 
 1501        b         sleep 
 1502        b         wake
 1500        c         walk 
 1501        c         eat
 1502        c         sit
 1503        c         sleep 
 1504        c         wake 

I want to select x rows before a certain event. Say, for each user_id, I want to select the 2 events before sleep.

My final result table should look like this:

user_id  |   event   |   rank  |
--------------------------------
    a         eat         1
    a         walk        2
    a         sleep       3
    b         NULL        0
    b         eat         1
    b         sleep       2
    c         eat         2
    c         sit         3
    c         sleep       4

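How can I do this in SQL (specifically Redshift SQL)?

Answers

This is a gaps-and-islands problem, where you want the first row and the last two rows of each island.

The safest approach is probably a window sum of the sleep events to define the groups, then filtering with row_number():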

select *
from (
    select t.*,
        row_number() over(partition by user_id, grp order by ts) rn_asc,
        row_number() over(partition by user_id, grp order by ts desc) rn_desc
    from (
        select t.*,
            sum(case when event = 'sleep' then 1 else 0 end)
                over(partition by user_id order by ts desc) grp
        from mytable t
    ) t
) t
where (rn_asc = 1 or rn_desc <= 2) and grp > 0
order by user_id, ts

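We define the islands with a window count of 'sleep' events taken in descending order of ts, so every row up to and including a sleep shares that sleep's grp value. Then we enumerate the rows of each island in both ascending and descending order, and filter on the records we are interested in.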

Demo on DB Fiddle

  ts | user_id | event | grp | rn_asc | rn_desc
---: | :------ | :---- | --: | -----: | ------:
1500 | a       | eat   |   1 |      1 |       3
1501 | a       | walk  |   1 |      2 |       2
1502 | a       | sleep |   1 |      3 |       1
1500 | b       | eat   |   1 |      1 |       2
1501 | b       | sleep |   1 |      2 |       1
1500 | c       | walk  |   1 |      1 |       4
1502 | c       | sit   |   1 |      3 |       2
1503 | c       | sleep |   1 |      4 |       1

Edit

Redshift requires an explicit frame clause when an aggregate window function such as sum() has an order by clause, so the query is a little longer to type. The ranking function row_number() does not use a frame clause:

select *
from (
    select t.*,
        row_number() over(
            partition by user_id, grp
            order by ts
        ) rn_asc,
        row_number() over(
            partition by user_id, grp
            order by ts desc
        ) rn_desc
    from (
        select t.*,
            sum(case when event = 'sleep' then 1 else 0 end) over(
                partition by user_id
                order by ts desc
                rows between unbounded preceding and current row
            ) grp
        from mytable t
    ) t
) t
where (rn_asc = 1 or rn_desc <= 2) and grp > 0
order by user_id, ts
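
A note on the filter: rn_asc = 1 or rn_desc <= 2 keeps the first row and the last two rows of each island, so for user c it returns walk, sit, sleep (as in the demo output), whereas the question asks for eat, sit, sleep. The following is only a sketch of one possible variant, assuming the goal is "the sleep row plus the x rows directly before it" with x = 2. It filters on rn_desc <= x + 1 and reuses rn_asc as the rank. The temp table setup and the event_rank alias are illustrative assumptions, not part of the original answer.

```sql
-- Sample data copied from the question; mytable is the name used in the answer.
create temp table mytable (ts int, user_id varchar(10), event varchar(20));
insert into mytable (ts, user_id, event) values
    (1500, 'a', 'eat'),   (1501, 'a', 'walk'),  (1502, 'a', 'sleep'),
    (1500, 'b', 'eat'),   (1501, 'b', 'sleep'), (1502, 'b', 'wake'),
    (1500, 'c', 'walk'),  (1501, 'c', 'eat'),   (1502, 'c', 'sit'),
    (1503, 'c', 'sleep'), (1504, 'c', 'wake');

select user_id, event, rn_asc as event_rank   -- event_rank is a hypothetical alias
from (
    select t.*,
        row_number() over(partition by user_id, grp order by ts) rn_asc,
        row_number() over(partition by user_id, grp order by ts desc) rn_desc
    from (
        select t.*,
            -- running count of sleeps from the latest row backwards = island id
            sum(case when event = 'sleep' then 1 else 0 end) over(
                partition by user_id
                order by ts desc
                rows between unbounded preceding and current row
            ) grp
        from mytable t
    ) t
) t
where rn_desc <= 3 and grp > 0   -- 3 = x + 1: the sleep row plus the 2 rows before it
order by user_id, ts;

-- Rows returned for the sample data:
--   a eat 1, a walk 2, a sleep 3
--   b eat 1, b sleep 2
--   c eat 2, c sit 3, c sleep 4
```

This variant still returns only rows that exist in the data, so the NULL row with rank 0 that the question shows for user b is not produced.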

Hmmm . . . You can use lead():

select t.*
from (select t.*,
             lead(event) over (partition by user_id order by ts) as next_event,
             lead(event, 2) over (partition by user_id order by ts) as next_event2
      from t
     ) t
where 'sleep' in (event, next_event, next_event2);

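Note: This only returns rows that are in the data. If you need to generate rows (such as the NULL row for user b in the question), additional logic is needed.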
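
One possible shape for that additional logic, offered as a rough sketch rather than a definitive solution: number each user's rows, then for every sleep row look up the rows at positions rn, rn - 1 and rn - 2 with a left join, so a position that does not exist comes back as a NULL event. The CTE names numbered and offsets, the column k, and the event_rank alias are all hypothetical, and the sketch assumes ts is unique per user.

```sql
-- Sample data copied from the question; t is the table name used in this answer.
create temp table t (ts int, user_id varchar(10), event varchar(20));
insert into t (ts, user_id, event) values
    (1500, 'a', 'eat'),   (1501, 'a', 'walk'),  (1502, 'a', 'sleep'),
    (1500, 'b', 'eat'),   (1501, 'b', 'sleep'), (1502, 'b', 'wake'),
    (1500, 'c', 'walk'),  (1501, 'c', 'eat'),   (1502, 'c', 'sit'),
    (1503, 'c', 'sleep'), (1504, 'c', 'wake');

with numbered as (
    -- position of each event within its user, ordered by time
    select user_id, event, ts,
           row_number() over (partition by user_id order by ts) as rn
    from t
),
offsets as (
    -- 0 = the sleep row itself, 1 and 2 = the two rows before it
    select 0 as k union all select 1 union all select 2
)
select s.user_id, n.event, s.rn - o.k as event_rank
from numbered s
cross join offsets o
left join numbered n
       on n.user_id = s.user_id
      and n.rn = s.rn - o.k        -- no match when the position is < 1
where s.event = 'sleep'
order by s.user_id, event_rank;

-- Rows returned for the sample data:
--   a eat 1,  a walk 2, a sleep 3
--   b NULL 0, b eat 1,  b sleep 2
--   c eat 2,  c sit 3,  c sleep 4
```

If a user has several sleep events close together, the windows overlap and the same row can appear more than once, so this would need adjusting for such data.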

Edit:

You can actually generalize the lead() approach:

select t.*
from (select t.*,
             sum(case when event = 'sleep' then 1 else 0 end) over (
                 partition by user_id
                 order by ts
                 rows between current row and 2 following
             ) as cnt_sleep
      from t
     ) t
where cnt_sleep > 0;

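This counts the 'sleep' events in a window of n rows: the current row plus the next n - 1 rows (here n = 3, hence 2 following). A row is returned if 'sleep' is found in any of them.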
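
To get the rank column from the question as well, a row_number() can be computed alongside the count. This is a minimal sketch under the assumption that the rank is simply the event's position within its user ordered by ts. The table name t_demo (standing in for t) and the alias event_rank are hypothetical.

```sql
-- Sample data copied from the question; t_demo is a hypothetical stand-in for t.
create temp table t_demo (ts int, user_id varchar(10), event varchar(20));
insert into t_demo (ts, user_id, event) values
    (1500, 'a', 'eat'),   (1501, 'a', 'walk'),  (1502, 'a', 'sleep'),
    (1500, 'b', 'eat'),   (1501, 'b', 'sleep'), (1502, 'b', 'wake'),
    (1500, 'c', 'walk'),  (1501, 'c', 'eat'),   (1502, 'c', 'sit'),
    (1503, 'c', 'sleep'), (1504, 'c', 'wake');

select user_id, event, rn as event_rank
from (select d.*,
             -- position of the event within its user
             row_number() over (partition by user_id order by ts) as rn,
             -- sleeps in the current row and the 2 rows after it
             sum(case when event = 'sleep' then 1 else 0 end) over (
                 partition by user_id
                 order by ts
                 rows between current row and 2 following
             ) as cnt_sleep
      from t_demo d
     ) d
where cnt_sleep > 0
order by user_id, ts;

-- Rows returned for the sample data:
--   a eat 1, a walk 2, a sleep 3
--   b eat 1, b sleep 2
--   c eat 2, c sit 3, c sleep 4
```

To look further back, change 2 following to x following for the desired x.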
