Find customer IDs whose events occurred in the order a -> b -> c -> d in SQL

Question

Events table:

customerid  eventname  eventtime
----------  ---------  ---------
         1  a          1:00:00
         1  b          1:05:00
         1  c          1:10:00
         1  d          1:15:00
         2  a          1:00:00
         2  c          1:10:00
         2  d          1:15:00
         2  f          1:20:00
         3  b          2:00:00
         3  d          2:20:00

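Using SQL, find the customer IDs whose events occurred in the order a -> b -> c -> d. The output should be customerid 1.

Answers

In generic SQL, you can join the table to itself once per event and define the relationships between the events.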

SELECT A.customerid
FROM Event A
JOIN Event B USING (customerid)
JOIN Event C USING (customerid)
JOIN Event D USING (customerid)
WHERE A.eventtime < B.eventtime AND A.eventname='a' AND
 B.eventtime < C.eventtime AND B.eventname='b' AND
 C.eventtime < D.eventtime AND C.eventname='c' AND
 D.eventname='d'
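
To check the self-join against the sample data, here is a minimal sketch that runs it in an in-memory SQLite database from Python. The table name Event comes from the query above; storing eventtime as zero-padded text, the added DISTINCT and ORDER BY, and the helper name customers_with_sequence are assumptions made for this illustration only.

```python
import sqlite3

# Sample rows copied from the question; times are zero-padded text
# so that string comparison matches chronological order (an assumption).
ROWS = [
    (1, "a", "01:00:00"), (1, "b", "01:05:00"),
    (1, "c", "01:10:00"), (1, "d", "01:15:00"),
    (2, "a", "01:00:00"), (2, "c", "01:10:00"),
    (2, "d", "01:15:00"), (2, "f", "01:20:00"),
    (3, "b", "02:00:00"), (3, "d", "02:20:00"),
]

# The self-join from the answer, with DISTINCT and ORDER BY added
# so each customer is reported once, in a stable order.
SELF_JOIN_SQL = """
SELECT DISTINCT A.customerid
FROM Event A
JOIN Event B USING (customerid)
JOIN Event C USING (customerid)
JOIN Event D USING (customerid)
WHERE A.eventname = 'a' AND A.eventtime < B.eventtime
  AND B.eventname = 'b' AND B.eventtime < C.eventtime
  AND C.eventname = 'c' AND C.eventtime < D.eventtime
  AND D.eventname = 'd'
ORDER BY A.customerid
"""


def customers_with_sequence(rows):
    """Load rows into a throwaway Event table and run the self-join."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Event (customerid INTEGER, eventname TEXT, eventtime TEXT)"
    )
    conn.executemany("INSERT INTO Event VALUES (?, ?, ?)", rows)
    result = [row[0] for row in conn.execute(SELF_JOIN_SQL)]
    conn.close()
    return result


print(customers_with_sequence(ROWS))
```

Note that this formulation only requires a, b, c and d to appear in that order; unrelated events in between do not prevent a match.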

The simplest solution is probably string aggregation. The actual syntax varies by database, but the idea is:

select customerid
from mytable
group by customerid
having string_agg(eventname,',' order by eventtime) = 'a,b,c,d'

This works in Postgres. In SQL Server, you can phrase the having clause as:

having string_agg(eventname, ',') within group (order by eventtime) = 'a,b,c,d'

In Oracle:

having listagg(eventname, ',') within group (order by eventtime) = 'a,b,c,d'

In MySQL:

having group_concat(eventname order by eventtime) = 'a,b,c,d'
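
The aggregation idea can be restated outside SQL to make its matching rule explicit. The following is a rough Python sketch, not a database query: it groups the sample rows per customer, orders them by time, joins the event names with commas, and compares the result with 'a,b,c,d'. The function name customers_matching_exactly and the zero-padded text times are assumptions made for the illustration.

```python
from collections import defaultdict

# Sample rows copied from the question; times are zero-padded text (an assumption).
ROWS = [
    (1, "a", "01:00:00"), (1, "b", "01:05:00"),
    (1, "c", "01:10:00"), (1, "d", "01:15:00"),
    (2, "a", "01:00:00"), (2, "c", "01:10:00"),
    (2, "d", "01:15:00"), (2, "f", "01:20:00"),
    (3, "b", "02:00:00"), (3, "d", "02:20:00"),
]


def customers_matching_exactly(rows, expected="a,b,c,d"):
    """Mimic: GROUP BY customerid HAVING <ordered aggregate> = expected."""
    by_customer = defaultdict(list)
    for customerid, eventname, eventtime in rows:
        by_customer[customerid].append((eventtime, eventname))

    matches = []
    for customerid in sorted(by_customer):
        # Sort by time, then concatenate the event names with commas.
        ordered = ",".join(name for _, name in sorted(by_customer[customerid]))
        if ordered == expected:
            matches.append(customerid)
    return matches


print(customers_matching_exactly(ROWS))
```

Because the comparison is an exact string match, a customer with any additional event (for example a,b,c,d,e) is not returned by this approach.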

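For a more portable approach, use OLAP or similar functionality over the time series.

This works as long as 'a', 'b', 'c', 'd' occur as consecutive rows for a customer; it also works if the customer has other rows, such as 'e' or 'z', before or after that run.

In Vertica, for example: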

WITH
input(customerid,eventname,eventtime) AS (
          SELECT 1,'a',TIME '1:00:00'
UNION ALL SELECT 1,'b',TIME '1:05:00'
UNION ALL SELECT 1,'c',TIME '1:10:00'
UNION ALL SELECT 1,'d',TIME '1:15:00'
UNION ALL SELECT 2,'a',TIME '1:00:00'
UNION ALL SELECT 2,'c',TIME '1:10:00'
UNION ALL SELECT 2,'d',TIME '1:15:00'
UNION ALL SELECT 2,'f',TIME '1:20:00'
UNION ALL SELECT 3,'b',TIME '2:00:00'
UNION ALL SELECT 3,'d',TIME '2:20:00'
)
SELECT
  *,event_name(),pattern_id(),match_id()
FROM input
MATCH(
  PARTITION BY customerid
  ORDER BY eventtime
  DEFINE
    is_a AS eventname='a',is_b AS eventname='b',is_c AS eventname='c',is_d AS eventname='d'
  PATTERN p AS (is_a is_b is_c is_d)
);
-- out  customerid | eventname | eventtime | event_name | pattern_id | match_id 
-- out ------------+-----------+-----------+------------+------------+----------
-- out           1 | a         | 01:00:00  | is_a       |          1 |        1
-- out           1 | b         | 01:05:00  | is_b       |          1 |        2
-- out           1 | c         | 01:10:00  | is_c       |          1 |        3
-- out           1 | d         | 01:15:00  | is_d       |          1 |        4

You can then group by the resulting pattern_id and compute the duration or whatever else you need.

With any DBMS that supports OLAP (window) functions:

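WITH input(customerid,eventname,eventtime) AS (
          SELECT 1,'a',TIME '1:00:00'
UNION ALL SELECT 1,'b',TIME '1:05:00'
UNION ALL SELECT 1,'c',TIME '1:10:00'
UNION ALL SELECT 1,'d',TIME '1:15:00'
UNION ALL SELECT 2,'a',TIME '1:00:00'
UNION ALL SELECT 2,'c',TIME '1:10:00'
UNION ALL SELECT 2,'d',TIME '1:15:00'
UNION ALL SELECT 2,'f',TIME '1:20:00'
UNION ALL SELECT 3,'b',TIME '2:00:00'
UNION ALL SELECT 3,'d',TIME '2:20:00'
),neighbours AS (
  SELECT
    *
  , LEAD(eventname,1) OVER(PARTITION BY customerid ORDER BY eventtime) AS event2
  , LEAD(eventname,2) OVER(PARTITION BY customerid ORDER BY eventtime) AS event3
  , LEAD(eventname,3) OVER(PARTITION BY customerid ORDER BY eventtime) AS event4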
  FROM input
)
SELECT
  *
FROM neighbours
WHERE eventname='a'
  AND event2   ='b'
  AND event3   ='c'
  AND event4   ='d'
;
-- out  customerid | eventname | eventtime | event2 | event3 | event4 
-- out ------------+-----------+-----------+--------+--------+--------
-- out           1 | a         | 01:00:00  | b      | c      | d
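
The LEAD() variant can be tried without a server by running it in SQLite from Python. This is a minimal sketch under a few assumptions: the bundled SQLite is version 3.25 or later (needed for window functions), the data lives in a table named Event rather than an inline CTE, times are stored as zero-padded text, and the helper name sequence_starts is made up for the example.

```python
import sqlite3

# Sample rows copied from the question; times are zero-padded text (an assumption).
ROWS = [
    (1, "a", "01:00:00"), (1, "b", "01:05:00"),
    (1, "c", "01:10:00"), (1, "d", "01:15:00"),
    (2, "a", "01:00:00"), (2, "c", "01:10:00"),
    (2, "d", "01:15:00"), (2, "f", "01:20:00"),
    (3, "b", "02:00:00"), (3, "d", "02:20:00"),
]

# For every row, look ahead at the next three event names of the same customer.
LEAD_SQL = """
WITH neighbours AS (
  SELECT customerid, eventname, eventtime,
         LEAD(eventname, 1) OVER (PARTITION BY customerid ORDER BY eventtime) AS event2,
         LEAD(eventname, 2) OVER (PARTITION BY customerid ORDER BY eventtime) AS event3,
         LEAD(eventname, 3) OVER (PARTITION BY customerid ORDER BY eventtime) AS event4
  FROM Event
)
SELECT customerid, eventtime
FROM neighbours
WHERE eventname = 'a' AND event2 = 'b' AND event3 = 'c' AND event4 = 'd'
ORDER BY customerid, eventtime
"""


def sequence_starts(rows):
    """Return (customerid, time of 'a') for each consecutive a,b,c,d run."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Event (customerid INTEGER, eventname TEXT, eventtime TEXT)"
    )
    conn.executemany("INSERT INTO Event VALUES (?, ?, ?)", rows)
    result = conn.execute(LEAD_SQL).fetchall()
    conn.close()
    return result


print(sequence_starts(ROWS))
```

Unlike the self-join, this version requires the four events to be adjacent rows: a customer with a, z, b, c, d is not matched, while one with e, a, b, c, d, z is.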