Problem description
Event table:
customerid eventname eventtime
---------- --------- ---------
1 a 1:00:00
1 b 1:05:00
1 c 1:10:00
1 d 1:15:00
2 a 1:00:00
2 c 1:10:00
2 d 1:15:00
2 f 1:20:00
3 b 2:00:00
3 d 2:20:00
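In SQL, find the customer IDs that have the events a -> b -> c -> d in that order. The expected output is customerid 1.
Solutions
In generic SQL, you can join the table once per event and define the relationships between the events.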
SELECT A.customerid
FROM Event A
JOIN Event B USING (customerid)
JOIN Event C USING (customerid)
JOIN Event D USING (customerid)
WHERE A.eventtime < B.eventtime AND A.eventname='a' AND
B.eventtime < C.eventtime AND B.eventname='b' AND
C.eventtime < D.eventtime AND C.eventname='c' AND
D.eventname='d'
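To try the join on the sample data, here is a minimal sketch using Python's built-in sqlite3 module. The table name `event`, the zero-padded text times, the added `DISTINCT`, and the helper name `customers_with_abcd` are assumptions made for this illustration and are not part of the original answer. The joins are written with ON, which is equivalent to USING (customerid) here.

```python
import sqlite3

# Sample rows copied from the question's event table. Times are stored as
# zero-padded text so that plain string comparison orders them correctly.
ROWS = [
    (1, "a", "01:00:00"), (1, "b", "01:05:00"),
    (1, "c", "01:10:00"), (1, "d", "01:15:00"),
    (2, "a", "01:00:00"), (2, "c", "01:10:00"),
    (2, "d", "01:15:00"), (2, "f", "01:20:00"),
    (3, "b", "02:00:00"), (3, "d", "02:20:00"),
]

# One alias per event; DISTINCT (an addition) avoids duplicate IDs when an
# event name occurs more than once for the same customer.
SELF_JOIN = """
SELECT DISTINCT a.customerid
FROM event a
JOIN event b ON b.customerid = a.customerid
JOIN event c ON c.customerid = a.customerid
JOIN event d ON d.customerid = a.customerid
WHERE a.eventname = 'a' AND a.eventtime < b.eventtime
  AND b.eventname = 'b' AND b.eventtime < c.eventtime
  AND c.eventname = 'c' AND c.eventtime < d.eventtime
  AND d.eventname = 'd'
ORDER BY a.customerid
"""


def customers_with_abcd(rows):
    """Load rows into an in-memory table and run the self-join (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE event (customerid INTEGER, eventname TEXT, eventtime TEXT)"
    )
    conn.executemany("INSERT INTO event VALUES (?, ?, ?)", rows)
    result = [row[0] for row in conn.execute(SELF_JOIN)]
    conn.close()
    return result


if __name__ == "__main__":
    print(customers_with_abcd(ROWS))
```

On the sample rows this returns only customer 1. The join only requires the four events to occur in that order, so unrelated events in between do not prevent a match.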
,
The simplest solution is probably string aggregation. The exact syntax varies from database to database, but the idea is:
select customerid
from mytable
group by customerid
having string_agg(eventname,',' order by eventtime) = 'a,b,c,d'
This works in Postgres. In SQL Server, you can phrase the having clause as:
having string_agg(eventname,',') within group(order by eventtime) = 'a,b,c,d'
In Oracle:
having listagg(eventname,',') within group(order by eventtime) = 'a,b,c,d'
In MySQL:
having group_concat(eventname order by eventtime) = 'a,b,c,d'
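The same idea can be tried locally with Python's built-in sqlite3 module. This is only a sketch under some assumptions: SQLite releases before 3.44 do not accept ORDER BY inside group_concat, so the ordering is obtained by using group_concat as a window function instead (this needs SQLite 3.25 or later). The table name `event`, the zero-padded text times, and the helper name `customers_by_string_agg` are made up for this illustration.

```python
import sqlite3

# Sample rows copied from the question's event table (times zero-padded).
ROWS = [
    (1, "a", "01:00:00"), (1, "b", "01:05:00"),
    (1, "c", "01:10:00"), (1, "d", "01:15:00"),
    (2, "a", "01:00:00"), (2, "c", "01:10:00"),
    (2, "d", "01:15:00"), (2, "f", "01:20:00"),
    (3, "b", "02:00:00"), (3, "d", "02:20:00"),
]

# Build each customer's full event string in time order, then compare it
# with the wanted sequence. The frame covers the whole partition, so every
# row of a customer carries the same complete string.
STRING_AGG = """
SELECT DISTINCT customerid
FROM (
    SELECT customerid,
           group_concat(eventname, ',') OVER (
               PARTITION BY customerid
               ORDER BY eventtime
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS seq
    FROM event
)
WHERE seq = 'a,b,c,d'
ORDER BY customerid
"""


def customers_by_string_agg(rows):
    """Run the string-aggregation query on an in-memory table (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE event (customerid INTEGER, eventname TEXT, eventtime TEXT)"
    )
    conn.executemany("INSERT INTO event VALUES (?, ?, ?)", rows)
    result = [row[0] for row in conn.execute(STRING_AGG)]
    conn.close()
    return result


if __name__ == "__main__":
    print(customers_by_string_agg(ROWS))
```

Because the comparison is an exact equality, a customer whose history contains anything besides a, b, c, d (for example e, a, b, c, d) is not returned by this variant.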
,
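A more portable approach is to use OLAP or similar functions over the time series.
This works as long as 'a', 'b', 'c', 'd' are present, and it also works if there are additional rows such as 'e' or 'z'.
In Vertica, for example: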
WITH
input(customerid,eventname,eventtime) AS (
SELECT 1,'a',TIME '1:00:00'
UNION ALL SELECT 1,'b',TIME '1:05:00'
UNION ALL SELECT 1,'c',TIME '1:10:00'
UNION ALL SELECT 1,'d',TIME '1:15:00'
UNION ALL SELECT 2,'a',TIME '1:00:00'
UNION ALL SELECT 2,'c',TIME '1:10:00'
UNION ALL SELECT 2,'d',TIME '1:15:00'
UNION ALL SELECT 2,'f',TIME '1:20:00'
UNION ALL SELECT 3,'b',TIME '2:00:00'
UNION ALL SELECT 3,'d',TIME '2:20:00'
)
SELECT
*,event_name(),pattern_id(),match_id()
FROM input
MATCH(
PARTITION BY customerid
ORDER BY eventtime
DEFINE
is_a AS eventname='a',
is_b AS eventname='b',
is_c AS eventname='c',
is_d AS eventname='d'
PATTERN p AS (is_a is_b is_c is_d)
);
-- out customerid | eventname | eventtime | event_name | pattern_id | match_id
-- out ------------+-----------+-----------+------------+------------+----------
-- out 1 | a | 01:00:00 | is_a | 1 | 1
-- out 1 | b | 01:05:00 | is_b | 1 | 2
-- out 1 | c | 01:10:00 | is_c | 1 | 3
-- out 1 | d | 01:15:00 | is_d | 1 | 4
You can then group by the resulting pattern_id and compute durations or whatever else you need.
With OLAP functions, in any DBMS that supports them:
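WITH input(customerid,eventname,eventtime) AS (
          SELECT 1,'a',TIME '1:00:00'
UNION ALL SELECT 1,'b',TIME '1:05:00'
UNION ALL SELECT 1,'c',TIME '1:10:00'
UNION ALL SELECT 1,'d',TIME '1:15:00'
UNION ALL SELECT 2,'a',TIME '1:00:00'
UNION ALL SELECT 2,'c',TIME '1:10:00'
UNION ALL SELECT 2,'d',TIME '1:15:00'
UNION ALL SELECT 2,'f',TIME '1:20:00'
UNION ALL SELECT 3,'b',TIME '2:00:00'
UNION ALL SELECT 3,'d',TIME '2:20:00'
),neighbours AS (
SELECT
*,LEAD(eventname,1) OVER(PARTITION BY customerid ORDER BY eventtime) AS event2,
LEAD(eventname,2) OVER(PARTITION BY customerid ORDER BY eventtime) AS event3,
LEAD(eventname,3) OVER(PARTITION BY customerid ORDER BY eventtime) AS event4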
FROM input
)
SELECT
*
FROM neighbours
WHERE eventname='a'
AND event2 ='b'
AND event3 ='c'
AND event4 ='d'
;
-- out customerid | eventname | eventtime | event2 | event3 | event4
-- out ------------+-----------+-----------+--------+--------+--------
-- out 1 | a | 01:00:00 | b | c | d
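The LEAD variant can also be run with Python's built-in sqlite3 module, assuming SQLite 3.25 or later for window functions. This is a sketch rather than a definitive implementation: the table name `event`, the zero-padded text times, the helper name `customers_by_lead`, and customers 4 and 5 are hypothetical additions used only to show how extra rows affect the result.

```python
import sqlite3

# Sample rows copied from the question's event table (times zero-padded).
ROWS = [
    (1, "a", "01:00:00"), (1, "b", "01:05:00"),
    (1, "c", "01:10:00"), (1, "d", "01:15:00"),
    (2, "a", "01:00:00"), (2, "c", "01:10:00"),
    (2, "d", "01:15:00"), (2, "f", "01:20:00"),
    (3, "b", "02:00:00"), (3, "d", "02:20:00"),
]

# Hypothetical customers, not in the original data:
# customer 4 has extra events before and after the a, b, c, d run,
# customer 5 has an unrelated event between a and b.
EXTRA_ROWS = [
    (4, "e", "03:00:00"), (4, "a", "03:05:00"), (4, "b", "03:10:00"),
    (4, "c", "03:15:00"), (4, "d", "03:20:00"), (4, "z", "03:25:00"),
    (5, "a", "04:00:00"), (5, "x", "04:05:00"), (5, "b", "04:10:00"),
    (5, "c", "04:15:00"), (5, "d", "04:20:00"),
]

# For every row, look at the next three event names of the same customer.
LEAD_QUERY = """
WITH neighbours AS (
    SELECT customerid, eventname,
           LEAD(eventname, 1) OVER (PARTITION BY customerid ORDER BY eventtime) AS event2,
           LEAD(eventname, 2) OVER (PARTITION BY customerid ORDER BY eventtime) AS event3,
           LEAD(eventname, 3) OVER (PARTITION BY customerid ORDER BY eventtime) AS event4
    FROM event
)
SELECT DISTINCT customerid
FROM neighbours
WHERE eventname = 'a' AND event2 = 'b' AND event3 = 'c' AND event4 = 'd'
ORDER BY customerid
"""


def customers_by_lead(rows):
    """Run the LEAD-based query on an in-memory table (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE event (customerid INTEGER, eventname TEXT, eventtime TEXT)"
    )
    conn.executemany("INSERT INTO event VALUES (?, ?, ?)", rows)
    result = [row[0] for row in conn.execute(LEAD_QUERY)]
    conn.close()
    return result


if __name__ == "__main__":
    print(customers_by_lead(ROWS))
    print(customers_by_lead(ROWS + EXTRA_ROWS))
```

With the original rows only customer 1 is returned. With the hypothetical rows added, customer 4 is returned as well, because rows before or after the run are ignored, while customer 5 is not, because LEAD looks at the immediately following rows and the 'x' row breaks the run.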