Problem description
I'm new to Teradata and I have a small SQL problem, similar to the one below.
Source table A:
a|b|c| dt |dt_f
-------------------------
1|1|5|30/01/2020|21/02/2020
1|1|2|28/02/2020|19/03/2020
1|1|2|20/03/2020|17/04/2020
1|1|2|19/04/2020|05/05/2020
1|1|2|30/06/2020|24/07/2020
1|1|2|27/07/2020|31/12/2999
Desired output:
a|b|c| dt |dt_f
------------------------------
1|1|5|30/01/2020|**27/02/2020**
1|1|2|28/02/2020|**19/05/2020**
1|1|2|30/06/2020|**31/12/2999**
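Explanation:
1 --> If c differs between the current row and the next row, the current row's dt_f = the next row's dt - 1 day, and both rows are selected.
2 --> If months_between(dt, dt) > 1 (as between rows 4 and 5 in the example), the dt_f of the merged row becomes dt (row 4) + 1 month, i.e. 19/04/2020 + 1 month = 19/05/2020, and row 5 is selected with dt_f = 31/12/2999.
I tried a lot with recursion but never got the right result, though I believe it can be solved.
Thanks for your replies :)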
Solution
If this were only about returning the overlapping rows, it would be really simple using Teradata's NORMALIZE extension:
CREATE VOLATILE TABLE vt
(a INT,b INT,c INT,dt DATE,dt_f DATE)
ON COMMIT PRESERVE ROWS;
INSERT INTO vt(1,1,5,DATE '2020-01-30',DATE '2020-02-21');
INSERT INTO vt(1,1,2,DATE '2020-02-28',DATE '2020-03-19');
INSERT INTO vt(1,1,2,DATE '2020-03-20',DATE '2020-04-17');
INSERT INTO vt(1,1,2,DATE '2020-04-19',DATE '2020-05-05');
INSERT INTO vt(1,1,2,DATE '2020-06-30',DATE '2020-07-24');
INSERT INTO vt(1,1,2,DATE '2020-07-27',DATE '2999-12-31');
WITH cte AS
( -- adjusting for gaps > 1 month
SELECT NORMALIZE a,b,c,PERIOD(dt,Add_Months(dt_f,1)) AS pd
FROM vt
)
SELECT a,b,c,Begin(pd) AS dt,Add_Months(End(pd),-1) AS dt_f
FROM cte
;
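To see why padding each end date by one month lets NORMALIZE bridge gaps of up to a month, it can help to look at the periods before they are merged. The query below is only an inspection sketch against the volatile table vt created above; the alias padded_end is made up for illustration.

```sql
-- Show each row's period as NORMALIZE will see it (end date pushed out by one month).
-- Assumes the volatile table vt defined above; padded_end is an illustrative alias.
SELECT a, b, c,
       dt,
       dt_f,
       Add_Months(dt_f, 1)             AS padded_end,
       PERIOD(dt, Add_Months(dt_f, 1)) AS pd
FROM vt
ORDER BY a, b, dt;
```

For the c = 2 rows, the padded end of row 2 (19/04/2020) and of row 3 (17/05/2020) each reach the next row's start, so rows 2 to 4 collapse into one period. Row 4's padded end is 05/06/2020, which is before row 5's start of 30/06/2020, so a new period begins there. Subtracting the month again gives back the last original dt_f of each group in this data (21/02/2020, 05/05/2020, 31/12/2999), not the adjusted end dates from the desired output.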
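But your logic for adjusting the end date needs analytic functions. This is probably the simplest query for getting those overlapping periods plus additional columns, modified to match your logic: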
WITH cte AS
 ( -- returns both start/end of an island, but in separate rows
   SELECT
      a,b,c
     ,dt -- start of current island
     ,Max(dt_f) -- end of previous island (used for finding gaps)
      Over (PARTITION BY a,b,c
            ORDER BY dt
            ROWS BETWEEN Unbounded Preceding
                     AND 1 Preceding) AS prev_max_end
     ,Lag(dt) -- to adjust end date in case of gap > 1 month
      Over (PARTITION BY a,b,c
            ORDER BY dt) AS prev_dt
   FROM vt
   QUALIFY Add_Months(prev_max_end,1) < dt -- gap found
        OR prev_max_end IS NULL            -- first row
 )
SELECT
   a,b,c
  ,dt -- start of current island
   -- next row has end of current island
  ,CASE
     WHEN Lead(c) -- change in c column?
          Over (PARTITION BY a,b
                ORDER BY dt) <> c
     THEN Lead(dt) -- start of next island - 1
          Over (PARTITION BY a,b
                ORDER BY dt) - 1
     ELSE -- same c (or no next row): previous dt of the next island + 1 month
          Lead(Add_Months(prev_dt,1),1,DATE '2999-12-31')
          Over (PARTITION BY a,b
                ORDER BY dt)
   END AS dt_f
FROM cte
;
FROM cte
;
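To follow how the final end dates are derived, the CTE can be run on its own as a quick check. This is a sketch that reuses the same window logic and assumes the vt table and sample rows from above; it adds nothing new to the approach.

```sql
-- Run the island-start logic by itself to see which rows survive the QUALIFY filter.
SELECT a, b, c, dt,
       Max(dt_f) Over (PARTITION BY a, b, c
                       ORDER BY dt
                       ROWS BETWEEN Unbounded Preceding
                                AND 1 Preceding)  AS prev_max_end,
       Lag(dt)   Over (PARTITION BY a, b, c
                       ORDER BY dt)               AS prev_dt
FROM vt
QUALIFY Add_Months(prev_max_end, 1) < dt  -- gap of more than one month
     OR prev_max_end IS NULL              -- first row of each a,b,c group
ORDER BY dt;
```

With the sample data, only the rows starting 30/01/2020, 28/02/2020 and 30/06/2020 should remain. The last of these carries prev_dt = 19/04/2020, which the outer query reads via Lead and shifts by one month to produce 19/05/2020 as the end date of the second island. The first row instead takes the next start minus one day (27/02/2020) because c changes, and the last row falls back to the default 31/12/2999.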