问题描述
我试图在不使用 LAG()
的情况下重新创建 LAG()
函数,但具有依赖于列的动态 Offset
。我要将这段代码复制到 Sparksql
中。
这是我的示例数据:
if object_id('tempdb.dbo.#myTable') is not null drop table #myTable
create table #myTable (id int,dates int,flag char,FromToFlagType varchar(2),FromToCounter INT)
insert into #myTable values(1,'20181031','V','VV',1)
insert into #myTable values(2,'20181130',2)
insert into #myTable values(3,'20181231',3)
insert into #myTable values(4,'20190131','F','VF',1)
insert into #myTable values(5,'20190228','FF',2)
insert into #myTable values(6,'20190331',3)
insert into #myTable values(7,'20190430',4)
insert into #myTable values(8,'20190531','FV',1)
insert into #myTable values(9,'20190630',2)
insert into #myTable values(10,'20190731',3)
id dates flag FromToFlagType FromToCounter
1 20181031 V VV 1
2 20181130 V VV 2
3 20181231 V VV 3
4 20190131 F VF 1
5 20190228 F FF 2
6 20190331 F FF 3
7 20190430 F FF 4
8 20190531 V VF 1
9 20190630 V VV 2
10 20190731 V VV 3
所以我想要做的是复制以下结果,但不使用动态 Offset
:
select
*,LAG(FromToFlagType,FromToCounter-1) OVER ( ORDER BY dates) AS FromToStage
from
#mytable
id dates flag FromToFlagType FromToCounter FromToStage
1 20181031 V VV 1 VV
2 20181130 V VV 2 VV
3 20181231 V VV 3 VV
4 20190131 F VF 1 VF
5 20190228 F FF 2 VF
6 20190331 F FF 3 VF
7 20190430 F FF 4 VF
8 20190531 V FV 1 FV
9 20190630 V VV 2 FV
10 20190731 V VV 3 FV
我知道您可以使用 LAG()
和 CTE
复制 JOIN
,但它似乎只有在您提前知道偏移量将是什么时才有效。我在这里尝试过类似的东西,但我只是无法得到相同的结果。
我发现了类似 here 的东西,但我是 Spark 的新手,我需要一个使用 Sparksql
的解决方案。我想如果我能复制这个函数,我就可以将它复制到“Spark”。
WITH FromToStage AS(
select
*,id-(FromToCounter-1) AS id_2
from #mytable
--order by dates
)
SELECT
a.*,b.FromToFlagType as FromToStage
FROM
FromToStage a
JOIN
FromToStage b
ON
a.id = b.id_2
order by dates
解决方法
根据你的逻辑,你想要 first_value()
select t.*,first_value(FromToFlagType) over (partition by seqnum - seqnum_2 order by date)_ as first_FromToFlagType
from (select t.*,row_number() over order by date) as seqnum,row_number() over (partition by flag order by date) as seqnum_2
from #mytable t
) t;