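Problem description
I want to rank rows by the ID and Value columns, in ascending order of UID. The expected output must increase whenever the value in the Value column differs from the previous row's value. The ranking must restart for each new ID.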
UID ID Value Expected Output
1 1 0 1
2 1 0 1
3 1 1 2
4 1 1 2
5 1 1 2
6 1 0 3
7 1 1 4
8 1 0 5
9 1 0 5
10 1 0 5
11 2 1 1
12 2 1 1
13 2 0 2
14 2 0 2
15 2 1 3
Here is the sample dataset I created:
CREATE TABLE [dbo].[Data] (
    [UID] [int] NOT NULL,
    [ID] [int] NULL,
    [Value] [int] NULL
);
-- Rows match the table above (UID, ID, Value).
INSERT [dbo].[Data] ([UID],[ID],[Value]) VALUES
    (1,1,0),(2,1,0),(3,1,1),(4,1,1),(5,1,1),
    (6,1,0),(7,1,1),(8,1,0),(9,1,0),(10,1,0),
    (11,2,1),(12,2,1),(13,2,0),(14,2,0),(15,2,1);
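Solution
I think the simplest way to solve this gaps-and-islands problem is to use lag() to retrieve the "previous" value, and then a window sum that increments every time the value changes.
select uid, id, value,
    1 + sum(case when value <> lag_value then 1 else 0 end)
        over(partition by id order by uid) grp
from (
    select d.*, lag(value, 1, value) over(partition by id order by uid) lag_value
    from data d
) d
order by uid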
uid | id | value | grp
--: | -: | ----: | --:
  1 |  1 |     0 |   1
  2 |  1 |     0 |   1
  3 |  1 |     1 |   2
  4 |  1 |     1 |   2
  5 |  1 |     1 |   2
  6 |  1 |     0 |   3
  7 |  1 |     1 |   4
  8 |  1 |     0 |   5
  9 |  1 |     0 |   5
 10 |  1 |     0 |   5
 11 |  2 |     1 |   1
 12 |  2 |     1 |   1
 13 |  2 |     0 |   2
 14 |  2 |     0 |   2
 15 |  2 |     1 |   3
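To see how the lag-based query arrives at each group number, it can help to keep the intermediate columns in the result. The following is a minimal sketch, not part of the original answer: it assumes SQLite (3.25 or later, for window functions) driven from Python's standard sqlite3 module instead of SQL Server, and the function name lag_groups and the changed column are illustrative.

```python
import sqlite3

# Sample rows (UID, ID, Value) copied from the question.
ROWS = [
    (1, 1, 0), (2, 1, 0), (3, 1, 1), (4, 1, 1), (5, 1, 1),
    (6, 1, 0), (7, 1, 1), (8, 1, 0), (9, 1, 0), (10, 1, 0),
    (11, 2, 1), (12, 2, 1), (13, 2, 0), (14, 2, 0), (15, 2, 1),
]

# Same logic as the lag() query, but the previous value and the 0/1 change
# flag are returned as well so every step can be inspected.
QUERY = """
SELECT uid, id, value, lag_value,
       CASE WHEN value <> lag_value THEN 1 ELSE 0 END AS changed,
       1 + SUM(CASE WHEN value <> lag_value THEN 1 ELSE 0 END)
               OVER (PARTITION BY id ORDER BY uid) AS grp
FROM (
    SELECT d.*,
           LAG(value, 1, value) OVER (PARTITION BY id ORDER BY uid) AS lag_value
    FROM data AS d
) AS d
ORDER BY uid
"""


def lag_groups(rows=ROWS):
    """Return (uid, id, value, lag_value, changed, grp) tuples ordered by uid."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data (uid INTEGER NOT NULL, id INTEGER, value INTEGER)"
    )
    conn.executemany("INSERT INTO data (uid, id, value) VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in lag_groups():
        print(row)
```

The changed flag is 1 only on the first row of each new run of values, so its running sum per id (plus 1) yields the group number. The first row of each id compares against itself because lag() defaults to the current value there, which keeps its flag at 0.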
This is a gaps-and-islands problem. I think the simplest approach is the difference-of-row-numbers method:
WITH cte AS (
SELECT *,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY UID) rn1,ROW_NUMBER() OVER (PARTITION BY ID,[Value] ORDER BY UID) rn2
FROM Data
)
SELECT *,DENSE_RANK() OVER (PARTITION BY ID ORDER BY rn1 - rn2,[Value]) AS output
FROM cte
ORDER BY UID;
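The reason the row-number difference identifies islands is easier to see with rn1, rn2 and their difference printed next to each row. This is a rough sketch under the same assumptions as before (SQLite 3.25 or later through Python's sqlite3 rather than SQL Server); the function name row_number_groups and the diff column are illustrative and not from the answer.

```python
import sqlite3

# Sample rows (UID, ID, Value) copied from the question.
ROWS = [
    (1, 1, 0), (2, 1, 0), (3, 1, 1), (4, 1, 1), (5, 1, 1),
    (6, 1, 0), (7, 1, 1), (8, 1, 0), (9, 1, 0), (10, 1, 0),
    (11, 2, 1), (12, 2, 1), (13, 2, 0), (14, 2, 0), (15, 2, 1),
]

# The query from the answer, with rn1, rn2 and rn1 - rn2 exposed.
QUERY = """
WITH cte AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY uid) AS rn1,
           ROW_NUMBER() OVER (PARTITION BY id, value ORDER BY uid) AS rn2
    FROM data
)
SELECT uid, id, value, rn1, rn2, rn1 - rn2 AS diff,
       DENSE_RANK() OVER (PARTITION BY id ORDER BY rn1 - rn2, value) AS output
FROM cte
ORDER BY uid
"""


def row_number_groups(rows=ROWS):
    """Return (uid, id, value, rn1, rn2, diff, output) tuples ordered by uid."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data (uid INTEGER NOT NULL, id INTEGER, value INTEGER)"
    )
    conn.executemany("INSERT INTO data (uid, id, value) VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in row_number_groups():
        print(row)
```

Within one run of equal values, rn1 and rn2 both grow by 1 per row, so their difference stays constant; the pair (diff, value) therefore labels each island. One caveat worth checking against your own data: DENSE_RANK orders those labels by diff, which does not always follow UID order. For a single ID with values 0, 0, 0, 1, 0 the differences are 0, 0, 0, 3, 1, so the islands are found correctly but numbered 1, 1, 1, 3, 2. The sample data in the question happens not to trigger this.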