Problem description
SQL Server 2008 R2
ActionID | ActionType | ActionUserID | ActionDateTime
---------+------------+--------------+---------------------
555363   | Open       | 9843         | 2020-09-15 09:27:55
555364   | Process    | 2563         | 2020-09-15 09:31:22
555365   | Close      | 8522         | 2020-09-15 09:37:48
555366   | Detour     | 9843         | 2020-09-15 09:42:42
555367   | Process    | 9843         | 2020-09-15 09:51:50
555368   | Close      | 8522         | 2020-09-15 09:55:45
555369   | Open       | 1685         | 2020-09-15 09:57:12
555370   | Detour     | 2563         | 2020-09-15 10:03:23
555371   | Detour     | 9843         | 2020-09-15 10:04:33
555372   | Close      | 8522         | 2020-09-15 10:07:44
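To reproduce this, the sample rows can be loaded into a scratch table with the script below. This is only a sketch: the column types (INT, VARCHAR(20), DATETIME) and the primary key are assumptions, because the question shows the values but not the table definition.

```sql
-- Scratch copy of the sample data. Column types and the primary key
-- are assumed; adjust them to match the real Actions table.
CREATE TABLE Actions (
    ActionID       INT         NOT NULL PRIMARY KEY,
    ActionType     VARCHAR(20) NOT NULL,
    ActionUserID   INT         NOT NULL,
    ActionDateTime DATETIME    NOT NULL
);

-- A multi-row VALUES list needs SQL Server 2008 or later.
-- ISO 8601 literals with a 'T' separator are read the same way
-- regardless of the session's language or DATEFORMAT setting.
INSERT INTO Actions (ActionID, ActionType, ActionUserID, ActionDateTime)
VALUES
    (555363, 'Open',    9843, '2020-09-15T09:27:55'),
    (555364, 'Process', 2563, '2020-09-15T09:31:22'),
    (555365, 'Close',   8522, '2020-09-15T09:37:48'),
    (555366, 'Detour',  9843, '2020-09-15T09:42:42'),
    (555367, 'Process', 9843, '2020-09-15T09:51:50'),
    (555368, 'Close',   8522, '2020-09-15T09:55:45'),
    (555369, 'Open',    1685, '2020-09-15T09:57:12'),
    (555370, 'Detour',  2563, '2020-09-15T10:03:23'),
    (555371, 'Detour',  9843, '2020-09-15T10:04:33'),
    (555372, 'Close',   8522, '2020-09-15T10:07:44');
```

With only ten rows, no user has more than four actions, so a per-user 1% sample should come back as exactly one random row for each of the four users (1685, 2563, 8522 and 9843).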
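The table has many thousands of rows. What I want is to see 1% of all the actions each user performed in a particular month.
I know I can get 1% of everything by doing this: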
SELECT TOP 1 PERCENT *
FROM Actions
WHERE ActionDateTime BETWEEN '09/01/2020' AND '09/30/2020'
ORDER BY NEWID()
And I know I can get 1% for a specific user like this:
SELECT TOP 1 PERCENT *
FROM Actions
WHERE ActionUserID = 9843
AND ActionDateTime BETWEEN '09/01/2020' AND '09/30/2020'
ORDER BY NEWID()
But what I really want is 1% for each user. I know I can get a list of the users who performed actions in that month by doing this:
SELECT DISTINCT ActionUserID
FROM Actions
WHERE ActionDateTime BETWEEN '09/01/2020' AND '09/30/2020'
But I'm not sure how to combine the two queries.
Solution
"But what I really want is 1% for each user."
I would suggest the window function percent_rank():
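select *
from (
    -- percent_rank() = (rank - 1) / (rows in the partition - 1),
    -- so the first row of every user gets 0 and is always kept
    select a.*, percent_rank() over (partition by actionuserid order by newid()) as prn
    from actions a
    where actiondatetime >= '20200901' and actiondatetime < '20201001'
) a
where prn < 0.01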
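The query above filters the month with a half-open range: with BETWEEN '09/01/2020' AND '09/30/2020', the upper bound means 2020-09-30 00:00:00, so actions later on September 30 would be missed, and yyyymmdd literals are read the same way whatever the session's language settings. percent_rank() was only introduced in SQL Server 2012, so it is not available in SQL Server 2008 R2; on older versions we can emulate it with rank() and count():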
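select *
from (
    select a.*,
        rank() over (partition by actionuserid order by newid()) as rn,
        count(*) over (partition by actionuserid) as cnt
    from actions a
    where actiondatetime >= '20200901' and actiondatetime < '20201001'
) a
-- same formula as percent_rank(): (rn - 1) / (cnt - 1) < 0.01;
-- nullif() avoids a division by zero when a user has a single action
where 100.0 * (rn - 1) / nullif(cnt - 1, 0) < 1 or cnt = 1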
Alternatively, you can combine your two queries quite easily with CROSS APPLY:
SELECT a.*
FROM (
    SELECT DISTINCT ActionUserID
    FROM Actions
    WHERE ActionDateTime >= '20200901' AND ActionDateTime < '20201001'
) u
CROSS APPLY
(
    -- TOP (n) PERCENT rounds fractional row counts up,
    -- so every user gets at least one row
    SELECT TOP 1 PERCENT *
    FROM Actions act
    WHERE act.ActionUserID = u.ActionUserID
    AND act.ActionDateTime >= '20200901' AND act.ActionDateTime < '20201001'
    ORDER BY NEWID()
) a