Select 1% of distinct values

Problem description

SQL Server 2008 R2

I have a table named Actions. Here is a sample of what it looks like:

ActionID | ActionType | ActionUserID | ActionDateTime
---------+------------+--------------+---------------------
555363     Open         9843           2020-09-15 09:27:55
555364     Process      2563           2020-09-15 09:31:22
555365     Close        8522           2020-09-15 09:37:48
555366     Detour       9843           2020-09-15 09:42:42
555367     Process      9843           2020-09-15 09:51:50
555368     Close        8522           2020-09-15 09:55:45
555369     Open         1685           2020-09-15 09:57:12
555370     Detour       2563           2020-09-15 10:03:23
555371     Detour       9843           2020-09-15 10:04:33
555372     Close        8522           2020-09-15 10:07:44

The table has many thousands of rows. What I want to do is look at a random 1% of all the actions each user performed in a given month.

I know I can get 1% of everything by doing this:

SELECT TOP 1 PERCENT * 
FROM Actions 
WHERE ActionDateTime BETWEEN '09/01/2020' AND '09/30/2020' 
ORDER BY NEWID()

I know I can get 1% for a specific user with:

SELECT TOP 1 PERCENT * 
FROM Actions 
WHERE ActionUserID = 9843 
  AND ActionDateTime BETWEEN '09/01/2020' AND '09/30/2020' 
ORDER BY NEWID()

But what I really want is 1% for each user. I know I can get the list of users who performed an action during that month with the following:

SELECT DISTINCT ActionUserID 
FROM Actions 
WHERE ActionDateTime BETWEEN '09/01/2020' AND '09/30/2020'

But I am not sure how to combine these two queries.

Solution

"But what I really want is 1% for each user."

I would suggest the window function percent_rank():

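select *
from (
    -- prn = (rank - 1) / (rows in partition - 1), ranked in random order per user
    select a.*, percent_rank() over(partition by actionuserid order by newid()) as prn
    from actions a
    -- half-open range: covers all of September, including the whole of the 30th
    where actiondatetime >= '20200901' and actiondatetime < '20201001'
) a
where prn < 0.01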
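
To see how many rows the percent_rank() filter keeps per user, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. This is an illustration under assumptions, not the original setup: SQLite stands in for SQL Server, random() stands in for NEWID(), and the function name sample_per_user and the toy rows are hypothetical. It needs an SQLite build with window functions (3.25 or later).

```python
import sqlite3


def sample_per_user(rows, fraction=0.01):
    """Return a random sample of roughly `fraction` of each user's rows.

    rows: iterable of (actionid, actionuserid) tuples (hypothetical toy data).
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE actions (actionid INTEGER, actionuserid INTEGER)")
    con.executemany("INSERT INTO actions VALUES (?, ?)", rows)
    sampled = con.execute(
        """
        SELECT actionid, actionuserid
        FROM (
            SELECT a.*,
                   percent_rank() OVER (
                       PARTITION BY actionuserid ORDER BY random()
                   ) AS prn
            FROM actions a
        ) s
        WHERE prn < ?
        """,
        (fraction,),
    ).fetchall()
    con.close()
    return sampled


# Toy data: user 1 has 300 actions, user 2 has 50, user 3 has a single action.
toy_rows = (
    [(i, 1) for i in range(300)]
    + [(1000 + i, 2) for i in range(50)]
    + [(2000, 3)]
)

# Count how many sampled rows each user ended up with.
per_user = {}
for _, user in sample_per_user(toy_rows):
    per_user[user] = per_user.get(user, 0) + 1
print(per_user)
```

Because the first row of every partition has a percent_rank of 0, each user contributes at least one row, even a user with a single action.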

percent_rank() is only available from SQL Server 2012 onward. If your version is older than that (2008 R2, for example), we can emulate it with rank() and count():

select *
from (
    select a.*,
        rank() over(partition by actionuserid order by newid()) as rn,
        count(*) over(partition by actionuserid) as cnt
    from actions a
    where actiondatetime >= '20200901' and actiondatetime < '20201001'
) a
-- rn = 1 guarantees at least one row per user, including users with exactly 100 rows
where 100.0 * rn / cnt < 1 or rn = 1
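
If the goal is to return the same number of rows per user as TOP 1 PERCENT would (which rounds a fractional row count up), one possible variant is row_number() combined with an integer ceiling. The sketch below is an assumption-laden illustration rather than part of the original answer: it runs on SQLite from Python, random() stands in for NEWID(), and the names QUERY and sample_rounded_up as well as the toy data are hypothetical. The expression (cnt + 99) / 100 relies on integer division, which SQL Server also applies to integer operands.

```python
import sqlite3

QUERY = """
SELECT actionid, actionuserid
FROM (
    SELECT a.*,
           row_number() OVER (PARTITION BY actionuserid ORDER BY random()) AS rn,
           count(*)     OVER (PARTITION BY actionuserid) AS cnt
    FROM actions a
) s
-- (cnt + 99) / 100 is integer division, i.e. cnt / 100 rounded up
WHERE rn <= (cnt + 99) / 100
"""


def sample_rounded_up(rows):
    """Keep ceil(1% of n) random rows for each user with n rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE actions (actionid INTEGER, actionuserid INTEGER)")
    con.executemany("INSERT INTO actions VALUES (?, ?)", rows)
    sampled = con.execute(QUERY).fetchall()
    con.close()
    return sampled


# Toy data: group sizes chosen around the 100-row boundary.
sizes = {1: 100, 2: 101, 3: 250, 4: 1}
toy_rows = []
next_id = 0
for user, n in sizes.items():
    for _ in range(n):
        toy_rows.append((next_id, user))
        next_id += 1

# Count the sampled rows per user.
kept = {}
for _, user in sample_rounded_up(toy_rows):
    kept[user] = kept.get(user, 0) + 1
print(kept)
```

row_number() is used instead of rank() so that ties in the random ordering can never let extra rows through.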

You can easily combine the two queries with CROSS APPLY:

SELECT a.*
FROM (
    SELECT DISTINCT ActionUserID 
    FROM Actions
    -- half-open range so that actions later in the day on 30 September are included
    WHERE ActionDateTime >= '20200901' AND ActionDateTime < '20201001'
) u
CROSS APPLY
(
    SELECT TOP 1 PERCENT * 
    FROM Actions a
    WHERE a.ActionUserID = u.ActionUserID 
    AND a.ActionDateTime >= '20200901' AND a.ActionDateTime < '20201001'
    ORDER BY NEWID()
) a
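
Conceptually, CROSS APPLY runs the inner TOP 1 PERCENT query once for each user produced by the outer query. The following pure-Python sketch mirrors that per-user loop on a list of dictionaries. It is only a mental model under stated assumptions: the function name top_percent_per_user, the seed parameter, and the toy rows are hypothetical, and the row count per user is rounded up the way TOP ... PERCENT does.

```python
import math
import random
from collections import defaultdict


def top_percent_per_user(actions, percent=1.0, seed=None):
    """For each user, keep a random ceil(percent% of n) of their n rows."""
    rng = random.Random(seed)

    # Outer query: the distinct users, each with their own rows.
    by_user = defaultdict(list)
    for action in actions:
        by_user[action["ActionUserID"]].append(action)

    # Inner query, applied once per user: random order, then take the top slice.
    result = []
    for user_rows in by_user.values():
        keep = math.ceil(len(user_rows) * percent / 100)
        result.extend(rng.sample(user_rows, keep))
    return result


# Toy data: user 9843 has 250 actions, user 2563 has 40, user 8522 has 100.
toy_actions = (
    [{"ActionID": i, "ActionUserID": 9843} for i in range(250)]
    + [{"ActionID": 1000 + i, "ActionUserID": 2563} for i in range(40)]
    + [{"ActionID": 2000 + i, "ActionUserID": 8522} for i in range(100)]
)

picked = top_percent_per_user(toy_actions, seed=0)
counts = defaultdict(int)
for row in picked:
    counts[row["ActionUserID"]] += 1
print(dict(counts))
```

Every user in the outer list has at least one row in the month, so every user contributes at least one sampled row.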