Not getting the expected output when calculating a percentage in SQL with COUNT

Problem description

I'm running a query, but it doesn't return the result I expect. I have the following table:

------------------------
security_ID | Date | Rep 
------------------------
2256        |202001|  0
2257        |202002|  0
2257        |202003|  0
2256        |202002|  1
2256        |202003|  2
2257        |202003|  1

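Essentially, I'm looking for cases where the Rep column changes from 0 to 1 from one date to the next, given that Rep was 0 on the previous date. For a given security_ID, two rows only count as consecutive if their Date values differ by 1 (e.g. 202002 - 202001 = 1; the dates here are integers). For security_ID = 2256, Percent = 100: Rep changes from 0 to 1 between 202001 and 202002, and there is exactly one row for 2256 whose previous Rep was 0, and that row has Rep = 1. The equation for the percentage is: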

Percent = (number of cases where Rep_current = 1 and Rep_prev = 0) / (number of cases where Rep_prev = 0) * 100

For security_ID = 2257, Percent = 1/2 * 100 = 50 (both 202003 rows have a previous Rep of 0 at 202002, and one of them has Rep = 1).
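To pin down the arithmetic, here is a minimal plain-Python reference for the formula above. It assumes, as in the question, that "previous" means the same security_ID at Date - 1, and that a row is paired with every row on the previous date (which is what a self-join does). The function name `percent_by_security` is hypothetical, chosen for illustration.

```python
from collections import defaultdict

# Sample rows from the question: (security_ID, Date, Rep); Date is an integer yyyymm.
ROWS = [
    (2256, 202001, 0),
    (2257, 202002, 0),
    (2257, 202003, 0),
    (2256, 202002, 1),
    (2256, 202003, 2),
    (2257, 202003, 1),
]


def percent_by_security(rows):
    """Percent = (#rows with Rep = 1 and Rep_prev = 0) / (#rows with Rep_prev = 0) * 100,
    where Rep_prev is the Rep of the same security at Date - 1 (assumption)."""
    # Index Rep values by (security_ID, Date); one date may hold several rows.
    reps = defaultdict(list)
    for sec, date, rep in rows:
        reps[(sec, date)].append(rep)

    numerator = defaultdict(int)
    denominator = defaultdict(int)
    for sec, date, rep in rows:
        # Pair this row with every row of the same security on the previous date.
        for rep_prev in reps.get((sec, date - 1), []):
            if rep_prev == 0:
                denominator[sec] += 1
                if rep == 1:
                    numerator[sec] += 1

    return {sec: numerator[sec] * 100 / denominator[sec] for sec in denominator}


print(percent_by_security(ROWS))
```

For the sample data this yields 100.0 for 2256 and 50.0 for 2257, matching the expected output.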

For example, I would like the output to be:

----------------------------------
security_ID | Date | Rep | Percent
----------------------------------
2256        |202001|  0  |  100
2257        |202002|  0  |  50 
2257        |202003|  0  |  50 
2256        |202002|  1  |  100 
2256        |202003|  2  |  100
2257        |202003|  1  |  50

I tried the following:

SELECT security_ID,Date,Rep,(COUNT(CASE WHEN Rep_prev = 0 and Rep = 1 then 1 else 0 end)/count(CASE WHEN Rep_prev = 0 then 1 else 0 end) * 100) as "Percent"
from
(
    select t1.security_id,t1.date,t1.rep,coalesce(t2.rep,0) as Rep_prev
      from mytable t1
      left join mytable t2
        on t1.security_id = t2.security_id
       and t2.date = t1.date - 1
    )
GROUP BY Security_ID,Rep

But the output I get is:

----------------------------------
security_ID | Date | Rep | Percent
----------------------------------
2256        |202001|  0  |  100
2257        |202002|  0  |  100 
2257        |202003|  0  |  100 
2256        |202002|  1  |  100 
2256        |202003|  2  |  100
2257        |202003|  1  |  100

I'm not sure where my logic goes wrong.

Please let me know if you need more information; this idea is hard to put into words.
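The constant 100 comes from COUNT: COUNT(expr) counts every non-NULL value, and `ELSE 0` produces 0, which is not NULL, so both CASE expressions are counted on every row and their ratio is always 1. The sketch below reproduces this in an in-memory SQLite database (standing in for Sybase) and compares COUNT with SUM. It uses an inner self-join so that each security's first date, which has no previous row, is left out; the question's `COALESCE(t2.rep, 0)` would instead treat that first date as if its previous Rep were 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (security_id INTEGER, date INTEGER, rep INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(2256, 202001, 0), (2257, 202002, 0), (2257, 202003, 0),
     (2256, 202002, 1), (2256, 202003, 2), (2257, 202003, 1)],
)

# COUNT counts the 0s from "ELSE 0" as well as the 1s; SUM adds up only the 1s.
QUERY = """
SELECT security_id,
       COUNT(CASE WHEN rep_prev = 0 AND rep = 1 THEN 1 ELSE 0 END) AS count_num,
       COUNT(CASE WHEN rep_prev = 0 THEN 1 ELSE 0 END)             AS count_den,
       SUM(CASE WHEN rep_prev = 0 AND rep = 1 THEN 1 ELSE 0 END)   AS sum_num,
       SUM(CASE WHEN rep_prev = 0 THEN 1 ELSE 0 END)               AS sum_den
FROM (SELECT t1.security_id, t1.rep, t2.rep AS rep_prev
      FROM mytable t1
      JOIN mytable t2
        ON t1.security_id = t2.security_id
       AND t2.date = t1.date - 1) t
GROUP BY security_id
ORDER BY security_id
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Here the COUNT columns are equal for each security (2 and 2), while the SUM columns give 1/1 for 2256 and 1/2 for 2257, i.e. 100 and 50.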

Solution

You can use the following logic (using window functions):

Select mt.security_id,mt.Date,mt.Rep,(numerator*100)/denominator As "Percent"
from
  (Select security_id,SUM(case when Rep_curr = 1 and coalesce(Rep_prev,-99) = 0 then 1 else 0 end) As numerator,SUM(case when coalesce(Rep_prev,-99) = 0 then 1 else 0 end) As denominator
    from
     (Select security_id,Rep As Rep_curr,lag(Rep,1) over(partition by security_id order by Date) As Rep_prev
      from my_table) t 
   where t.Rep_prev is not NULL
   group by security_id) tab
   join my_table mt 
   on tab.security_id = mt.security_id;

Here is a db<>fiddle link demonstrating how it works in SQL Server: https://dbfiddle.uk/?rdbms=sqlserver_2014&fiddle=8654f0242bbed911f1ed63b795281bac. The syntax above also works on Sybase.
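As an offline check, the LAG-based query above can be run in SQLite through Python's sqlite3 module. This is a sketch under a few assumptions: SQLite 3.25 or later (for window functions), the redundant `coalesce(Rep_prev,-99)` removed because NULLs are already filtered out, and `rep` added to the window's ORDER BY as a tie-breaker. The tie-breaker matters because 2257 has two rows dated 202003, and LAG picks whichever tied row sorts first, so without it the result can depend on row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (security_id INTEGER, date INTEGER, rep INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(2256, 202001, 0), (2257, 202002, 0), (2257, 202003, 0),
     (2256, 202002, 1), (2256, 202003, 2), (2257, 202003, 1)],
)

QUERY = """
SELECT mt.security_id, mt.date, mt.rep,
       (numerator * 100) / denominator AS percent   -- integer division
FROM (SELECT security_id,
             SUM(CASE WHEN rep_curr = 1 AND rep_prev = 0 THEN 1 ELSE 0 END) AS numerator,
             SUM(CASE WHEN rep_prev = 0 THEN 1 ELSE 0 END)                  AS denominator
      FROM (SELECT security_id, rep AS rep_curr,
                   -- rep breaks ties between rows that share a date (assumption)
                   LAG(rep, 1) OVER (PARTITION BY security_id ORDER BY date, rep) AS rep_prev
            FROM my_table) t
      WHERE t.rep_prev IS NOT NULL
      GROUP BY security_id) tab
JOIN my_table mt ON tab.security_id = mt.security_id
ORDER BY mt.security_id, mt.date, mt.rep
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Note that LAG takes the previous row in date order, not the row at Date - 1, so gaps between months are not detected by this version.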

Edit: If I were to use your approach, this is how I would do it:

SELECT mt.*,Tab1.Perct As "Percent"
FROM
    (SELECT security_ID,(SUM(CASE WHEN Rep_prev = 0 and Rep = 1 then 1 else 0 end) * 100/SUM(CASE WHEN Rep_prev = 0 then 1 else 0 end)) as Perct
    from
       (
        select t1.security_id,t1.date,t1.rep,coalesce(t2.rep,0) as Rep_prev
          from mytable t1
          left join mytable t2
            on t1.security_id = t2.security_id
           and t2.date = t1.date - 1
          where  t2.rep IS NOT NULL
        ) t
    GROUP BY SECURITY_ID ) Tab1
    JOIN mytable mt
    ON TAB1.security_ID = mt.security_ID;
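The key change from the question's query is SUM instead of COUNT, plus dropping rows without a previous date. A minimal SQLite run of this self-join version on the sample data (SQLite standing in for Sybase; `coalesce` omitted since `t2.rep IS NOT NULL` already excludes the NULLs) reproduces the expected per-row output:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (security_id INTEGER, date INTEGER, rep INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(2256, 202001, 0), (2257, 202002, 0), (2257, 202003, 0),
     (2256, 202002, 1), (2256, 202003, 2), (2257, 202003, 1)],
)

QUERY = """
SELECT mt.security_id, mt.date, mt.rep, tab1.perct AS percent
FROM (SELECT security_id,
             SUM(CASE WHEN rep_prev = 0 AND rep = 1 THEN 1 ELSE 0 END) * 100
               / SUM(CASE WHEN rep_prev = 0 THEN 1 ELSE 0 END) AS perct
      FROM (SELECT t1.security_id, t1.date, t1.rep, t2.rep AS rep_prev
            FROM mytable t1
            LEFT JOIN mytable t2
              ON t1.security_id = t2.security_id
             AND t2.date = t1.date - 1
            WHERE t2.rep IS NOT NULL) t   -- keep only rows that have a previous date
      GROUP BY security_id) tab1
JOIN mytable mt ON tab1.security_id = mt.security_id   -- repeat the percent on every row
ORDER BY mt.security_id, mt.date, mt.rep
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Because the join uses `t1.date - 1`, both 202003 rows of 2257 pair with the single 202002 row, which gives the expected 1/2.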

Use window functions, like this:

select t.*,(sum(case when prev_rep = 0 and rep = 1 then 100.0 else 0 end) over (partition by security_id) /
        sum(case when rep = 0 then 1 else 0 end) over (partition by security_id)
       ) as "Percent"
from (select t.*,lag(rep) over (partition by security_id order by date) as prev_rep
      from mytable t
     ) t;
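Because the SUMs here are window aggregates over each security_id partition, every row keeps its own columns and no join back is needed. The sketch below runs this query in SQLite 3.25+ (an assumption) with `rep` added to the LAG ordering as a tie-breaker for the two 202003 rows of 2257. Note that the denominator counts rows where `rep = 0`, as in this answer, rather than rows where the previous Rep was 0 as in the question's formula; for this sample the two give the same numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (security_id INTEGER, date INTEGER, rep INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(2256, 202001, 0), (2257, 202002, 0), (2257, 202003, 0),
     (2256, 202002, 1), (2256, 202003, 2), (2257, 202003, 1)],
)

QUERY = """
SELECT t.security_id, t.date, t.rep,
       -- 100.0 forces non-integer division
       SUM(CASE WHEN prev_rep = 0 AND rep = 1 THEN 100.0 ELSE 0 END)
           OVER (PARTITION BY security_id)
       / SUM(CASE WHEN rep = 0 THEN 1 ELSE 0 END)
           OVER (PARTITION BY security_id) AS percent
FROM (SELECT t.*,
             LAG(rep) OVER (PARTITION BY security_id ORDER BY date, rep) AS prev_rep
      FROM mytable t) t
ORDER BY t.security_id, t.date, t.rep
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```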

If you are using a version of Sybase that does not support window functions, you can calculate this per security_id with:

select security_id,(sum(case when prev_rep = 0 and rep = 1 then 100.0 else 0 end) /
        sum(case when rep = 0 then 1 else 0 end)
       ) as "Percent"
from (select t.*,(select top (1) t2.rep
              from mytable t2
              where t2.security_id = t.security_id and
                    t2.date < t.date
              order by t2.date desc
             ) as prev_rep
      from mytable t
     ) t
group by security_id;

If you need this on every row, you can join back to the table.
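A sketch of the window-free version with the join back, run in SQLite. SQLite has no `TOP (1)`, so `ORDER BY ... LIMIT 1` is swapped in for it here; that dialect change is the only intended difference in the correlated subquery. The derived table alias `p` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (security_id INTEGER, date INTEGER, rep INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(2256, 202001, 0), (2257, 202002, 0), (2257, 202003, 0),
     (2256, 202002, 1), (2256, 202003, 2), (2257, 202003, 1)],
)

QUERY = """
SELECT mt.security_id, mt.date, mt.rep, p.percent
FROM (SELECT security_id,
             SUM(CASE WHEN prev_rep = 0 AND rep = 1 THEN 100.0 ELSE 0 END)
               / SUM(CASE WHEN rep = 0 THEN 1 ELSE 0 END) AS percent
      FROM (SELECT t.*,
                   -- latest earlier row of the same security (LIMIT 1 replaces TOP (1))
                   (SELECT t2.rep
                    FROM mytable t2
                    WHERE t2.security_id = t.security_id
                      AND t2.date < t.date
                    ORDER BY t2.date DESC
                    LIMIT 1) AS prev_rep
            FROM mytable t) t
      GROUP BY security_id) p
JOIN mytable mt ON p.security_id = mt.security_id   -- join back for per-row output
ORDER BY mt.security_id, mt.date, mt.rep
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Since the subquery uses `t2.date < t.date`, rows that share a date never see each other as "previous", so both 202003 rows of 2257 get the 202002 value.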

Edit:

If you have a row for every month, then this might work:

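select security_id,(sum(case when prev_rep = 0 and rep = 1 then 100.0 else 0 end) /
        sum(case when rep = 0 then 1 else 0 end)
       ) as "Percent"
from (select t.*,tprev.rep as prev_rep
      from mytable t left join
           mytable tprev
           on t.security_id = tprev.security_id and
              convert(date,tprev.date + '01') = dateadd(month,-1,convert(date,t.date + '01'))
     ) t
group by security_id;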

Note: this assumes that date is a string. Storing it as a proper date would simplify the logic.
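If Date stays an integer yyyymm, as in the question, a plain `date - 1` only finds the previous month within the same year: 202001 - 1 is 202000, not 201912. A minimal sketch of a year-aware alternative follows; `prev_month` and the SQL expression are hypothetical helpers, and the SQL form could replace `t1.date - 1` in a join condition. The check runs the SQL expression in SQLite to confirm it agrees with the Python function.

```python
import sqlite3


def prev_month(yyyymm: int) -> int:
    """Return the month before an integer yyyymm value, e.g. 202001 -> 201912."""
    year, month = divmod(yyyymm, 100)
    return (year - 1) * 100 + 12 if month == 1 else yyyymm - 1


# Same rule as a SQL expression: January steps back 89 (202001 - 89 = 201912).
SQL_PREV_MONTH = "CASE WHEN :d % 100 = 1 THEN :d - 89 ELSE :d - 1 END"

conn = sqlite3.connect(":memory:")
for d in (202001, 202003, 202012):
    (sql_value,) = conn.execute(f"SELECT {SQL_PREV_MONTH}", {"d": d}).fetchone()
    print(d, prev_month(d), sql_value)
```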