Problem description
This is my existing table data:
C1 C2 C3
1 A 1
2 B 1
3 C 0
4 D 0
5 E 0
6 F 0
7 G 1
8 H 1
9 I 1
10 J 0
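This is what I want. I'm trying to select 70% of the rows where C3 is 1. There are five such rows in total, and 70% of 5 is 3.5, which rounds up to 4. So I want the final dataset to contain 70% of the rows where C3 is 1 (four of them):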
C1 C2 C3
1 A 1
2 B 1
3 C 0
4 D 0
5 E 0
7 G 1
8 H 1
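To make the arithmetic above concrete, here is a minimal sketch in plain Python (not SQL) that computes the target count and finds the row where it is reached. The list `c3` is just the C3 column from the table above in C1 order; the variable names are made up for illustration.

```python
import math

# C3 values from the question's table, in C1 order (C1 = 1..10).
c3 = [1, 1, 0, 0, 0, 0, 1, 1, 1, 0]

total_ones = sum(c3)                  # number of rows with C3 = 1
target = math.ceil(0.7 * total_ones)  # 70% of that count, rounded up

# Walk the rows in C1 order until the target number of C3 = 1 rows is reached.
seen = 0
cutoff_c1 = None
for c1, value in enumerate(c3, start=1):
    seen += value
    if seen == target:
        cutoff_c1 = c1
        break

print(total_ones, target, cutoff_c1)
```

So the cutoff is the row with C1 = 8, the fourth row where C3 is 1.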
Solution
Here is the answer:
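select *
from (
    select t.*,
           -- running count of C3 = 1 rows up to and including this row
           (select sum(C3) from table_name t1 where t1.C1 <= t.C1) as cumulative_sum,
           (select sum(C3) from table_name) as total_sum
    from table_name t
) t
-- keep a row while fewer than 70% of the C3 = 1 rows come before it
where (cumulative_sum - C3) < 0.7 * total_sum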
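If you want to try the correlated-subquery approach without a database server, the sketch below runs it against an in-memory SQLite database through Python's `sqlite3` module. The choice of SQLite, the `ROWS` list and the helper `select_up_to_cutoff` are assumptions for illustration only, and the factor 0.7 is the 70% from the question.

```python
import sqlite3

# Sample rows from the question: (C1, C2, C3).
ROWS = [(1, "A", 1), (2, "B", 1), (3, "C", 0), (4, "D", 0), (5, "E", 0),
        (6, "F", 0), (7, "G", 1), (8, "H", 1), (9, "I", 1), (10, "J", 0)]

QUERY = """
select C1, C2, C3
from (
    select t.*,
           (select sum(C3) from table_name t1 where t1.C1 <= t.C1) as cumulative_sum,
           (select sum(C3) from table_name) as total_sum
    from table_name t
) t
where (cumulative_sum - C3) < 0.7 * total_sum
order by C1
"""


def select_up_to_cutoff():
    """Load the sample rows into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table table_name (C1 integer primary key, C2 text, C3 integer)"
        )
        conn.executemany("insert into table_name values (?, ?, ?)", ROWS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


result = select_up_to_cutoff()
for row in result:
    print(row)
```

On the sample data this returns the rows with C1 = 1 through 8. Note that it keeps row 6 (F, 0) as well, which the expected output in the question does not list.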
Hmm. It looks like you don't need a random selection; the rows appear to be ordered by col1. So you can compute it as:
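select t.*
from (select t.*,
             -- running count of col3 = 1 rows, in col1 order
             sum(case when col3 = 1 then 1 else 0 end) over (order by col1) as running_col3,
             -- total count of col3 = 1 rows
             sum(case when col3 = 1 then 1 else 0 end) over () as total_col3
      from t
     ) t
-- keep every row up to and including the one where the running count reaches 70%
where (running_col3 - case when col3 = 1 then 1 else 0 end) < 0.7 * total_col3;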
Note: if col3 only ever contains 0 and 1, the above can be simplified to:
select t.*
from (select t.*,
             sum(col3) over (order by col1) as running_col3,
             sum(col3) over () as total_col3
      from t
     ) t
-- keep every row up to and including the one where the running sum reaches 70%
where (running_col3 - col3) < 0.7 * total_col3;
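To see how the window-function version decides where to stop, the sketch below prints the running and total sums for every row together with a `kept` flag. It is only an illustration: it assumes an in-memory SQLite database via Python's `sqlite3` module with SQLite 3.25 or newer (needed for window functions), uses 0.7 for the 70% in the question, and the `ROWS` list and `trace_running_sum` helper are hypothetical names.

```python
import sqlite3

# Sample rows from the question, using the answer's column names: (col1, col2, col3).
ROWS = [(1, "A", 1), (2, "B", 1), (3, "C", 0), (4, "D", 0), (5, "E", 0),
        (6, "F", 0), (7, "G", 1), (8, "H", 1), (9, "I", 1), (10, "J", 0)]

# Assumption: the bundled SQLite is 3.25+ so that sum(...) over (...) is available.
QUERY = """
select col1, col3, running_col3, total_col3,
       (running_col3 - col3) < 0.7 * total_col3 as kept
from (select t.*,
             sum(col3) over (order by col1) as running_col3,
             sum(col3) over () as total_col3
      from t
     ) t
order by col1
"""


def trace_running_sum():
    """Return (col1, col3, running_col3, total_col3, kept) for every row."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table t (col1 integer primary key, col2 text, col3 integer)")
        conn.executemany("insert into t values (?, ?, ?)", ROWS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


trace = trace_running_sum()
for col1, col3, running, total, kept in trace:
    print(col1, col3, running, total, "kept" if kept else "dropped")
```

The running sum reaches 4 at col1 = 8, so that row is the last one kept; rows 9 and 10 already have at least 70% of the col3 = 1 rows before them and are dropped.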