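Problem description
I have a table containing a sample of cars, and I want to compute the median price of the sample using SQL. What is the best way to do this?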
+-----+-------+----------+
| Car | Price | Quantity |
+-----+-------+----------+
| A | 100 | 2 |
| B | 150 | 4 |
| C | 200 | 8 |
+-----+-------+----------+
I know that I could use percentile_cont (or percentile_disc) if my table looked like this:
+-----+-------+
| Car | Price |
+-----+-------+
| A | 100 |
| A | 100 |
| B | 150 |
| B | 150 |
| B | 150 |
| B | 150 |
| C | 200 |
| C | 200 |
| C | 200 |
| C | 200 |
| C | 200 |
| C | 200 |
| C | 200 |
| C | 200 |
+-----+-------+
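But in the real world, my first table has about 100 million rows, and the second table would have about 3 billion rows (and I don't know how to convert my first table into the second).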
Solution
This looks correct on a small set of results, but try it on a larger data set to double-check.
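First create a table containing the price and quantity for each car you selected (or use a CTE or subquery). I'm just creating a separate table here.
create table table2 as
(
    select car, quantity, price
    from table1
)
Then run this query, which finds the price group that contains the middle of the sample.
-- The running sum is weighted by quantity (the number of units) and ordered
-- by price, so rollsum is the position of the last unit in each price group.
select min(price) as price
from (
    select price,
           sum(quantity) over (order by price) as rollsum,
           sum(quantity) over () as total
    from table2
) a
-- 2.0 avoids integer division when the total quantity is odd
where rollsum >= total / 2.0
This correctly returns the value $200.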
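One way to double-check the running-sum idea on a small data set is to run a quantity-weighted version of the query in an in-memory SQLite database from Python. This is only a sketch under assumptions: the helper name `median_price` and the `table1` column types are made up for illustration, and window functions require SQLite 3.25 or newer.

```python
import sqlite3

# Quantity-weighted running sum in price order; the first price whose
# running quantity reaches half of the total quantity is the (lower) median.
QUERY = """
select min(price)
from (
    select price,
           sum(quantity) over (order by price) as rollsum,
           sum(quantity) over () as total
    from table1
) a
where rollsum >= total / 2.0
"""

def median_price(rows):
    """rows: iterable of (car, price, quantity) tuples (hypothetical helper)."""
    con = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
    con.execute("create table table1 (car text, price integer, quantity integer)")
    con.executemany("insert into table1 values (?, ?, ?)", rows)
    (price,) = con.execute(QUERY).fetchone()
    con.close()
    return price

# The sample from the question: 2 cars at 100, 4 at 150, 8 at 200.
sample = [("A", 100, 2), ("B", 150, 4), ("C", 200, 8)]
print(median_price(sample))

# A skewed case: ten cars at 1 and one car at 1000, so the median must be 1.
skewed = [("A", 1, 10), ("B", 1000, 1)]
print(median_price(skewed))
```

When the two middle units fall into different price groups, this returns the lower of the two prices rather than their average.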
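Here is a way to do this in SQL Server.
The first step is to compute the positions that correspond to the lower and upper limits of the median (if the number of elements x is odd, both limits are the same position, x / 2 + 1 with integer division; otherwise they are x / 2 and x / 2 + 1).
Then I compute the cumulative sum of quantity and use it to select the elements that correspond to the lower and upper limits, as follows: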
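with median_dt as (
    -- 1-based positions of the middle element(s) of the expanded sample
    select case when sum(quantity) % 2 = 0
                then sum(quantity) / 2
                else sum(quantity) / 2 + 1
           end as lower_limit,
           case when sum(quantity) % 2 = 0
                then (sum(quantity) / 2) + 1
                else sum(quantity) / 2 + 1
           end as upper_limit
    from t
),
data as (
    -- running total of quantity in price order
    select *, sum(quantity) over (order by price asc) as cum_sum
    from t
),
rnk_val as (
    -- first price whose running total reaches the lower limit
    select *
    from (
        select price, row_number() over (order by d.cum_sum asc) as rnk
        from data d
        join median_dt b
          on b.lower_limit <= d.cum_sum
    ) x
    where x.rnk = 1
    union all
    -- first price whose running total reaches the upper limit
    select *
    from (
        select price, row_number() over (order by d.cum_sum asc) as rnk
        from data d
        join median_dt b
          on b.upper_limit <= d.cum_sum
    ) x
    where x.rnk = 1
)
-- average of the two middle prices (the same price twice when the count is odd)
select avg(price) as median
from rnk_val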
+--------+
| median |
+--------+
| 200 |
+--------+
dbfiddle link: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=c5cfa645a22aa9c135032eb28f1749f6
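To sanity-check the lower/upper-limit logic outside the database, the same steps can be sketched in plain Python and compared against the median of the fully expanded sample. The function name `weighted_median` is hypothetical, and the sketch assumes positive integer quantities; it mirrors the query above rather than replacing it.

```python
from bisect import bisect_left
from itertools import accumulate
from statistics import median

def weighted_median(rows):
    """rows: list of (price, quantity) pairs with positive integer quantities."""
    rows = sorted(rows)
    prices = [p for p, _ in rows]
    cum = list(accumulate(q for _, q in rows))  # running sum of quantity
    n = cum[-1]                                 # total number of units
    # 1-based positions of the middle unit(s), as in the median_dt CTE
    lower = n // 2 if n % 2 == 0 else n // 2 + 1
    upper = n // 2 + 1
    # first price group whose running quantity reaches each position
    lo = prices[bisect_left(cum, lower)]
    hi = prices[bisect_left(cum, upper)]
    return (lo + hi) / 2

# The sample from the question, without the car names.
sample = [(100, 2), (150, 4), (200, 8)]
# Brute-force expansion: only feasible for small inputs.
expanded = [p for p, q in sample for _ in range(q)]
print(weighted_median(sample), median(expanded))  # prints: 200.0 200.0
```

The expansion is only practical for small test inputs; the cumulative version never materializes the expanded rows, which is the point of the approach for a table with 100 million rows.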