How to find the busiest geos by location - cumulative sum

Question

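I am trying to find the busiest consumer geos for each location, but then return only enough geos per location to account for at least 90% of that location's consumers. The database is Postgres.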

DB Fiddle with the data: https://www.db-fiddle.com/f/uUgChHGoF33khmXZPRxTkR/2

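The data contains 7 locations, their busiest geos, and the share of each location's total that each loc-geo pair represents.
Sample of the data (for example, geo 609 represents 75.7% of location A's business):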

    Location    Geo     loc_geo_pct_total
    A           609     0.757
    A           479     0.193
    A           463     0.006
    A           606     0.003
    ...
    D           609     0.903
    D           604     0.060
    ...and so on

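I thought I would start by getting the cumulative sum per location, ordered by geo percentage descending, so that the output looks like this: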

    Location    Geo     loc_geo_pct_total   cumul_loc_geo
    A           609     0.757               0.757
    A           479     0.193               0.950
    A           463     0.006               0.956
    A           606     0.003               0.959
    ...
    D           609     0.903               0.903
    D           604     0.060               0.963
    ...and so on

I have tried various queries, including the one below, but it is wrong because it keeps accumulating across all rows regardless of location.

    select location,geo,sum(pctoftotal) over (order by location,geo desc rows between unbounded preceding and current row) as loc_geo_cumul_pct
    from tdata
    order by 1,3 desc;

How can I modify this query so that it returns a result shaped like the one above?

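Once I have that figured out, I can move on to the second part, where I only want to show enough geos per location to reach >= 90%. My data would then end up showing 2 geos per location, except that location D needs only one geo, because geo 609 alone exceeds 0.9.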

Any help with the first part would be greatly appreciated; I can then work on the second part.

Answer

You need to use partition by, so that the running total restarts for each location:

    select location, geo,
           sum(pctoftotal) over (partition by location order by geo desc rows between unbounded preceding and current row) as loc_geo_cumul_pct
    from tdata

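Edit 1: order by the percentage instead of by geo, so that the busiest geos are accumulated first:

    select location, geo, pctoftotal,
           sum(pctoftotal) over (partition by location order by pctoftotal desc rows between unbounded preceding and current row) as loc_geo_cumul_pct
    from tdata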
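
As a quick check, the same window can be run against an inline copy of the sample rows from the question instead of the real table. This is only a sketch: the CTE name sample_data is made up for illustration, and it assumes the share column is called pctoftotal, as in the question's own query. For these six rows it should reproduce the cumul_loc_geo column the question asks for (0.757, 0.950, 0.956, 0.959 for A and 0.903, 0.963 for D).

```sql
-- Hypothetical inline copy of the sample rows from the question; not the real tdata table.
with sample_data (location, geo, pctoftotal) as (
    values
        ('A', 609, 0.757),
        ('A', 479, 0.193),
        ('A', 463, 0.006),
        ('A', 606, 0.003),
        ('D', 609, 0.903),
        ('D', 604, 0.060)
)
select location,
       geo,
       pctoftotal,
       -- running total restarts for every location, largest share first
       sum(pctoftotal) over (
           partition by location
           order by pctoftotal desc
           rows between unbounded preceding and current row
       ) as loc_geo_cumul_pct
from sample_data
order by location, pctoftotal desc;
```

The explicit rows frame matters when two geos in one location have the same share. Without it, Postgres uses a range frame by default, which adds all tied rows at once, so tied rows would show the same cumulative value.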

Edit 2: to keep only enough geos per location to reach 90%:

    -- keep only the rows up to and including the first one that reaches 90%
    select *
    from (
        -- for each location, find the first cumulative value that is >= 0.9
        select *,
               min(case when loc_geo_cumul_pct >= 0.9 then loc_geo_cumul_pct end) over (partition by location) as start_loc_geo
        from (
            select location, geo, pctoftotal,
                   sum(pctoftotal) over (partition by location order by pctoftotal desc rows between unbounded preceding and current row) as loc_geo_cumul_pct
            from tdata
        ) x
    ) y
    where loc_geo_cumul_pct <= start_loc_geo
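
A possible alternative to the filter above, offered as a sketch rather than as the only way to do it: keep a row while the total accumulated before it is still under 90%. That avoids the extra min() window, and a location that never reaches 0.9 keeps all of its rows instead of dropping out. The CTE names sample_data and running are made up for illustration, the rows are the sample rows from the question, and the share column is assumed to be pctoftotal, as in the question's query.

```sql
-- Hypothetical inline copy of the sample rows from the question; not the real tdata table.
with sample_data (location, geo, pctoftotal) as (
    values
        ('A', 609, 0.757),
        ('A', 479, 0.193),
        ('A', 463, 0.006),
        ('A', 606, 0.003),
        ('D', 609, 0.903),
        ('D', 604, 0.060)
),
running as (
    select location,
           geo,
           pctoftotal,
           sum(pctoftotal) over (
               partition by location
               order by pctoftotal desc
               rows between unbounded preceding and current row
           ) as loc_geo_cumul_pct
    from sample_data
)
select location, geo, pctoftotal, loc_geo_cumul_pct
from running
-- cumulative total minus this row's share = total accumulated before this row
where loc_geo_cumul_pct - pctoftotal < 0.9
order by location, loc_geo_cumul_pct;
```

On the sample rows this should return geos 609 and 479 for location A and only geo 609 for location D, which matches the result described in the question. If the real column is a floating-point type rather than numeric, values that land very close to 0.9 may need rounding before the comparison.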
