Problem description
I have a dataset with 3 columns:
Item_id | Sourced_from | Cost |
---|---|---|
1 | local | 15 |
2 | local | 10 |
3 | local | 20 |
4 | international | 60 |
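I'm trying to write a query in PostgreSQL that returns how many local items and how many international items a customer can buy within a cash limit. For a cash limit of 50, this is the output I expect:
local | international |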
---|---|
3 | 0 |
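The expected numbers follow directly from the sample data: the three local items together stay within the limit of 50, while the only international item exceeds it on its own.

```latex
\begin{aligned}
\text{local:}\quad & 10 + 15 + 20 = 45 \le 50 \;\Rightarrow\; 3 \text{ items} \\
\text{international:}\quad & 60 > 50 \;\Rightarrow\; 0 \text{ items}
\end{aligned}
```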
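I have a fairly basic knowledge of PostgreSQL. After some googling it seems this could be solved with a recursive CTE, but I can't figure out how to choose my seed/anchor in this case.
Any ideas on how I should approach this?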
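Solution
Not with a recursive CTE, but it still works. Buying the cheapest items first maximizes the number of items that fit in the budget, so it is enough to sort the items of each kind by cost, compute a running total, and count the rows whose running total does not exceed the limit: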
DDL/DML:
create table T
(
    id integer primary key generated by default as identity,
    kind text not null,
    cost integer not null
);
insert into T(kind, cost)
values ('local', 15), ('local', 10), ('local', 20), ('international', 60);
-- 4. This outer CTE and the self-join that follows are only needed to display the kinds whose count() is 0
with sub as
(
    -- 3. find the total cost of buying this row plus all previous (cheaper) rows of the same kind
    select X.kind, sum(X.cost) as cost, X.rn
    from (
        with cte as (
            -- 1. assign an increasing row number to each row of the table, ordered by cost
            select *, row_number() over (order by T.cost asc, T.kind) as rn
            from T
        )
        -- 2. self-join the CTE on rows of the same kind, joining each row only with the rows whose row number is less than or equal to its own
        select A.id, A.kind, A.cost, B.rn
        from cte as A
        join cte as B on A.kind = B.kind and A.rn <= B.rn
    ) as X
    group by X.kind, X.rn
)
select M.kind, count(N.*)
from sub as M
-- 5. count only the goods whose running total fits in our budget (i.e. 50)
left outer join sub as N on M.rn = N.rn and N.cost <= 50
group by M.kind
;
Output (db-fiddle):
+-------------+-----+
|kind |count|
+-------------+-----+
|local |3 |
|international|0 |
+-------------+-----+
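Steps 1 to 3 above build a per-kind running total with a self-join. As an alternative sketch of the same greedy idea (not the answer's query), PostgreSQL's window functions can compute that running total directly with `sum(...) over (partition by ... order by ...)`, and the aggregate `FILTER` clause keeps kinds with nothing affordable at a count of 0. The inline `VALUES` list is an assumed stand-in for table `T`, so the snippet runs on its own; `running` and `item_count` are illustrative names.

```sql
-- Inline stand-in for table T (same rows as the INSERT above)
with t (id, kind, cost) as (
    values (1, 'local', 15), (2, 'local', 10), (3, 'local', 20), (4, 'international', 60)
),
running as (
    -- running total per kind, cheapest item first; id breaks ties between equal costs
    select kind,
           sum(cost) over (partition by kind order by cost, id) as running_cost
    from t
)
-- the cheapest items whose running total stays within 50 are the ones that can be bought;
-- a kind with no affordable item still gets a row, with a count of 0
select kind, count(*) filter (where running_cost <= 50) as item_count
from running
group by kind
order by kind desc;
```

On the sample data the running totals are 10, 25 and 45 for local and 60 for international, so this returns local 3 and international 0, the same as the self-join version.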
I put together a recursive CTE example to solve this.
Recreate your case with:
create table kp (item_id int, sourced_from varchar, cost int);
insert into kp values (1, 'local', 15);
insert into kp values (2, 'local', 10);
insert into kp values (3, 'local', 20);
insert into kp values (4, 'international', 60);
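The following query:
- in its non-recursive (anchor) part, selects only the kp items whose cost is less than 50
- puts each item's item_id into list_of_items
What the recursive part does:
- joins with kp, checking that sourced_from is the same and that kp.item_id is not already included in list_of_items (to avoid placing the same item more than once)
- computes the total cost (total_cost)
- appends the new item's item_id to list_of_items
- keeps only combinations whose total stays below 50, which is also what makes the recursion stop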
WITH RECURSIVE items (item_id, next_item_id, sourced_from, total_cost, nr_items, list_of_items) AS (
    -- anchor: every single item that costs less than 50 on its own
    SELECT
        item_id, item_id as next_item_id, sourced_from, cost as total_cost, 1 as nr_items, ARRAY[item_id] list_of_items
    from kp where cost < 50
    UNION ALL
    -- recursive step: add one more item from the same source that is not already in the list
    SELECT
        kp.item_id, items.item_id as next_item_id, items.sourced_from, items.total_cost + kp.cost total_cost, items.nr_items + 1 as nr_items, items.list_of_items || kp.item_id as list_of_items
    FROM kp join items
        on items.sourced_from = kp.sourced_from
        and items.list_of_items::int[] @> ARRAY[kp.item_id] = false
    WHERE kp.cost + items.total_cost < 50
)
SELECT * FROM items;
If you run it against the dataset above, you get the detailed result:
item_id | next_item_id | sourced_from | total_cost | nr_items | list_of_items
---------+--------------+--------------+------------+----------+---------------
1 | 1 | local | 15 | 1 | {1}
2 | 2 | local | 10 | 1 | {2}
3 | 3 | local | 20 | 1 | {3}
1 | 2 | local | 25 | 2 | {2,1}
1 | 3 | local | 35 | 2 | {3,1}
2 | 1 | local | 25 | 2 | {1,2}
2 | 3 | local | 30 | 2 | {3,2}
3 | 1 | local | 35 | 2 | {1,3}
3 | 2 | local | 30 | 2 | {2,3}
1 | 2 | local | 45 | 3 | {3,2,1}
1 | 3 | local | 45 | 3 | {2,3,1}
2 | 1 | local | 45 | 3 | {3,1,2}
2 | 3 | local | 45 | 3 | {1,3,2}
3 | 1 | local | 45 | 3 | {2,1,3}
3 | 2 | local | 45 | 3 | {1,2,3}
(15 rows)
It lists every permutation of the 3 local items, of length 1, 2 and 3.
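The row count can be checked by hand: the three local items cost 45 together, so every ordered selection of 1, 2 or 3 of them stays under 50, and the query returns the number of k-permutations of 3 items summed over k.

```latex
\sum_{k=1}^{3} \frac{3!}{(3-k)!} = 3 + 6 + 6 = 15
```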
Now, if you replace the last SELECT with
SELECT * FROM items order by nr_items desc, total_cost desc, list_of_items asc limit 1;
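you can also pick the combination with the largest number of items and the cost closest to the budget (I also added an ascending order on list_of_items so that you always get the same result when several combinations tie). For the case above this gives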
item_id | next_item_id | sourced_from | total_cost | nr_items | list_of_items
---------+--------------+--------------+------------+----------+---------------
3 | 2 | local | 45 | 3 | {1,2,3}
(1 row)
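`LIMIT 1` returns a single row across all sources. If you want the best combination for each sourced_from separately, PostgreSQL's `DISTINCT ON` keeps the first row of each group under the same ordering. This sketch uses a trimmed-down copy of the `items` CTE above (without next_item_id) and an inline `VALUES` list as an assumed stand-in for the kp table.

```sql
WITH RECURSIVE kp (item_id, sourced_from, cost) AS (
    -- inline stand-in for the kp table (same rows as the INSERTs above)
    VALUES (1, 'local', 15), (2, 'local', 10), (3, 'local', 20), (4, 'international', 60)
), items (item_id, sourced_from, total_cost, nr_items, list_of_items) AS (
    SELECT item_id, sourced_from, cost, 1, ARRAY[item_id]
    FROM kp WHERE cost < 50
    UNION ALL
    SELECT kp.item_id, items.sourced_from, items.total_cost + kp.cost,
           items.nr_items + 1, items.list_of_items || kp.item_id
    FROM kp JOIN items
      ON items.sourced_from = kp.sourced_from
     AND NOT (items.list_of_items @> ARRAY[kp.item_id])
    WHERE kp.cost + items.total_cost < 50
)
-- DISTINCT ON keeps the first row of each sourced_from group under the ORDER BY below:
-- most items, then highest total, then the smallest list as a tie-breaker
SELECT DISTINCT ON (sourced_from) sourced_from, nr_items, total_cost, list_of_items
FROM items
ORDER BY sourced_from, nr_items DESC, total_cost DESC, list_of_items;
```

On the sample data this returns one row, local with 3 items, a total of 45 and {1,2,3}; international has no row because item 4 is filtered out in the anchor.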
If you are only interested in the maximum number of items per sourced_from, the last SELECT becomes
select sourced_from, max(nr_items) nr_items from items group by sourced_from;
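The expected result is the following (international does not appear, because item 4 costs 60 and therefore never passes the anchor's cost < 50 filter):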
sourced_from | nr_items
--------------+----------
local | 3
(1 row)
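To get exactly the shape asked for in the question (one row, one column per source, 0 when nothing fits), one option is to aggregate with `FILTER` and turn a missing source into 0 with `coalesce`. This is a sketch under a few assumptions: it uses a trimmed-down `items` CTE with the `items.item_id < kp.item_id` condition from the edit below (which makes the list_of_items check unnecessary), an inline `VALUES` list stands in for kp, and the two source values are assumed to be exactly 'local' and 'international'.

```sql
WITH RECURSIVE kp (item_id, sourced_from, cost) AS (
    -- inline stand-in for the kp table
    VALUES (1, 'local', 15), (2, 'local', 10), (3, 'local', 20), (4, 'international', 60)
), items (item_id, sourced_from, total_cost, nr_items) AS (
    SELECT item_id, sourced_from, cost, 1
    FROM kp WHERE cost < 50
    UNION ALL
    -- only add items with a larger item_id, so each set of items is built once
    SELECT kp.item_id, items.sourced_from, items.total_cost + kp.cost, items.nr_items + 1
    FROM kp JOIN items
      ON items.sourced_from = kp.sourced_from
     AND items.item_id < kp.item_id
    WHERE kp.cost + items.total_cost < 50
)
-- pivot: one column per source; coalesce turns "no row for this source" into 0
SELECT coalesce(max(nr_items) FILTER (WHERE sourced_from = 'local'), 0) AS local,
       coalesce(max(nr_items) FILTER (WHERE sourced_from = 'international'), 0) AS international
FROM items;
```

On the sample data this yields local 3 and international 0, matching the expected output in the question.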
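Edit: to speed up the query and avoid multiple permutations of the same set of items (e.g. {1,3} and {3,1}), we can force the next item_id to be greater than the current one. The full query: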
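WITH RECURSIVE items (item_id, next_item_id, sourced_from, total_cost, nr_items, list_of_items) AS (
    SELECT
        item_id, item_id as next_item_id, sourced_from, cost as total_cost, 1 as nr_items, ARRAY[item_id] list_of_items
    from kp where cost < 50
    UNION ALL
    SELECT
        kp.item_id, items.item_id as next_item_id, items.sourced_from, items.total_cost + kp.cost total_cost, items.nr_items + 1 as nr_items, items.list_of_items || kp.item_id as list_of_items
    FROM kp join items
        on items.sourced_from = kp.sourced_from
        and items.list_of_items::int[] @> ARRAY[kp.item_id] = false
        and items.item_id < kp.item_id
    WHERE kp.cost + items.total_cost < 50
)
select * from items;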
Result:
item_id | next_item_id | sourced_from | total_cost | nr_items | list_of_items
---------+--------------+--------------+------------+----------+---------------
1 | 1 | local | 15 | 1 | {1}
2 | 2 | local | 10 | 1 | {2}
3 | 3 | local | 20 | 1 | {3}
2 | 1 | local | 25 | 2 | {1,2}
3 | 1 | local | 35 | 2 | {1,3}
3 | 2 | local | 30 | 2 | {2,3}
3 | 2 | local | 45 | 3 | {1,2,3}
(7 rows)
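With the ordering constraint each set of items is produced only once, in increasing item_id order, so the row count drops from the number of permutations to the number of combinations of the 3 local items.

```latex
\sum_{k=1}^{3} \binom{3}{k} = 3 + 3 + 1 = 7
```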