Solving a knapsack problem with a recursive CTE in PostgreSQL

Problem description

I have a dataset with 3 columns:

Item_id	Sourced_from	Cost
1	local	15
2	local	10
3	local	20
4	international	60

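I am trying to write a query in PostgreSQL that returns the total number of local and international items a customer can buy within a cash limit. For a cash limit of 50, this is the output I expect:

Local	International
3	0

I only have a fairly basic knowledge of PostgreSQL. After some googling it seems this could be solved with a recursive CTE, but I can't figure out how I should choose my seed/anchor in this case.

Any ideas on how I should approach this?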

Solutions

Not with a recursive CTE, but it still works:

DDL/DML:

create table T
(
    id   integer primary key generated by default AS IDENTITY,
    kind text    not null,
    cost integer not null
);

insert into T(kind,cost)
values ('local',15),('local',10),('local',20),('international',60);

-- 4. This outer CTE and the following self-join is only necessary in order to display the rows that have a count() of 0
with sub as
         (
             -- 3. find the total cost of buying this row + all previous rows,grouped by its kind
             select X.kind,sum(X.cost) as cost,X.rn
             from (
                      with cte as (
                          -- 1. assign an increasing row number on each row from the table ordered by its cost
                          select *,row_number() over (order by T.cost asc,T.kind) as rn
                          from T
                      )
                      -- 2. self-join the CTE on each row with the same kind,but join it only with the rows that have a row number less than or equal to the current row number 
                      select A.id,A.kind,A.cost,B.rn
                      from cte as A
                               join cte as B on A.kind = B.kind and A.rn <= B.rn
                  ) as X
             group by X.kind,X.rn
         )

select M.kind,count(N.*)
from sub as M -- 5. count only the goods that fit in our budget (i.e. 50)
         left outer join sub as N on M.rn = N.rn and N.cost <= 50
group by M.kind
;

Output (db-fiddle):

+-------------+-----+
|kind         |count|
+-------------+-----+
|local        |3    |
|international|0    |
+-------------+-----+
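
The same greedy idea (buy the cheapest items of each kind first) could also be written with a window function instead of the self-join. This is only a minimal sketch, assuming the table T from the DDL above and a budget of 50; the names kinds, running, running_cost and item_count are made up for this example:

```sql
-- Sketch only: assumes table T(id, kind, cost) from the DDL above and a budget of 50.
select kinds.kind,
       count(running.id) as item_count       -- count(column) ignores the NULLs produced by the left join
from (select distinct kind from T) as kinds  -- every kind, so kinds with no affordable item still appear
         left outer join (
             -- running total per kind, cheapest items first; id breaks ties between equal costs
             select id,
                    kind,
                    sum(cost) over (partition by kind order by cost, id) as running_cost
             from T
         ) as running on running.kind = kinds.kind and running.running_cost <= 50
group by kinds.kind;
```

With the sample data this should give 3 for local (running totals 10, 25, 45) and 0 for international. Taking the cheapest items first maximises the item count per kind, which is all the question asks for; it does not enumerate alternative combinations the way a recursive CTE does.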

I put together an example that solves this with a recursive CTE.

Recreating your case with:

create table kp (item_id int,sourced_from varchar,cost int);
insert into kp values (1,'local',15);
insert into kp values (2,'local',10);
insert into kp values (3,'local',20);
insert into kp values (4,'international',60);

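The following query works like this:

The anchor (non-recursive) part selects from kp only the items with a cost below 50 and starts list_of_items with that item's item_id.
The recursive part:
joins kp to the rows found so far, checking that sourced_from is the same and that kp.item_id is not already in list_of_items (so the same item is never added more than once);
computes the running total (total_cost);
appends the new item_id to list_of_items.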

WITH RECURSIVE items (item_id,next_item_id,sourced_from,total_cost,nr_items,list_of_items) AS (
    SELECT 
        item_id,item_id as next_item_id,sourced_from,cost as total_cost,1 as nr_items,ARRAY[item_id] list_of_items
  from kp where cost < 50
  UNION ALL
    SELECT 
        kp.item_id,items.item_id  as next_item_id,items.sourced_from,items.total_cost + kp.cost total_cost,items.nr_items + 1 as nr_items,items.list_of_items || kp.item_id as  list_of_items
    FROM kp join items 
        on items.sourced_from=kp.sourced_from
        and items.list_of_items::int[] @> ARRAY[kp.item_id] = false
    WHERE kp.cost + items.total_cost < 50
)
SELECT * FROM items;

If you run it against the dataset above, you get the detailed result:

 item_id | next_item_id | sourced_from | total_cost | nr_items | list_of_items 
---------+--------------+--------------+------------+----------+---------------
       1 |            1 | local        |         15 |        1 | {1}
       2 |            2 | local        |         10 |        1 | {2}
       3 |            3 | local        |         20 |        1 | {3}
       1 |            2 | local        |         25 |        2 | {2,1}
       1 |            3 | local        |         35 |        2 | {3,1}
       2 |            1 | local        |         25 |        2 | {1,2}
       2 |            3 | local        |         30 |        2 | {3,2}
       3 |            1 | local        |         35 |        2 | {1,3}
       3 |            2 | local        |         30 |        2 | {2,3}
       1 |            2 | local        |         45 |        3 | {3,2,1}
       1 |            3 | local        |         45 |        3 | {2,3,1}
       2 |            1 | local        |         45 |        3 | {3,1,2}
       2 |            3 | local        |         45 |        3 | {1,3,2}
       3 |            1 | local        |         45 |        3 | {2,1,3}
       3 |            2 | local        |         45 |        3 | {1,2,3}
(15 rows)

This lists every ordering (permutation) of the 3 local items that stays within the budget. Now, if you replace the final SELECT with

SELECT * FROM items order by nr_items desc,total_cost desc,list_of_items asc limit 1;

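you can also pick the combination with the largest number of items and a cost closest to the budget (I also added an ascending sort on list_of_items so that the result is always the same when several combinations tie), which in the case above gives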

 item_id | next_item_id | sourced_from | total_cost | nr_items | list_of_items 
---------+--------------+--------------+------------+----------+---------------
       3 |            2 | local        |         45 |        3 | {1,2,3}
(1 row)

If you are only interested in the maximum number of items per sourced_from, the final SELECT becomes

select sourced_from,max(nr_items) nr_items from items group by sourced_from;

which gives the expected result

 sourced_from | nr_items 
--------------+----------
 local        |        3
(1 row)
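
Note that international does not appear in this result because no international item fits the budget, whereas the question expects a 0 for it. One way to get that row is to left join the distinct sourced_from values to the recursive result. The following is a sketch, assuming the kp table above. It keeps only the columns needed for the count, combines each item only with items that have a larger item_id so that every set is built once, and treats the limit as inclusive (<= 50), which is an assumption (the query above uses < 50). The alias names s and i are made up for this example:

```sql
-- Sketch only: assumes table kp(item_id, sourced_from, cost) as created above.
WITH RECURSIVE items (item_id, sourced_from, total_cost, nr_items) AS (
    -- anchor: every single item that fits the budget on its own
    SELECT item_id, sourced_from, cost, 1
    FROM kp
    WHERE cost <= 50                      -- assumption: a total of exactly 50 is allowed
  UNION ALL
    -- recursive step: add one more item from the same source with a larger item_id
    SELECT kp.item_id, items.sourced_from, items.total_cost + kp.cost, items.nr_items + 1
    FROM kp
    JOIN items
        ON items.sourced_from = kp.sourced_from
        AND items.item_id < kp.item_id
    WHERE items.total_cost + kp.cost <= 50
)
SELECT s.sourced_from,
       coalesce(max(i.nr_items), 0) AS nr_items   -- 0 when nothing from this source is affordable
FROM (SELECT DISTINCT sourced_from FROM kp) AS s
LEFT JOIN items AS i ON i.sourced_from = s.sourced_from
GROUP BY s.sourced_from
ORDER BY s.sourced_from;
```

For the sample data this should return international with 0 and local with 3.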

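Edit: to speed up the query and avoid several permutations of the same set of items (e.g. {1,2,3} and {1,3,2}), we can force the next item_id to be greater than the current one. The full query: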

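WITH RECURSIVE items (item_id,next_item_id,sourced_from,total_cost,nr_items,list_of_items) AS (
    SELECT 
        item_id,item_id as next_item_id,sourced_from,cost as total_cost,1 as nr_items,ARRAY[item_id] list_of_items
  from kp where cost < 50
  UNION ALL
    SELECT 
        kp.item_id,items.item_id  as next_item_id,items.sourced_from,items.total_cost + kp.cost total_cost,items.nr_items + 1 as nr_items,items.list_of_items || kp.item_id as  list_of_items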
    FROM kp join items 
        on items.sourced_from=kp.sourced_from
        and items.list_of_items::int[] @> ARRAY[kp.item_id] = false
        and items.item_id < kp.item_id
    WHERE kp.cost + items.total_cost < 50
)
select * from items;

Result

 item_id | next_item_id | sourced_from | total_cost | nr_items | list_of_items 
---------+--------------+--------------+------------+----------+---------------
       1 |            1 | local        |         15 |        1 | {1}
       2 |            2 | local        |         10 |        1 | {2}
       3 |            3 | local        |         20 |        1 | {3}
       2 |            1 | local        |         25 |        2 | {1,2}
       3 |            1 | local        |         35 |        2 | {1,3}
       3 |            2 | local        |         30 |        2 | {2,3}
       3 |            2 | local        |         45 |        3 | {1,2,3}
(7 rows)
