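Solving a knapsack problem with a recursive CTE in PostgreSQL

Problem description

I have a dataset with 3 columns:

Item_id  Sourced_from   Cost
1        Local          15
2        Local          10
3        Local          20
4        International  60

I am trying to write a PostgreSQL query that returns the total number of local and international items a customer can buy within a cash limit. For a cash limit of 50, this is the output I expect:

Local  International
3      0

I have only a fairly basic understanding of PostgreSQL. After some googling it seems this could be solved with a recursive CTE, but I cannot figure out how to choose the seed/anchor in this case.

Any ideas on how I should approach this?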

Solution

This does not use a recursive CTE, but it works nonetheless:

DDL/DML:

create table T
(
    id   integer primary key generated by default as identity,
    kind text    not null,
    cost integer not null
);

insert into T(kind,cost)
values ('local',15),('local',10),('local',20),('international',60);

-- 4. This outer CTE and the following self-join is only necessary in order to display the rows that have a count() of 0
with sub as
         (
             -- 3. find the total cost of buying this row + all previous rows,grouped by its kind
             select X.kind,sum(X.cost) as cost,X.rn
             from (
                      with cte as (
                          -- 1. assign an increasing row number on each row from the table ordered by its cost
                          select *,row_number() over (order by T.cost asc,T.kind) as rn
                          from T
                      )
                      -- 2. self-join the CTE on each row with the same kind,but join it only with the rows that have a row number less than or equal to the current row number 
                      select A.id,A.kind,A.cost,B.rn
                      from cte as A
                               join cte as B on A.kind = B.kind and A.rn <= B.rn
                  ) as X
             group by X.kind,X.rn
         )

select M.kind,count(N.*)
from sub as M
         -- 5. count only the goods that fit within our budget (i.e. 50)
         left outer join sub as N on M.rn = N.rn and N.cost <= 50
group by M.kind
;

Output (db-fiddle):

+-------------+-----+
|kind         |count|
+-------------+-----+
|local        |3    |
|international|0    |
+-------------+-----+
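
The same greedy idea (buy the cheapest items of each kind first) can also be written with a window function instead of the self-join. The following is a minimal sketch, assuming the table T defined above and a budget of 50; the names running and running_cost are illustrative and not part of the original answer.

```sql
-- Sketch only: assumes table T(id, kind, cost) from the DDL above and a budget of 50.
with running as (
    -- running total of cost within each kind, cheapest item first;
    -- id breaks ties so that every row gets its own running total
    select kind,
           sum(cost) over (partition by kind order by cost, id) as running_cost
    from T
)
select k.kind, count(r.kind) as count
from (select distinct kind from T) as k
         -- the left join keeps kinds for which nothing fits, so they show a count of 0
         left join running as r
                   on r.kind = k.kind and r.running_cost <= 50
group by k.kind;
```

For the sample data the running totals are 10, 25, 45 for local and 60 for international, so the counts should match the output shown above.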

I put together a recursive CTE example that solves the problem.

To recreate your case:
create table kp (item_id int,sourced_from varchar,cost int);
insert into kp values (1,'local',15);
insert into kp values (2,'local',10);
insert into kp values (3,'local',20);
insert into kp values (4,'international',60);

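The following query:

  • selects from kp only the items whose cost is less than 50
  • adds each item_id to list_of_items

The recursive part:

  • joins with kp, checking that sourced_from is the same and that kp.item_id is not already in list_of_items (to avoid adding the same item more than once)
  • computes the total cost (total_cost)
  • appends the new item_id to list_of_items
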
WITH RECURSIVE items (item_id,next_item_id,sourced_from,total_cost,nr_items,list_of_items) AS (
    SELECT 
        item_id,item_id as next_item_id,sourced_from,cost as total_cost,1 as nr_items,ARRAY[item_id] list_of_items
  from kp where cost < 50
  UNION ALL
    SELECT 
        kp.item_id,items.item_id  as next_item_id,items.sourced_from,items.total_cost + kp.cost total_cost,items.nr_items + 1 as nr_items,items.list_of_items || kp.item_id as  list_of_items
    FROM kp join items 
        on items.sourced_from=kp.sourced_from
        and items.list_of_items::int[] @> ARRAY[kp.item_id] = false
    WHERE kp.cost + items.total_cost < 50
)
SELECT * FROM items;

Running it against the dataset above gives the detailed result:

 item_id | next_item_id | sourced_from | total_cost | nr_items | list_of_items 
---------+--------------+--------------+------------+----------+---------------
       1 |            1 | local        |         15 |        1 | {1}
       2 |            2 | local        |         10 |        1 | {2}
       3 |            3 | local        |         20 |        1 | {3}
       1 |            2 | local        |         25 |        2 | {2,1}
       1 |            3 | local        |         35 |        2 | {3,1}
       2 |            1 | local        |         25 |        2 | {1,2}
       2 |            3 | local        |         30 |        2 | {3,2}
       3 |            1 | local        |         35 |        2 | {1,3}
       3 |            2 | local        |         30 |        2 | {2,3}
       1 |            2 | local        |         45 |        3 | {3,2,1}
       1 |            3 | local        |         45 |        3 | {2,3,1}
       2 |            1 | local        |         45 |        3 | {3,1,2}
       2 |            3 | local        |         45 |        3 | {1,3,2}
       3 |            1 | local        |         45 |        3 | {2,1,3}
       3 |            2 | local        |         45 |        3 | {1,2,3}
(15 rows)

This lists every ordering of every affordable combination of the 3 local items. Now, if you replace the final SELECT with

SELECT * FROM items order by nr_items desc,total_cost desc,list_of_items asc limit 1;

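you get the combination with the most items whose cost is closest to the budget (I also order by list_of_items ascending so that the result is always the same when several combinations tie). For the data above this gives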

 item_id | next_item_id | sourced_from | total_cost | nr_items | list_of_items 
---------+--------------+--------------+------------+----------+---------------
       3 |            2 | local        |         45 |        3 | {1,2,3}
(1 row)

If you are only interested in the maximum number of items per sourced_from, the final SELECT becomes

select sourced_from,max(nr_items) nr_items from items group by sourced_from;

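The result is as expected (international is absent because its only item costs 60 and never enters the CTE):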

 sourced_from | nr_items 
--------------+----------
 local        |        3
(1 row)

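Edit: to speed up the query and avoid generating multiple permutations of the same set of items (e.g. {1,2,3} and {1,3,2}), we can require each next item_id to be greater than the current one. The full query: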

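WITH RECURSIVE items (item_id,next_item_id,sourced_from,total_cost,nr_items,list_of_items) AS (
    SELECT 
        item_id,item_id as next_item_id,sourced_from,cost as total_cost,1 as nr_items,ARRAY[item_id] list_of_items
  from kp where cost < 50
  UNION ALL
    SELECT 
        kp.item_id,items.item_id  as next_item_id,items.sourced_from,items.total_cost + kp.cost total_cost,items.nr_items + 1 as nr_items,items.list_of_items || kp.item_id as  list_of_items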
    FROM kp join items 
        on items.sourced_from=kp.sourced_from
        and items.list_of_items::int[] @> ARRAY[kp.item_id] = false
        and items.item_id < kp.item_id
    WHERE kp.cost + items.total_cost < 50
)
select * from items;

Result

 item_id | next_item_id | sourced_from | total_cost | nr_items | list_of_items 
---------+--------------+--------------+------------+----------+---------------
       1 |            1 | local        |         15 |        1 | {1}
       2 |            2 | local        |         10 |        1 | {2}
       3 |            3 | local        |         20 |        1 | {3}
       2 |            1 | local        |         25 |        2 | {1,2}
       3 |            1 | local        |         35 |        2 | {1,3}
       3 |            2 | local        |         30 |        2 | {2,3}
       3 |            2 | local        |         45 |        3 | {1,2,3}
(7 rows)
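
To get the one-row shape asked for in the question (one column per source, with 0 when nothing fits), the final SELECT can aggregate with FILTER. This is a minimal sketch, assuming the kp table created above and the same strict `< 50` limit used in the answer; the column names local_items and international_items are illustrative, and the CTE is trimmed to the columns the aggregation needs.

```sql
-- Sketch only: assumes table kp(item_id, sourced_from, cost) from above.
WITH RECURSIVE items (item_id, sourced_from, total_cost, nr_items) AS (
    -- anchor: every single item that fits on its own
    SELECT item_id, sourced_from, cost, 1
    FROM kp
    WHERE cost < 50
  UNION ALL
    -- recursive step: extend a combination with a higher item_id from the same source
    SELECT kp.item_id, items.sourced_from, items.total_cost + kp.cost, items.nr_items + 1
    FROM kp
    JOIN items
      ON items.sourced_from = kp.sourced_from
     AND items.item_id < kp.item_id
    WHERE items.total_cost + kp.cost < 50
)
SELECT
    -- max() over no matching rows is NULL, so coalesce turns it into 0
    coalesce(max(nr_items) FILTER (WHERE sourced_from = 'local'), 0)         AS local_items,
    coalesce(max(nr_items) FILTER (WHERE sourced_from = 'international'), 0) AS international_items
FROM items;
```

Because items.item_id < kp.item_id already prevents an item from being added twice, this sketch leaves out the list_of_items array; keep it if you also want to see which items make up each combination.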
