Given a set of IDs, return only the subset of orders that contain only those IDs

Question

Given a set of product_ids, which order_ids contain only those product_ids?

For the example below, I only want the order_ids whose products are some combination of (a,b,c). I have 2 tables, as follows:

"transactions" table:

order_id | product_id |
---------+-------------
    1    |    a       |
    1    |    b       |
    2    |    a       |
    2    |    X       |
    3    |    a       |
    3    |    b       |
    3    |    c       |
    ...  |    ...     |
    999  |    Y       |

"products" table:

product_id |
------------
     a     |
     b     |
     c     |
     d     |
     X     |
     Y     |
     ...   |
     ZZZ   |

The desired output has 2 order_ids:

order_id |
----------
    1    |
    3    |

Note that order_id == 2 must be excluded: although it has product_id == a, it also has product_id == X.

So it is not as simple as:

SELECT DISTINCT order_id
FROM transactions
WHERE product_id IN ('a','b','c')
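To see why this falls short, here is a minimal sketch in Python using the standard-library sqlite3 module, with SQLite standing in for PostgreSQL purely so the example is self-contained. It loads the sample rows from the "transactions" table (the "..." rows omitted) and runs the naive IN filter with the allowed set (a, b, c), which is the set that matches the desired output of orders 1 and 3. The helper name naive_orders is hypothetical.

```python
import sqlite3

# Sample rows from the "transactions" table in the question ("..." rows omitted).
TRANSACTIONS = [(1, "a"), (1, "b"), (2, "a"), (2, "X"),
                (3, "a"), (3, "b"), (3, "c"), (999, "Y")]

def naive_orders(transactions):
    """Hypothetical helper: order_ids with at least one product in (a, b, c)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE transactions (order_id INTEGER, product_id TEXT)")
    con.executemany("INSERT INTO transactions VALUES (?, ?)", transactions)
    rows = con.execute("""
        SELECT DISTINCT order_id
        FROM   transactions
        WHERE  product_id IN ('a', 'b', 'c')
        ORDER  BY order_id
    """).fetchall()
    con.close()
    return [r[0] for r in rows]

print(naive_orders(TRANSACTIONS))  # [1, 2, 3]: order 2 is wrongly included
```

Order 2 comes back because its row with product a matches the filter; the query never looks at its other row with product X.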

Solution

Typically, there is an accompanying orders table with exactly one row per order.

If we can further assume that every order has at least one transaction, this does the job:

SELECT o.id
FROM   orders o
WHERE  NOT EXISTS (
   SELECT FROM transactions  -- SELECT list can be empty for EXISTS test
   WHERE  order_id = o.id
   AND    product_id <> ALL ('{a,b,c}')
   );
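A minimal sketch of the NOT EXISTS query above, again in Python with the standard-library sqlite3 module as a stand-in for PostgreSQL. SQLite supports neither the `<> ALL (array)` construct nor an empty SELECT list, so NOT IN (...) and SELECT 1 are swapped in; the logic is otherwise the same. Order 4 is a hypothetical order with no transactions, added to show why the "at least one transaction" assumption matters. The function name is hypothetical.

```python
import sqlite3

# Sample rows from the question; order 4 is hypothetical and has no transactions.
TRANSACTIONS = [(1, "a"), (1, "b"), (2, "a"), (2, "X"),
                (3, "a"), (3, "b"), (3, "c"), (999, "Y")]
ORDER_IDS = [1, 2, 3, 4, 999]

def orders_only_in_set(transactions, order_ids):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
    con.execute("CREATE TABLE transactions (order_id INTEGER, product_id TEXT)")
    con.executemany("INSERT INTO orders VALUES (?)", [(i,) for i in order_ids])
    con.executemany("INSERT INTO transactions VALUES (?, ?)", transactions)
    # Keep an order unless it has some transaction outside the allowed set.
    # NOT IN (...) and SELECT 1 replace PostgreSQL's <> ALL ('{...}') and empty SELECT list.
    rows = con.execute("""
        SELECT o.id
        FROM   orders o
        WHERE  NOT EXISTS (
           SELECT 1 FROM transactions
           WHERE  order_id = o.id
           AND    product_id NOT IN ('a', 'b', 'c'))
        ORDER  BY o.id
    """).fetchall()
    con.close()
    return [r[0] for r in rows]

print(orders_only_in_set(TRANSACTIONS, ORDER_IDS))
# [1, 3, 4]: order 4 passes because there is nothing for NOT EXISTS to find
```

With only the question's orders (1, 2, 3, 999) the result is [1, 3]; the extra order 4 illustrates that an order without transactions is returned as well.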

This works best for very common product_ids or long lists.

For short lists or rare products, it is faster to start with a positive selection, like:

SELECT order_id
FROM  (
   SELECT DISTINCT order_id
   FROM   transactions
   WHERE  product_id = ANY ('{a,b,c}')
   ) t
WHERE  NOT EXISTS (
   SELECT FROM transactions
   WHERE  order_id = t.order_id
   AND    product_id <> ALL ('{a,b,c}')
   );
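A sketch of the positive-selection variant, under the same assumptions: Python's sqlite3 module stands in for PostgreSQL, IN / NOT IN replace = ANY / <> ALL on an array literal, and the function name is hypothetical. The allowed set is (a, b, c). No orders table is needed here, and an order without transactions can never appear, because candidates come from the transactions table itself.

```python
import sqlite3

# Sample rows from the "transactions" table in the question ("..." rows omitted).
TRANSACTIONS = [(1, "a"), (1, "b"), (2, "a"), (2, "X"),
                (3, "a"), (3, "b"), (3, "c"), (999, "Y")]

def orders_positive_first(transactions):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE transactions (order_id INTEGER, product_id TEXT)")
    con.executemany("INSERT INTO transactions VALUES (?, ?)", transactions)
    rows = con.execute("""
        SELECT order_id
        FROM  (
           SELECT DISTINCT order_id          -- candidates: orders with an allowed product
           FROM   transactions
           WHERE  product_id IN ('a', 'b', 'c')
           ) t
        WHERE  NOT EXISTS (                  -- drop candidates with any other product
           SELECT 1 FROM transactions
           WHERE  order_id = t.order_id
           AND    product_id NOT IN ('a', 'b', 'c'))
        ORDER  BY order_id
    """).fetchall()
    con.close()
    return [r[0] for r in rows]

print(orders_positive_first(TRANSACTIONS))  # [1, 3]
```

The inner SELECT narrows the work to orders 1, 2 and 3 before the NOT EXISTS check runs, which is where the speedup for rare products comes from.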

An index on (product_id) is essential for performance. Even better: a multicolumn index on (product_id, order_id), plus another one on (order_id, product_id). See:
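A minimal sketch of the two composite indexes, created in SQLite through Python's sqlite3 module as a self-contained stand-in; the index names tx_product_order and tx_order_product are hypothetical. The index leading with product_id serves the positive selection (and can also serve lookups on product_id alone), while the one leading with order_id serves the per-order NOT EXISTS probe. The plan printout is only for inspection, since its wording differs between SQLite versions and says nothing definitive about PostgreSQL's planner.

```python
import sqlite3

def build_indexed_db():
    """Create the transactions table with two hypothetical composite indexes."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE transactions (order_id INTEGER, product_id TEXT)")
    # Leading product_id: for WHERE product_id IN (...) returning order_id.
    con.execute("CREATE INDEX tx_product_order ON transactions (product_id, order_id)")
    # Leading order_id: for WHERE order_id = ... AND product_id NOT IN (...).
    con.execute("CREATE INDEX tx_order_product ON transactions (order_id, product_id)")
    return con

con = build_indexed_db()
# PRAGMA index_list returns (seq, name, unique, origin, partial) per index.
names = sorted(row[1] for row in con.execute("PRAGMA index_list('transactions')"))
print(names)  # ['tx_order_product', 'tx_product_order']

# Inspect which index the planner picks for each part of the query.
plan = con.execute("""
    EXPLAIN QUERY PLAN
    SELECT order_id
    FROM  (SELECT DISTINCT order_id FROM transactions
           WHERE product_id IN ('a', 'b', 'c')) t
    WHERE NOT EXISTS (
        SELECT 1 FROM transactions
        WHERE order_id = t.order_id AND product_id NOT IN ('a', 'b', 'c'))
""").fetchall()
for row in plan:
    print(row[-1])  # the "detail" column of each plan step
```

In PostgreSQL the same two CREATE INDEX statements apply unchanged.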

Is a composite index also good for queries on the first field?

The manual on array literals:

https://www.postgresql.org/docs/current/arrays.html#ARRAYS-INPUT

About the ANY and ALL constructs:

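Another approach: define the opposite of your requirement and filter that out. That is, which orders have at least one transaction whose product is not in the allowed list? Count such transactions per order, discard orders where that count is greater than 0, and return only those where it is 0.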

SELECT order_id
FROM transactions
GROUP BY order_id
HAVING COUNT(CASE WHEN product_id NOT IN ('a','b','c') THEN 1 END) = 0
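A sketch of the GROUP BY / HAVING approach, run against the sample rows through Python's sqlite3 module (SQLite as a stand-in database; the function name is hypothetical) with the allowed set (a, b, c). COUNT ignores NULLs, so the CASE expression counts only the rows whose product falls outside the allowed set.

```python
import sqlite3

# Sample rows from the "transactions" table in the question ("..." rows omitted).
TRANSACTIONS = [(1, "a"), (1, "b"), (2, "a"), (2, "X"),
                (3, "a"), (3, "b"), (3, "c"), (999, "Y")]

def orders_by_having(transactions):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE transactions (order_id INTEGER, product_id TEXT)")
    con.executemany("INSERT INTO transactions VALUES (?, ?)", transactions)
    rows = con.execute("""
        SELECT order_id
        FROM   transactions
        GROUP  BY order_id
        -- CASE yields 1 for a disallowed product and NULL otherwise; COUNT skips NULLs
        HAVING COUNT(CASE WHEN product_id NOT IN ('a', 'b', 'c') THEN 1 END) = 0
        ORDER  BY order_id
    """).fetchall()
    con.close()
    return [r[0] for r in rows]

print(orders_by_having(TRANSACTIONS))  # [1, 3]
```

Per order, the disallowed counts are 0 for order 1, 1 for order 2 (X), 0 for order 3 and 1 for order 999 (Y), so only orders 1 and 3 survive the HAVING filter.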

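If the list of allowed products is stored in another table, and you want to filter against it rather than hard-code it in the query, it gets slightly more complex: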

SELECT order_id
FROM transactions AS t
LEFT JOIN listOfProducts AS l ON l.product_id = t.product_id
GROUP BY order_id
HAVING COUNT(CASE WHEN l.product_id IS NULL THEN 1 END) = 0
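A sketch of the join-based variant using Python's sqlite3 module, with SQLite as a stand-in database and a listOfProducts table filled with the allowed products a, b and c (the table contents and function name are assumptions for illustration). After the LEFT JOIN, l.product_id is NULL exactly for transactions whose product is missing from the list, so counting those NULL rows plays the same role as the NOT IN test. The primary key on listOfProducts keeps each product listed once.

```python
import sqlite3

# Sample rows from the "transactions" table in the question ("..." rows omitted).
TRANSACTIONS = [(1, "a"), (1, "b"), (2, "a"), (2, "X"),
                (3, "a"), (3, "b"), (3, "c"), (999, "Y")]
ALLOWED = ["a", "b", "c"]  # assumed contents of listOfProducts

def orders_via_product_table(transactions, allowed):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE transactions (order_id INTEGER, product_id TEXT)")
    con.execute("CREATE TABLE listOfProducts (product_id TEXT PRIMARY KEY)")
    con.executemany("INSERT INTO transactions VALUES (?, ?)", transactions)
    con.executemany("INSERT INTO listOfProducts VALUES (?)", [(p,) for p in allowed])
    rows = con.execute("""
        SELECT t.order_id
        FROM   transactions AS t
        LEFT   JOIN listOfProducts AS l ON l.product_id = t.product_id
        GROUP  BY t.order_id
        -- a NULL l.product_id marks a product that is not in listOfProducts
        HAVING COUNT(CASE WHEN l.product_id IS NULL THEN 1 END) = 0
        ORDER  BY t.order_id
    """).fetchall()
    con.close()
    return [r[0] for r in rows]

print(orders_via_product_table(TRANSACTIONS, ALLOWED))  # [1, 3]
```

Changing the allowed products then means changing rows in listOfProducts, not editing the query; for example, passing ["a", "X"] instead returns [2].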
