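Problem description
I have a very expensive query that takes more than an hour to run. I tried converting the EXISTS clause into a join but got stuck. Can anyone help?
The goal is to find duplicate products within a unique space ID. FLAT_TABLE contains 5 million records.
Query: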
SELECT
    tbl1.product, tbl1.status, tbl1.reservation, tbl1.unique_space_id
FROM
    schema1.flat_table tbl1
WHERE
    tbl1.status = 'Active'
    AND tbl1.product = 'Cage'
    AND EXISTS
        (SELECT 1
         FROM schema1.flat_table tbl2
         WHERE tbl2.product = 'Cage'
           AND tbl2.status = 'Active'
           AND tbl2.reservation <> 'Space Reserved'
           AND tbl1.unique_space_id = tbl2.unique_space_id
         GROUP BY tbl2.unique_space_id
         HAVING COUNT(1) > 1
        );
Solution
You can use the analytic function COUNT as follows:
SELECT * FROM
(SELECT tbl1.product, tbl1.status, tbl1.reservation, tbl1.unique_space_id,
        -- count only the rows whose reservation is not 'Space Reserved'
        COUNT(CASE WHEN tbl1.reservation <> 'Space Reserved' THEN 1 END)
            OVER (PARTITION BY tbl1.unique_space_id) AS cnt
 FROM schema1.flat_table tbl1
 WHERE tbl1.status = 'Active' AND tbl1.product = 'Cage') t
WHERE cnt > 1;
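To sanity-check that the analytic rewrite keeps the same rows as the original EXISTS query, here is a minimal sketch using Python's built-in sqlite3 module. It is only an illustration under stated assumptions: SQLite (3.25 or later, for window functions) stands in for the real database, the sample rows are made up, and the helper names build_db and run are hypothetical. The outer query lists the four original columns instead of * so that the extra cnt column does not get in the way of the comparison.

```python
import sqlite3

# Hypothetical sample rows: (product, status, reservation, unique_space_id)
ROWS = [
    ("Cage", "Active", "Open", "S1"),
    ("Cage", "Active", "Held", "S1"),
    ("Cage", "Active", "Space Reserved", "S1"),
    ("Cage", "Active", "Open", "S2"),
    ("Cage", "Active", "Space Reserved", "S2"),
    ("Cage", "Inactive", "Open", "S3"),
    ("Cage", "Active", "Open", "S3"),
    ("Rack", "Active", "Open", "S1"),
]

# The original query from the question.
EXISTS_SQL = """
SELECT tbl1.product, tbl1.status, tbl1.reservation, tbl1.unique_space_id
FROM schema1.flat_table tbl1
WHERE tbl1.status = 'Active'
  AND tbl1.product = 'Cage'
  AND EXISTS (SELECT 1
              FROM schema1.flat_table tbl2
              WHERE tbl2.product = 'Cage'
                AND tbl2.status = 'Active'
                AND tbl2.reservation <> 'Space Reserved'
                AND tbl1.unique_space_id = tbl2.unique_space_id
              GROUP BY tbl2.unique_space_id
              HAVING COUNT(1) > 1)
"""

# The analytic-function rewrite, projecting only the original columns.
WINDOW_SQL = """
SELECT product, status, reservation, unique_space_id
FROM (SELECT tbl1.product, tbl1.status, tbl1.reservation, tbl1.unique_space_id,
             COUNT(CASE WHEN tbl1.reservation <> 'Space Reserved' THEN 1 END)
                 OVER (PARTITION BY tbl1.unique_space_id) AS cnt
      FROM schema1.flat_table tbl1
      WHERE tbl1.status = 'Active' AND tbl1.product = 'Cage') t
WHERE cnt > 1
"""


def build_db():
    """Create an in-memory database with a schema1.flat_table test table."""
    conn = sqlite3.connect(":memory:")
    # Attach a second in-memory database so the schema1. prefix resolves.
    conn.execute("ATTACH DATABASE ':memory:' AS schema1")
    conn.execute(
        "CREATE TABLE schema1.flat_table "
        "(product TEXT, status TEXT, reservation TEXT, unique_space_id TEXT)"
    )
    conn.executemany("INSERT INTO schema1.flat_table VALUES (?, ?, ?, ?)", ROWS)
    return conn


def run(conn, sql):
    """Run a query and return its rows in a stable order for comparison."""
    return sorted(conn.execute(sql).fetchall(), key=repr)


if __name__ == "__main__":
    conn = build_db()
    print(run(conn, EXISTS_SQL) == run(conn, WINDOW_SQL))
    for row in run(conn, WINDOW_SQL):
        print(row)
```

On this sample only space S1 has more than one active Cage row that is not 'Space Reserved', so both queries keep every active Cage row in S1, including the reserved one. A check like this says nothing about speed on 5 million rows; that still has to be measured on the real table.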
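You can rewrite the query so that the current EXISTS subquery becomes an inner join. The join filters rows in the same way the EXISTS clause does. Note that this version selects only product and unique_space_id with DISTINCT, so it returns one row per product and space ID rather than every matching row.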
SELECT DISTINCT
tbl1.product,tbl1.unique_space_id
FROM schema1.flat_table tbl1
INNER JOIN
(
SELECT unique_space_id
FROM schema1.flat_table
WHERE product = 'Cage' AND
status = 'Active' AND
reservation <> 'Space Reserved'
GROUP BY unique_space_id
HAVING COUNT(*) > 1
) tbl2
ON tbl2.unique_space_id = tbl1.unique_space_id
WHERE
tbl1.status = 'Active' AND
tbl1.product = 'Cage';
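The join rewrite above filters on the same space IDs as the EXISTS query, but its DISTINCT and shorter select list change what comes back. The following sketch, again using Python's sqlite3 module with made-up sample rows and hypothetical helper names (build_db, run), compares three queries: the original EXISTS query, the join as posted, and a variant of the join with the full select list and no DISTINCT. The variant is an assumption about what a like-for-like replacement would look like, not part of the original answer.

```python
import sqlite3

# Hypothetical sample rows: (product, status, reservation, unique_space_id).
# The first two rows are deliberately identical.
ROWS = [
    ("Cage", "Active", "Open", "S1"),
    ("Cage", "Active", "Open", "S1"),
    ("Cage", "Active", "Space Reserved", "S1"),
    ("Cage", "Active", "Open", "S2"),
    ("Cage", "Active", "Space Reserved", "S2"),
    ("Cage", "Inactive", "Open", "S2"),
    ("Rack", "Active", "Open", "S1"),
]

# Derived table shared by both join variants: one row per qualifying space ID.
DUPLICATE_SPACES = """
    SELECT unique_space_id
    FROM schema1.flat_table
    WHERE product = 'Cage' AND status = 'Active'
      AND reservation <> 'Space Reserved'
    GROUP BY unique_space_id
    HAVING COUNT(*) > 1
"""

EXISTS_SQL = """
SELECT tbl1.product, tbl1.status, tbl1.reservation, tbl1.unique_space_id
FROM schema1.flat_table tbl1
WHERE tbl1.status = 'Active' AND tbl1.product = 'Cage'
  AND EXISTS (SELECT 1
              FROM schema1.flat_table tbl2
              WHERE tbl2.product = 'Cage' AND tbl2.status = 'Active'
                AND tbl2.reservation <> 'Space Reserved'
                AND tbl1.unique_space_id = tbl2.unique_space_id
              GROUP BY tbl2.unique_space_id
              HAVING COUNT(1) > 1)
"""

# The join as posted: DISTINCT over two columns.
JOIN_DISTINCT_SQL = f"""
SELECT DISTINCT tbl1.product, tbl1.unique_space_id
FROM schema1.flat_table tbl1
INNER JOIN ({DUPLICATE_SPACES}) tbl2
    ON tbl2.unique_space_id = tbl1.unique_space_id
WHERE tbl1.status = 'Active' AND tbl1.product = 'Cage'
"""

# Assumed like-for-like variant: full select list, no DISTINCT.
JOIN_FULL_SQL = f"""
SELECT tbl1.product, tbl1.status, tbl1.reservation, tbl1.unique_space_id
FROM schema1.flat_table tbl1
INNER JOIN ({DUPLICATE_SPACES}) tbl2
    ON tbl2.unique_space_id = tbl1.unique_space_id
WHERE tbl1.status = 'Active' AND tbl1.product = 'Cage'
"""


def build_db():
    """Create an in-memory database with a schema1.flat_table test table."""
    conn = sqlite3.connect(":memory:")
    # Attach a second in-memory database so the schema1. prefix resolves.
    conn.execute("ATTACH DATABASE ':memory:' AS schema1")
    conn.execute(
        "CREATE TABLE schema1.flat_table "
        "(product TEXT, status TEXT, reservation TEXT, unique_space_id TEXT)"
    )
    conn.executemany("INSERT INTO schema1.flat_table VALUES (?, ?, ?, ?)", ROWS)
    return conn


def run(conn, sql):
    """Run a query and return its rows in a stable order for comparison."""
    return sorted(conn.execute(sql).fetchall(), key=repr)


if __name__ == "__main__":
    conn = build_db()
    print("full join matches EXISTS:",
          run(conn, JOIN_FULL_SQL) == run(conn, EXISTS_SQL))
    print("DISTINCT join rows:", run(conn, JOIN_DISTINCT_SQL))
```

Because the derived table is grouped by unique_space_id, it holds at most one row per space ID, so the join cannot multiply rows and the variant without DISTINCT returns exactly the rows the EXISTS query does. The posted DISTINCT form is the better fit when only the list of affected product and space ID pairs is needed.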
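Here is a more concise version that uses COUNT as an analytic function together with a QUALIFY clause (QUALIFY is not standard SQL; it is supported by databases such as Teradata and Snowflake):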
SELECT DISTINCT product,status,reservation,unique_space_id
FROM schema1.flat_table
WHERE status = 'Active' AND product = 'Cage'
QUALIFY COUNT(CASE WHEN reservation <> 'Space Reserved' THEN 1 END)
OVER (PARTITION BY unique_space_id) > 1;