Deleting values from a table in a CTE

Problem description

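I'm having trouble replacing "select *" with "delete" in the CTE query below. I haven't included sample tables because I think (hope) the question can be answered without them. The question is how to delete the returned records instead of selecting them, because when I put in delete I get the error: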

"unexpected 'delete'"

How can I delete these records? Note that it works fine when I use "select *" at the very bottom.

with block_1 as 
(
    select * 
    from table_1
    where col_b is null 
      and col4 || col3 in (select col5 || col6 
                           from table1 
                           group by 1 
                           having count(*) > 1)
),block_2 as
(
    select * 
    from table1
    where col_b is not null 
      and col4 || col3 in (select col5 || col6 
                           from table_1 
                           group by 1 
                           having count(*) > 1)
)
select *
from table_1 
where col4 || col3 in (select col5 || col6 
                       from table1 
                       group by 1 
                       having count(*) > 1) 
  and misc_id in (select distinct misc_id from table1 
                  where misc_id in (select misc_id from block_1) 
                    and misc_id in (select misc_id from block_2))
  and col_b is null;

Solution

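It is not possible to use a CTE together with a DELETE statement.

The solution is to rewrite your query so that it avoids the CTE, for example by using the USING clause.

You can find more information in the question "is there an alternative to query with DELETE snowflake SQL statement with CTE?" and in the Snowflake DELETE documentation: https://docs.snowflake.com/en/sql-reference/sql/delete.html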
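A minimal sketch of that idea, run with Python's built-in sqlite3 module rather than Snowflake: each CTE body from the question is inlined as a subquery, so the DELETE needs no WITH clause. SQLite has no DELETE ... USING, so here the subqueries sit in the WHERE clause instead. The toy rows, the `id` column, and the single table standing in for both `table_1` and `table1` are assumptions made for illustration.

```python
import sqlite3

# Hypothetical toy data; one table stands in for both table_1 and table1
# (assumption: the two names in the question refer to the same table).
ROWS = [
    # (id, misc_id, col_b, col3, col4, col5, col6)
    (1, 1, None, "b", "a", "a", "b"),
    (2, 1, "x", "b", "a", "a", "b"),
    (3, 2, None, "b", "a", "c", "d"),
    (4, 3, None, "d", "c", "e", "f"),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE table_1 (id INTEGER PRIMARY KEY, misc_id INTEGER, "
    "col_b TEXT, col3 TEXT, col4 TEXT, col5 TEXT, col6 TEXT)"
)
con.executemany("INSERT INTO table_1 VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)

# The CTE bodies from the question, kept as plain subqueries.
DUP_KEYS = "SELECT col5 || col6 FROM table_1 GROUP BY 1 HAVING COUNT(*) > 1"
BLOCK_1 = f"SELECT misc_id FROM table_1 WHERE col_b IS NULL AND col4 || col3 IN ({DUP_KEYS})"
BLOCK_2 = f"SELECT misc_id FROM table_1 WHERE col_b IS NOT NULL AND col4 || col3 IN ({DUP_KEYS})"

# Same predicate as the question's final SELECT, but as a DELETE.
DELETE_SQL = f"""
DELETE FROM table_1
WHERE col4 || col3 IN ({DUP_KEYS})
  AND misc_id IN ({BLOCK_1})
  AND misc_id IN ({BLOCK_2})
  AND col_b IS NULL
"""

deleted = con.execute(DELETE_SQL).rowcount
remaining = [r[0] for r in con.execute("SELECT id FROM table_1 ORDER BY id")]
print(deleted, remaining)  # 1 [2, 3, 4]
```

With this data only row 1 qualifies: its key `ab` is duplicated, its `col_b` is NULL, and its `misc_id` 1 also has a non-NULL row with the same key.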


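So, if we convert the SELECT CTEs from the IN form to the JOIN form (which we can do because the sub-select inside each IN pulls out distinct values), we can convert all of the code to that form. For now I have left two INs in main_s1.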
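The equivalence relied on here can be checked on toy data with Python's sqlite3 module (the table contents and the `id` column are hypothetical). Because the GROUP BY sub-select returns each key once, the IN form and the JOIN form return the same rows; joining against the raw, ungrouped key column instead repeats every matching row once per duplicate.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table_1 (id INTEGER PRIMARY KEY, col3 TEXT, col4 TEXT, col5 TEXT, col6 TEXT)")
con.executemany("INSERT INTO table_1 VALUES (?, ?, ?, ?, ?)", [
    (1, "b", "a", "a", "b"),  # key 'ab'
    (2, "b", "a", "a", "b"),  # key 'ab' again, so 'ab' is duplicated
    (3, "d", "c", "e", "f"),  # key 'cd', no match
])

IN_FORM = """
SELECT id FROM table_1
WHERE col4 || col3 IN (SELECT col5 || col6 FROM table_1 GROUP BY 1 HAVING COUNT(*) > 1)
ORDER BY id
"""

# GROUP BY makes the derived table distinct, so the JOIN cannot fan rows out.
JOIN_FORM = """
SELECT t1.id FROM table_1 AS t1
JOIN (SELECT col5 || col6 AS col56 FROM table_1 GROUP BY 1 HAVING COUNT(*) > 1) AS t2
  ON t2.col56 = t1.col4 || t1.col3
ORDER BY t1.id
"""

# Without GROUP BY the derived table holds duplicates, so each t1 row repeats.
NON_DISTINCT_JOIN = """
SELECT t1.id FROM table_1 AS t1
JOIN (SELECT col5 || col6 AS col56 FROM table_1) AS t2
  ON t2.col56 = t1.col4 || t1.col3
ORDER BY t1.id
"""

in_ids = [r[0] for r in con.execute(IN_FORM)]
join_ids = [r[0] for r in con.execute(JOIN_FORM)]
fanned_ids = [r[0] for r in con.execute(NON_DISTINCT_JOIN)]
print(in_ids, join_ids, fanned_ids)  # [1, 2] [1, 2] [1, 1, 2, 2]
```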

WITH block_1_s1 AS (
    SELECT col5 || col6 AS col56
    FROM table1 
    GROUP BY 1 
    HAVING COUNT(*) > 1
),block_1 AS (
    SELECT t1.misc_id
    FROM table_1 AS t1
    JOIN block_1_s1 AS t2 
        ON t2.col56 = t1.col4 || t1.col3
    WHERE t1.col_b IS NULL 
),block_2_s1 AS (
    SELECT col5 || col6 as col56
    FROM table_1 
    GROUP BY 1 
    HAVING COUNT(*) > 1
),block_2 AS (
    SELECT t1.misc_id
    FROM table1 AS t1
    JOIN block_2_s1 as t2
        ON t2.col56 = t1.col4 || t1.col3
    WHERE t1.col_b IS NOT NULL 
),main_s1 AS (
    SELECT DISTINCT misc_id 
    FROM table1 
    WHERE misc_id IN (select misc_id from block_1) 
        AND misc_id IN (select misc_id from block_2)
)
SELECT t1.*
FROM table_1 AS t1
JOIN block_1_s1 AS t2 
    ON t2.col56 = t1.col4 || t1.col3
JOIN main_s1 as t3 ON t3.misc_id = t1.misc_id
WHERE t1.col_b is null;

Then, to pivot the two INs in the main_s1 CTE, we need to enforce uniqueness/distinctness on misc_id so that we can go from IN to JOIN:
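A small sqlite3 sketch of why those DISTINCTs matter, using a hypothetical table `t` that keeps only `misc_id` and `col_b`. The two IN filters and the two JOINs end up selecting the same misc_id, but without DISTINCT in the joined sub-selects every outer row is multiplied by the number of matches on both sides.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (misc_id INTEGER, col_b TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?)", [
    (1, None), (1, None), (1, "x"), (1, "y"),  # misc_id 1 is in both blocks, twice each
    (2, None),                                # only in the col_b IS NULL block
    (3, "z"),                                 # only in the col_b IS NOT NULL block
])

# main_s1 written with two INs.
IN_FORM = """
SELECT DISTINCT misc_id FROM t
WHERE misc_id IN (SELECT misc_id FROM t WHERE col_b IS NULL)
  AND misc_id IN (SELECT misc_id FROM t WHERE col_b IS NOT NULL)
"""

def join_rows(distinct_blocks):
    """main_s1 written with two JOINs, optionally de-duplicating each block."""
    d = "DISTINCT " if distinct_blocks else ""
    sql = f"""
    SELECT t1.misc_id FROM t AS t1
    JOIN (SELECT {d}misc_id FROM t WHERE col_b IS NULL) AS b1 ON b1.misc_id = t1.misc_id
    JOIN (SELECT {d}misc_id FROM t WHERE col_b IS NOT NULL) AS b2 ON b2.misc_id = t1.misc_id
    """
    return [r[0] for r in con.execute(sql)]

in_ids = sorted(r[0] for r in con.execute(IN_FORM))
raw = join_rows(distinct_blocks=False)     # 4 rows x 2 matches x 2 matches
deduped = join_rows(distinct_blocks=True)  # 4 rows x 1 match x 1 match
print(in_ids, len(raw), len(deduped), sorted(set(deduped)))  # [1] 16 4 [1]
```

The outer SELECT DISTINCT in main_s1 hides the fan-out in the final result either way; the DISTINCT inside the blocks keeps the intermediate join from growing.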

WITH block_1_s1 AS (
    SELECT col5 || col6 AS col56
    FROM table1 
    GROUP BY 1 
    HAVING COUNT(*) > 1
),block_1 AS (
    SELECT DISTINCT t1.misc_id
    FROM table_1 AS t1
    JOIN block_1_s1 AS t2 
        ON t2.col56 = t1.col4 || t1.col3
    WHERE t1.col_b IS NULL 
),block_2_s1 AS (
    SELECT col5 || col6 AS col56
    FROM table_1 
    GROUP BY 1 
    HAVING COUNT(*) > 1
),block_2 AS (
    SELECT DISTINCT t1.misc_id
    FROM table1 AS t1
    JOIN block_2_s1 as t2
        ON t2.col56 = t1.col4 || t1.col3
    WHERE t1.col_b IS NOT NULL 
),main_s1 AS (
    SELECT DISTINCT t1.misc_id 
    FROM table1 AS t1 
    JOIN block_1 AS b1 ON b1.misc_id = t1.misc_id
    JOIN block_2 AS b2 ON b2.misc_id = t1.misc_id
)
SELECT t1.*
FROM table_1 AS t1
JOIN block_1_s1 AS t2 
    ON t2.col56 = t1.col4 || t1.col3
JOIN main_s1 as t3 ON t3.misc_id = t1.misc_id
WHERE t1.col_b is null;

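So now that the SQL has been de-duplicated and everything has been converted to JOINs (we could actually have done this in the IN form as well, just as with the JOINs), we can replace all of the CTEs with sub-selects: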

SELECT t1.*
FROM table_1 AS t1
JOIN (
    SELECT col5 || col6 AS col56
    FROM table1 
    GROUP BY 1 
    HAVING COUNT(*) > 1
) AS t2 
    ON t2.col56 = t1.col4 || t1.col3
JOIN (
    SELECT DISTINCT t1.misc_id 
    FROM table1 AS t1 
    JOIN (
        SELECT DISTINCT t1.misc_id
        FROM table_1 AS t1
        JOIN ( 
            SELECT col5 || col6 AS col56
            FROM table1 
            GROUP BY 1 
            HAVING COUNT(*) > 1
        ) AS t2 
            ON t2.col56 = t1.col4 || t1.col3
        WHERE t1.col_b IS NULL 
    ) AS b1 
        ON b1.misc_id = t1.misc_id
    JOIN (
        SELECT DISTINCT t1.misc_id
        FROM table1 AS t1
        JOIN (
            SELECT col5 || col6 as col56
            FROM table_1 
            GROUP BY 1 
            HAVING COUNT(*) > 1
        ) AS t2
            ON t2.col56 = t1.col4 || t1.col3
        WHERE t1.col_b IS NOT NULL 
    ) AS b2 
        ON b2.misc_id = t1.misc_id
) as t3 
    ON t3.misc_id = t1.misc_id
WHERE t1.col_b is null; 

Now we can switch from SELECT to DELETE.
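That last switch is not spelled out above, so here is a hedged sketch of it, again on SQLite through Python's sqlite3 with hypothetical data and a hypothetical `id` key column. SQLite cannot attach derived tables to a DELETE the way Snowflake's USING clause does, so the final JOIN-only SELECT is reduced to the key column and used as an IN filter. On Snowflake the derived tables `t2` and `t3` would instead be listed after USING, with their join conditions moved into the WHERE clause; that form is not executed here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE table_1 (id INTEGER PRIMARY KEY, misc_id INTEGER, "
    "col_b TEXT, col3 TEXT, col4 TEXT, col5 TEXT, col6 TEXT)"
)
con.executemany("INSERT INTO table_1 VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, 1, None, "b", "a", "a", "b"),
    (2, 1, "x", "b", "a", "a", "b"),
    (3, 2, None, "b", "a", "c", "d"),
    (4, 3, None, "d", "c", "e", "f"),
])

# Duplicated keys (block_1_s1 / block_2_s1), with one table standing in for both names.
DUP = "SELECT col5 || col6 AS col56 FROM table_1 GROUP BY 1 HAVING COUNT(*) > 1"

# The final JOIN-only SELECT, projected down to the hypothetical key column.
TARGET_IDS = f"""
SELECT t1.id
FROM table_1 AS t1
JOIN ({DUP}) AS t2 ON t2.col56 = t1.col4 || t1.col3
JOIN (
    SELECT DISTINCT x.misc_id
    FROM table_1 AS x
    JOIN (SELECT DISTINCT a.misc_id FROM table_1 AS a
          JOIN ({DUP}) AS k ON k.col56 = a.col4 || a.col3
          WHERE a.col_b IS NULL) AS b1 ON b1.misc_id = x.misc_id
    JOIN (SELECT DISTINCT a.misc_id FROM table_1 AS a
          JOIN ({DUP}) AS k ON k.col56 = a.col4 || a.col3
          WHERE a.col_b IS NOT NULL) AS b2 ON b2.misc_id = x.misc_id
) AS t3 ON t3.misc_id = t1.misc_id
WHERE t1.col_b IS NULL
"""

selected = sorted(r[0] for r in con.execute(TARGET_IDS))
# SQLite has no DELETE ... USING, so the same SELECT filters by key instead.
deleted = con.execute(f"DELETE FROM table_1 WHERE id IN ({TARGET_IDS})").rowcount
remaining = [r[0] for r in con.execute("SELECT id FROM table_1 ORDER BY id")]
print(selected, deleted, remaining)  # [1] 1 [2, 3, 4]
```

Checking that the SELECT and the DELETE touch the same rows, as done here with `selected` and `deleted`, is a cheap sanity step before running the DELETE against real data.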

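The main reason I moved from the IN form to the JOIN form is that Snowflake is much more sort/merge based than row based, and IN is a special form of JOIN that is fast when you have indexes, which Snowflake does not. Since we really are chaining JOINs, we have found in production that joins where the earlier stages are pre-distincted run faster than the equivalent IN, even though the two are logically the same (and doing so is safe once the data has been forced distinct for the JOIN). This matters in the cases where the optimizing compiler fails to see that the JOIN form is faster than the IN, i.e. where the query profile shows that the IN was not turned into a join.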