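Oracle SQL: How to delete duplicate rows

Problem description

I deleted duplicate rows by following https://www.oracletutorial.com/advanced-oracle-sql/how-to-delete-duplicate-records-in-oracle/.

However, my case needs one more step. Suppose my table looks like this: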

CREATE TABLE fruits
(
    fruit_id   NUMBER GENERATED BY DEFAULT AS IDENTITY,
    fruit_name VARCHAR2(100),
    color      VARCHAR2(20),
    status     VARCHAR2(10),
    PRIMARY KEY (fruit_id)
);

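INSERT INTO fruits(fruit_name,color,status) VALUES ('Apple','Red','INITIAL');
INSERT INTO fruits(fruit_name,color,status) VALUES ('Apple','Red','INITIAL');
INSERT INTO fruits(fruit_name,color,status) VALUES ('Orange','Orange','COMPLETE');
INSERT INTO fruits(fruit_name,color,status) VALUES ('Orange','Orange','INITIAL');
INSERT INTO fruits(fruit_name,color,status) VALUES ('Orange','Orange','INITIAL');
INSERT INTO fruits(fruit_name,color,status) VALUES ('Banana','Yellow','INITIAL');
INSERT INTO fruits(fruit_name,color,status) VALUES ('Banana','Green','INITIAL');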

DELETE
FROM fruits
WHERE fruit_id NOT IN
      (
          SELECT MAX(fruit_id)
          FROM fruits
          GROUP BY fruit_name,color
      )
  AND STATUS = 'INITIAL';

After running the delete above, I found that one of the duplicate rows (fruit_id = 5) is still there.

select * from fruits;

2,Apple,Red,INITIAL
3,Orange,Orange,COMPLETE
5,Orange,Orange,INITIAL
6,Banana,Yellow,INITIAL
7,Banana,Green,INITIAL

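I want to delete all duplicate rows that are in the 'INITIAL' status.

How should I do this?

Update

To clarify, the logic should be: every non-max record in 'INITIAL' status should be deleted. In addition, if a record with status 'COMPLETE' exists, I also want the duplicate 'INITIAL' records deleted. In my example, I want the record with fruit_id = 5 (status = 'INITIAL') deleted, because another record, fruit_id = 3 (status = 'COMPLETE'), has the same 'Orange', 'Orange' values but a 'COMPLETE' status.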

Solution

I would use a correlated subquery. I think the logic you want is:

delete from fruits f
where status = 'INITIAL' and exists(
    select 1 
    from fruits f1 
    where 
        f1.fruit_name = f.fruit_name 
        and f1.color = f.color
        and (
            (f1.status = 'INITIAL' and f1.fruit_id > f.fruit_id)
            or (f1.status = 'COMPLETE' and f1.fruit_id <> f.fruit_id)
        )
)

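This deletes every row whose status is INITIAL and for which another row exists with the same name and color and either status INITIAL with a greater id, or status COMPLETE.

Demo on DB Fiddle: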

FRUIT_ID | FRUIT_NAME | COLOR  | STATUS  
-------: | :--------- | :----- | :-------
       2 | Apple      | Red    | INITIAL 
       3 | Orange     | Orange | COMPLETE
       6 | Banana     | Yellow | INITIAL 
       7 | Banana     | Green  | INITIAL
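
Before running a destructive statement like the one above, it can help to preview which rows it would remove. The following is a minimal sketch that reuses the same predicate in a SELECT. It assumes the fruits table and sample data from the question, and that color is never NULL (a row with a NULL color would not satisfy f1.color = f.color).

```sql
-- Preview only: lists the rows the correlated DELETE would remove.
SELECT f.fruit_id, f.fruit_name, f.color, f.status
FROM fruits f
WHERE f.status = 'INITIAL'
  AND EXISTS (
      SELECT 1
      FROM fruits f1
      WHERE f1.fruit_name = f.fruit_name
        AND f1.color = f.color
        AND (
            -- a newer INITIAL row for the same fruit and color
            (f1.status = 'INITIAL' AND f1.fruit_id > f.fruit_id)
            -- or a COMPLETE row for the same fruit and color
            OR (f1.status = 'COMPLETE' AND f1.fruit_id <> f.fruit_id)
        )
  )
ORDER BY f.fruit_id;
```

With the seven sample rows from the question, this should list fruit_id 1, 4 and 5, which are exactly the rows missing from the demo result.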

Let's start with the rows you want to keep:

select f.*
from (select f.*,
             sum(case when status = 'COMPLETE' then 1 else 0 end) over (partition by fruit_name,color) as num_complete,
             max(fruit_id) over (partition by fruit_name,color) as max_id
      from fruits f
     ) f
where status = 'COMPLETE' or
      (num_complete = 0 and fruit_id = max_id);

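This is a good basis for the delete. One approach:

delete from fruits f
    where not exists (select 1
                      from (select f2.*,
                                   sum(case when status = 'COMPLETE' then 1 else 0 end) over (partition by fruit_name,color) as num_complete,
                                   max(fruit_id) over (partition by fruit_name,color) as max_id
                            from fruits f2
                           ) f2
                      where ( f2.status = 'COMPLETE' or
                              (f2.num_complete = 0 and f2.fruit_id = f2.max_id)
                            ) and
                            f.fruit_id = f2.fruit_id
                     );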

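If you are deleting many rows from a large table, you may find it more efficient to recreate the table:

create table temp_fruits as
    select fruit_id,fruit_name,color,status
    from (select f.*,
                 sum(case when status = 'COMPLETE' then 1 else 0 end) over (partition by fruit_name,color) as num_complete,
                 max(fruit_id) over (partition by fruit_name,color) as max_id
          from fruits f
         ) f
    where status = 'COMPLETE' or
          (num_complete = 0 and fruit_id = max_id);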

truncate table fruits;

insert into fruits (fruit_id,fruit_name,color,status)
     select fruit_id,fruit_name,color,status from temp_fruits;

Note that this also changes the row IDs.
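
After rebuilding the table (or after any of the deletes in this thread), a quick sanity check can confirm that no group still breaks the rule from the question. This is only a sketch: it assumes status takes just the two values shown in the question, 'INITIAL' and 'COMPLETE', and the column aliases num_initial and num_complete are illustrative names.

```sql
-- Lists every (fruit_name, color) group that still has more than one
-- INITIAL row, or an INITIAL row next to a COMPLETE row.
SELECT fruit_name,
       color,
       COUNT(CASE WHEN status = 'INITIAL'  THEN 1 END) AS num_initial,
       COUNT(CASE WHEN status = 'COMPLETE' THEN 1 END) AS num_complete
FROM fruits
GROUP BY fruit_name, color
HAVING COUNT(CASE WHEN status = 'INITIAL' THEN 1 END) > 1
    OR (COUNT(CASE WHEN status = 'INITIAL'  THEN 1 END) > 0
        AND COUNT(CASE WHEN status = 'COMPLETE' THEN 1 END) > 0);
```

If the cleanup worked as intended, the query should return no rows.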

Originally I misunderstood and thought you also wanted to delete the COMPLETE records:

delete from fruits f
    where exists (select 1 
                  from fruits f2
                  where f2.fruit_name = f.fruit_name and
                        f2.color = f.color and
                        f2.status = 'COMPLETE'
                 ) or
          f.fruit_id < (select max(f2.fruit_id)
                 from fruits f2
                 where f2.fruit_name = f.fruit_name and
                       f2.color = f.color
                );

You can use the ROW_NUMBER() analytic function:

DELETE fruits
 WHERE fruit_id IN 
     ( WITH del AS 
      (
       -- COMPLETE rows rank first, then INITIAL rows from highest to lowest fruit_id
       SELECT f.*,ROW_NUMBER() OVER
              (PARTITION BY fruit_name,color 
                   ORDER BY CASE WHEN f.status = 'COMPLETE' THEN 0 ELSE 1 END,
                            fruit_id DESC) 
                      AS rn                            
         FROM fruits f
       )  
       SELECT fruit_id
         FROM del
        WHERE status = 'INITIAL'
          AND rn > 1
      );

Here the rn > 1 condition picks out, within each fruit_name and color group, the records whose fruit_id is not the maximum.
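
To see how the ranking behaves before deleting anything, the row numbers can be inspected on their own. The sketch below assumes an ordering that places COMPLETE rows first and then INITIAL rows by descending fruit_id, so that rn = 1 is the row to keep in each group; that ordering is an assumption chosen to match the rule in the question.

```sql
-- Shows the rank each row receives within its (fruit_name, color) group.
SELECT f.fruit_id, f.fruit_name, f.color, f.status,
       ROW_NUMBER() OVER (
           PARTITION BY f.fruit_name, f.color
           -- COMPLETE rows sort before INITIAL rows; newest INITIAL row next
           ORDER BY CASE WHEN f.status = 'COMPLETE' THEN 0 ELSE 1 END,
                    f.fruit_id DESC
       ) AS rn
FROM fruits f
ORDER BY f.fruit_name, f.color, rn;
```

On the sample data from the question, the INITIAL rows with rn > 1 should be fruit_id 1, 4 and 5.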


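You added another column, so the logic needs to change. The NOT IN subquery looks at fruits in any status; it should be restricted to fruits in INITIAL status only: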

DELETE
FROM fruits
WHERE fruit_id NOT IN
      (
          SELECT MAX(fruit_id)
          FROM fruits WHERE status = 'INITIAL'
          GROUP BY fruit_name,color
      )
AND status = 'INITIAL';

I think you can do the following:

DELETE
FROM fruits f
WHERE STATUS = 'INITIAL'
  AND EXISTS (SELECT 1 FROM fruits
               WHERE fruit_name = f.fruit_name
                 AND color = f.color
                 AND (STATUS != f.STATUS OR fruit_id > f.fruit_id))

Instead of grouping the values, you can check whether another, more suitable entry exists, that is, one where:

STATUS is no longer 'INITIAL'
FRUIT_ID is higher

delete from fruits where fruit_id not in 
(
    select  fruit_id_
    from    fruits
    match_recognize
    (
        partition by fruit_name,color
        order by fruit_id
        measures fruit_id as fruit_id_
        all rows per match 
        pattern ( ( a  {- b* -} ) | ( c {- d* -} ) )
        define  a as status = 'INITIAL',
                b as status = a.status,
                c as status = 'COMPLETE',
                d as status = 'INITIAL'
    )
)
