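Oracle SQL: How to delete duplicate rows

Question

Following the tutorial at https://www.oracletutorial.com/advanced-oracle-sql/how-to-delete-duplicate-records-in-oracle/ I deleted the duplicate rows from my table.

However, my case needs one more step. Suppose my table looks like this: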

CREATE TABLE fruits
(
    fruit_id   NUMBER GENERATED BY DEFAULT AS IDENTITY,
    fruit_name VARCHAR2(100),
    color      VARCHAR2(20),
    status     VARCHAR2(10),
    PRIMARY KEY (fruit_id)
);

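INSERT INTO fruits(fruit_name,color,status) VALUES ('Apple','Red','INITIAL');
INSERT INTO fruits(fruit_name,color,status) VALUES ('Apple','Red','INITIAL');
INSERT INTO fruits(fruit_name,color,status) VALUES ('Orange','Orange','COMPLETE');
INSERT INTO fruits(fruit_name,color,status) VALUES ('Orange','Orange','INITIAL');
INSERT INTO fruits(fruit_name,color,status) VALUES ('Orange','Orange','INITIAL');
INSERT INTO fruits(fruit_name,color,status) VALUES ('Banana','Yellow','INITIAL');
INSERT INTO fruits(fruit_name,color,status) VALUES ('Banana','Green','INITIAL');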

DELETE
FROM fruits
WHERE fruit_id NOT IN
      (
          SELECT MAX(fruit_id)
          FROM fruits
          GROUP BY fruit_name,color
      )
  AND STATUS = 'INITIAL';

After running the DELETE above, I found that one of the duplicate rows (fruit_id = 5) is still there.

select * from fruits;

2,Apple,Red,INITIAL
3,Orange,Orange,COMPLETE
5,Orange,Orange,INITIAL
6,Banana,Yellow,INITIAL
7,Banana,Green,INITIAL

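I want to delete all duplicate rows that are in 'INITIAL' status.

How should I do this?

Update

To be clear, the logic should be: every non-MAX record in 'INITIAL' status should be deleted. In addition, if a record with status 'COMPLETE' exists, I also want the duplicate 'INITIAL' records deleted. In my example, I want the record with fruit_id = 5 (status = 'INITIAL') deleted, because there is another record, fruit_id = 3, with the same fruit_name and color ('Orange', 'Orange') but with status 'COMPLETE'.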
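The rule in the update can be written down as a small reference model, which is handy for checking any DELETE statement against it. This is only a sketch in plain Python: the function name `ids_to_delete` is made up for illustration, and the seven sample rows are an assumption inferred from the query result shown in the question.

```python
# Reference model of the deletion rule, applied to assumed sample rows.
# Each row is (fruit_id, fruit_name, color, status).
ROWS = [
    (1, "Apple", "Red", "INITIAL"),
    (2, "Apple", "Red", "INITIAL"),
    (3, "Orange", "Orange", "COMPLETE"),
    (4, "Orange", "Orange", "INITIAL"),
    (5, "Orange", "Orange", "INITIAL"),
    (6, "Banana", "Yellow", "INITIAL"),
    (7, "Banana", "Green", "INITIAL"),
]


def ids_to_delete(rows):
    """Return the fruit_ids of the INITIAL rows that the rule removes."""
    doomed = set()
    for fruit_id, name, color, status in rows:
        if status != "INITIAL":
            continue  # only INITIAL rows are ever deleted
        for other_id, other_name, other_color, other_status in rows:
            if (other_name, other_color) != (name, color):
                continue  # different (fruit_name, color) group
            # Delete when the group has a COMPLETE row or a newer INITIAL row.
            newer_initial = other_status == "INITIAL" and other_id > fruit_id
            if other_status == "COMPLETE" or newer_initial:
                doomed.add(fruit_id)
                break
    return doomed


print(sorted(ids_to_delete(ROWS)))  # [1, 4, 5]
```

Rows 1 and 4 are the ones the original DELETE already removes; row 5 is the extra one the update asks for.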

Answers

I would use a correlated subquery. I think the logic you want is:

delete from fruits f
where status = 'INITIAL' and exists(
    select 1 
    from fruits f1 
    where 
        f1.fruit_name = f.fruit_name 
        and f1.color = f.color
        and (
            (f1.status = 'INITIAL' and f1.fruit_id > f.fruit_id)
            or (f1.status = 'COMPLETE' and f1.fruit_id <> f.fruit_id)
        )
)

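This deletes each row whose status is INITIAL when another row exists with the same name and color that is either INITIAL with a greater fruit_id, or COMPLETE.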

Demo on DB Fiddle

FRUIT_ID | FRUIT_NAME | COLOR  | STATUS  
-------: | :--------- | :----- | :-------
       2 | Apple      | Red    | INITIAL 
       3 | Orange     | Orange | COMPLETE
       6 | Banana     | Yellow | INITIAL 
       7 | Banana     | Green  | INITIAL 
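The result above can be reproduced locally without an Oracle instance. The sketch below is an assumption-laden stand-in: it uses Python's built-in sqlite3 module instead of Oracle, replaces the identity column and VARCHAR2 types with SQLite types, assumes the seven sample rows inferred from the question, and refers to the outer table by name rather than by alias. The function name `remaining_after_correlated_delete` is hypothetical.

```python
import sqlite3

# Assumed sample data: (fruit_id, fruit_name, color, status).
SAMPLE = [
    (1, "Apple", "Red", "INITIAL"),
    (2, "Apple", "Red", "INITIAL"),
    (3, "Orange", "Orange", "COMPLETE"),
    (4, "Orange", "Orange", "INITIAL"),
    (5, "Orange", "Orange", "INITIAL"),
    (6, "Banana", "Yellow", "INITIAL"),
    (7, "Banana", "Green", "INITIAL"),
]

# Same logic as the correlated-subquery DELETE, written so SQLite accepts it.
DELETE_SQL = """
DELETE FROM fruits
WHERE status = 'INITIAL'
  AND EXISTS (
      SELECT 1
      FROM fruits f1
      WHERE f1.fruit_name = fruits.fruit_name
        AND f1.color = fruits.color
        AND (f1.status = 'COMPLETE'
             OR (f1.status = 'INITIAL' AND f1.fruit_id > fruits.fruit_id))
  )
"""


def remaining_after_correlated_delete():
    """Load the sample rows, run the DELETE, return the surviving fruit_ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fruits (fruit_id INTEGER PRIMARY KEY, "
        "fruit_name TEXT, color TEXT, status TEXT)"
    )
    conn.executemany("INSERT INTO fruits VALUES (?, ?, ?, ?)", SAMPLE)
    conn.execute(DELETE_SQL)
    rows = conn.execute("SELECT fruit_id FROM fruits ORDER BY fruit_id").fetchall()
    conn.close()
    return [row[0] for row in rows]


print(remaining_after_correlated_delete())  # [2, 3, 6, 7]
```

The surviving ids match the FRUIT_ID column of the table above.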
,

Let's start with the rows you want to keep:

select f.*
from (select f.*,
             sum(case when status = 'COMPLETE' then 1 else 0 end) over (partition by fruit_name, color) as num_complete,
             max(fruit_id) over (partition by fruit_name, color) as max_id
      from fruits f
     ) f
where status = 'COMPLETE' or
      (num_complete = 0 and fruit_id = max_id);

This is a good basis for the delete. One approach:

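delete fruits f
    where not exists (select 1
                      from (select f2.*,
                                   sum(case when status = 'COMPLETE' then 1 else 0 end) over (partition by fruit_name, color) as num_complete,
                                   max(fruit_id) over (partition by fruit_name, color) as max_id
                            from fruits f2
                           ) f2
                      where ( f2.status = 'COMPLETE' or
                              (f2.num_complete = 0 and f2.fruit_id = f2.max_id)
                            ) and
                            f.fruit_id = f2.fruit_id
                     );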

If you are deleting many rows from a large table, you may find it more efficient to recreate the table:

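create table temp_fruits as
    select fruit_id, fruit_name, color, status
    from (select f.*,
                 sum(case when status = 'COMPLETE' then 1 else 0 end) over (partition by fruit_name, color) as num_complete,
                 max(fruit_id) over (partition by fruit_name, color) as max_id
          from fruits f
         ) f
    where status = 'COMPLETE' or
          (num_complete = 0 and fruit_id = max_id);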

truncate table fruits;

insert into fruits (fruit_id, fruit_name, color, status)
     select fruit_id, fruit_name, color, status from temp_fruits;

Note that this also changes the row IDs (ROWID).

Originally I misunderstood and thought you also wanted to delete the COMPLETE records:

delete fruits f
    where exists (select 1 
                  from fruits f2
                  where f2.fruit_name = f.fruit_name and
                        f2.color = f.color and
                        f2.status = 'COMPLETE'
                 ) or
          f.fruit_id < (select max(f2.fruit_id)
                 from fruits f2
                 where f2.fruit_name = f.fruit_name and
                       f2.color = f.color
                );
,

You can use the ROW_NUMBER() analytic function:

DELETE fruits
 WHERE fruit_id IN 
     ( WITH del AS 
      (
       SELECT f.*,ROW_NUMBER() OVER
              (PARTITION BY fruit_name,color 
                   ORDER BY CASE WHEN f.status = 'COMPLETE' THEN 0 ELSE 1 END, fruit_id DESC) 
                      AS rn                            
         FROM fruits f
       )  
       SELECT fruit_id
         FROM del
        WHERE status = 'INITIAL'
          AND rn > 1
      )

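Within each (fruit_name, color) group, rn = 1 marks the row to keep: a COMPLETE row if there is one, otherwise the INITIAL row with the highest fruit_id. The condition rn > 1, combined with status = 'INITIAL', therefore picks out the remaining INITIAL rows for deletion.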
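The ranking idea can be tried out with Python's built-in sqlite3 module. This is a sketch under several assumptions: SQLite (3.25 or later, for window functions) stands in for Oracle, the seven sample rows are inferred from the question, and the ordering shown here (COMPLETE rows first, then INITIAL rows by descending fruit_id) is one way to make rn = 1 the row to keep. The function name `remaining_after_row_number_delete` is hypothetical.

```python
import sqlite3

# Assumed sample data: (fruit_id, fruit_name, color, status).
SAMPLE = [
    (1, "Apple", "Red", "INITIAL"),
    (2, "Apple", "Red", "INITIAL"),
    (3, "Orange", "Orange", "COMPLETE"),
    (4, "Orange", "Orange", "INITIAL"),
    (5, "Orange", "Orange", "INITIAL"),
    (6, "Banana", "Yellow", "INITIAL"),
    (7, "Banana", "Green", "INITIAL"),
]

# Rank rows per (fruit_name, color): COMPLETE first, then newest INITIAL.
# Every INITIAL row that is not ranked first is deleted.
DELETE_SQL = """
DELETE FROM fruits
WHERE fruit_id IN (
    SELECT fruit_id
    FROM (
        SELECT fruit_id,
               status,
               ROW_NUMBER() OVER (
                   PARTITION BY fruit_name, color
                   ORDER BY CASE WHEN status = 'COMPLETE' THEN 0 ELSE 1 END,
                            fruit_id DESC
               ) AS rn
        FROM fruits
    )
    WHERE status = 'INITIAL'
      AND rn > 1
)
"""


def remaining_after_row_number_delete():
    """Load the sample rows, run the DELETE, return the surviving fruit_ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fruits (fruit_id INTEGER PRIMARY KEY, "
        "fruit_name TEXT, color TEXT, status TEXT)"
    )
    conn.executemany("INSERT INTO fruits VALUES (?, ?, ?, ?)", SAMPLE)
    conn.execute(DELETE_SQL)
    rows = conn.execute("SELECT fruit_id FROM fruits ORDER BY fruit_id").fetchall()
    conn.close()
    return [row[0] for row in rows]


print(remaining_after_row_number_delete())  # [2, 3, 6, 7]
```

In the Orange group the COMPLETE row 3 gets rn = 1, so both INITIAL rows 4 and 5 are removed; in the Apple group row 2 gets rn = 1 and only row 1 is removed.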

,

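You added another column, so the logic needs to change. The NOT IN clause looks at all fruits in any status; it should be restricted to fruits in INITIAL status only: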

DELETE
FROM fruits
WHERE fruit_id NOT IN
      (
          SELECT MAX(fruit_id)
          FROM fruits WHERE status = 'INITIAL'
          GROUP BY fruit_name,color
      )
AND status = 'INITIAL';
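It is worth checking what this variant leaves behind on the sample data. The sketch below makes the same assumptions as a quick local test would: Python's built-in sqlite3 stands in for Oracle, the seven rows are inferred from the question, and the helper name `remaining_after` is made up. It runs both the original NOT IN delete and the restricted one.

```python
import sqlite3

# Assumed sample data: (fruit_id, fruit_name, color, status).
SAMPLE = [
    (1, "Apple", "Red", "INITIAL"),
    (2, "Apple", "Red", "INITIAL"),
    (3, "Orange", "Orange", "COMPLETE"),
    (4, "Orange", "Orange", "INITIAL"),
    (5, "Orange", "Orange", "INITIAL"),
    (6, "Banana", "Yellow", "INITIAL"),
    (7, "Banana", "Green", "INITIAL"),
]

# The DELETE from the question: MAX(fruit_id) over rows in any status.
ORIGINAL_DELETE = """
DELETE FROM fruits
WHERE fruit_id NOT IN (
          SELECT MAX(fruit_id) FROM fruits GROUP BY fruit_name, color
      )
  AND status = 'INITIAL'
"""

# The restricted variant: MAX(fruit_id) over INITIAL rows only.
RESTRICTED_DELETE = """
DELETE FROM fruits
WHERE fruit_id NOT IN (
          SELECT MAX(fruit_id) FROM fruits
          WHERE status = 'INITIAL'
          GROUP BY fruit_name, color
      )
  AND status = 'INITIAL'
"""


def remaining_after(delete_sql):
    """Load the sample rows, run delete_sql, return the surviving fruit_ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fruits (fruit_id INTEGER PRIMARY KEY, "
        "fruit_name TEXT, color TEXT, status TEXT)"
    )
    conn.executemany("INSERT INTO fruits VALUES (?, ?, ?, ?)", SAMPLE)
    conn.execute(delete_sql)
    rows = conn.execute("SELECT fruit_id FROM fruits ORDER BY fruit_id").fetchall()
    conn.close()
    return [row[0] for row in rows]


print(remaining_after(ORIGINAL_DELETE))    # [2, 3, 5, 6, 7]
print(remaining_after(RESTRICTED_DELETE))  # [2, 3, 5, 6, 7]
```

On this data both statements keep fruit_id = 5, because the highest INITIAL id in the Orange group is 5 either way. Removing row 5 as well needs an extra check for a COMPLETE row in the same group.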
,

I think you can do the following:

DELETE
FROM fruits f
WHERE STATUS = 'INITIAL'
  AND EXISTS (SELECT 1 FROM fruits
               WHERE fruit_name = f.fruit_name
                 AND color = f.color
                 AND (STATUS != f.STATUS OR fruit_id > f.fruit_id))

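Instead of grouping the values, you check whether another, more suitable entry exists, meaning one for which either:

  • its STATUS is no longer 'INITIAL'
  • its FRUIT_ID is higher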
,
delete from fruits where fruit_id not in 
(
    select  fruit_id_
    from    fruits
    match_recognize
    (
        partition by fruit_name,color
        order by fruit_id
        measures fruit_id as fruit_id_
        all rows per match 
        pattern ( ( a  {- b* -} ) | ( c {- d* -} ) )
        define  a as status = 'INITIAL',
                b as status = a.status,
                c as status = 'COMPLETE',
                d as status = 'INITIAL'
    )
)