Problem description
I deleted duplicate rows by following https://www.oracletutorial.com/advanced-oracle-sql/how-to-delete-duplicate-records-in-oracle/.
However, my case needs one more step. Suppose my table looks like this:
CREATE TABLE fruits
(
    fruit_id NUMBER GENERATED BY DEFAULT AS IDENTITY,
    fruit_name VARCHAR2(100),
    color VARCHAR2(20),
    status VARCHAR2(10),
    PRIMARY KEY (fruit_id)
);
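INSERT INTO fruits(fruit_name,color,status) VALUES ('Apple','Red','INITIAL');
INSERT INTO fruits(fruit_name,color,status) VALUES ('Apple','Red','INITIAL');
INSERT INTO fruits(fruit_name,color,status) VALUES ('Orange','Orange','COMPLETE');
INSERT INTO fruits(fruit_name,color,status) VALUES ('Orange','Orange','INITIAL');
INSERT INTO fruits(fruit_name,color,status) VALUES ('Orange','Orange','INITIAL');
INSERT INTO fruits(fruit_name,color,status) VALUES ('Banana','Yellow','INITIAL');
INSERT INTO fruits(fruit_name,color,status) VALUES ('Banana','Green','INITIAL');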
DELETE
FROM fruits
WHERE fruit_id NOT IN
(
SELECT MAX(fruit_id)
FROM fruits
GROUP BY fruit_name,color
)
AND STATUS = 'INITIAL';
After running the delete above, I found that one of the duplicate rows (fruit_id = 5) is still there.
select * from fruits;
2,Apple,Red,INITIAL
3,Orange,Orange,COMPLETE
5,Orange,Orange,INITIAL
6,Banana,Yellow,INITIAL
7,Banana,Green,INITIAL
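I want to delete all duplicate rows that are in 'INITIAL' status.
How should I do that?
Update
To be clear, the logic should be: every NON-MAX record in 'INITIAL' status should be deleted. In addition, if a record with status 'COMPLETE' exists, I want the duplicate 'INITIAL' records deleted as well. In my example, I want the record with fruit_id = 5 (STATUS = 'INITIAL') deleted, because another record, fruit_id = 3 (STATUS = 'COMPLETE'), has the same fruit_name and color ('Orange', 'Orange') but a status of 'COMPLETE'.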
Solutions
I would use a correlated subquery. I think the logic you want is:
delete from fruits f
where status = 'INITIAL' and exists(
select 1
from fruits f1
where
f1.fruit_name = f.fruit_name
and f1.color = f.color
and (
(f1.status = 'INITIAL' and f1.fruit_id > f.fruit_id)
or (f1.status = 'COMPLETE' and f1.fruit_id <> f.fruit_id)
)
)
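This deletes every row whose status is INITIAL when another row exists with the same name and color that is either INITIAL with a larger fruit_id or COMPLETE.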
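As a rough check of that logic, here is a minimal sketch in Python that runs the same correlated-subquery delete against the sample rows. It rests on a few assumptions: the standard sqlite3 module stands in for Oracle, the seven rows are the ones implied by the question, and the function name is made up for illustration. The outer table is referenced as fruits rather than through an alias, only to stay within syntax SQLite accepts.

```python
import sqlite3

# Sample rows as implied by the question (fruit_id 1..7). Assumption: SQLite
# stands in for Oracle here, so the column types are simplified.
ROWS = [
    (1, "Apple", "Red", "INITIAL"),
    (2, "Apple", "Red", "INITIAL"),
    (3, "Orange", "Orange", "COMPLETE"),
    (4, "Orange", "Orange", "INITIAL"),
    (5, "Orange", "Orange", "INITIAL"),
    (6, "Banana", "Yellow", "INITIAL"),
    (7, "Banana", "Green", "INITIAL"),
]


def ids_after_correlated_delete():
    """Load the sample rows, run the delete, return the surviving fruit_ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fruits (fruit_id INTEGER PRIMARY KEY, "
        "fruit_name TEXT, color TEXT, status TEXT)"
    )
    conn.executemany("INSERT INTO fruits VALUES (?, ?, ?, ?)", ROWS)
    # An INITIAL row goes away if a same-name, same-color row exists that is
    # either a later INITIAL row or any COMPLETE row.
    conn.execute(
        """
        DELETE FROM fruits
        WHERE status = 'INITIAL'
          AND EXISTS (
              SELECT 1
              FROM fruits f1
              WHERE f1.fruit_name = fruits.fruit_name
                AND f1.color = fruits.color
                AND ((f1.status = 'INITIAL' AND f1.fruit_id > fruits.fruit_id)
                     OR (f1.status = 'COMPLETE' AND f1.fruit_id <> fruits.fruit_id))
          )
        """
    )
    ids = [row[0] for row in conn.execute("SELECT fruit_id FROM fruits ORDER BY fruit_id")]
    conn.close()
    return ids


print(ids_after_correlated_delete())  # prints [2, 3, 6, 7]
```

Rows 1 and 4 go because a later INITIAL duplicate exists, and row 5 goes because of the COMPLETE row 3.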
FRUIT_ID | FRUIT_NAME | COLOR  | STATUS
-------: | :--------- | :----- | :-------
       2 | Apple      | Red    | INITIAL
       3 | Orange     | Orange | COMPLETE
       6 | Banana     | Yellow | INITIAL
       7 | Banana     | Green  | INITIAL
,
Let's start with the rows you want to keep:
select f.*
from (select f.*,
             sum(case when status = 'COMPLETE' then 1 else 0 end) over (partition by fruit_name,color) as num_complete,
             max(fruit_id) over (partition by fruit_name,color) as max_id
      from fruits f
     ) f
where status = 'COMPLETE' or
      (num_complete = 0 and fruit_id = max_id);
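This is a good basis for a delete. One approach:
delete fruits f
where not exists (select 1
                  from (select f2.*,
                               sum(case when status = 'COMPLETE' then 1 else 0 end) over (partition by fruit_name,color) as num_complete,
                               max(fruit_id) over (partition by fruit_name,color) as max_id
                        from fruits f2
                       ) f2
                  where ( f2.status = 'COMPLETE' or
                          (f2.num_complete = 0 and f2.fruit_id = f2.max_id)
                        ) and
                        f.fruit_id = f2.fruit_id
                 );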
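If you are deleting many rows from a large table, you may find it more efficient to re-create the table:
create table temp_fruits as
    select fruit_id,fruit_name,color,status
    from (select f.*,
                 sum(case when status = 'COMPLETE' then 1 else 0 end) over (partition by fruit_name,color) as num_complete,
                 max(fruit_id) over (partition by fruit_name,color) as max_id
          from fruits f
         ) f
    where status = 'COMPLETE' or
          (num_complete = 0 and fruit_id = max_id);
truncate table fruits;
insert into fruits (fruit_id,fruit_name,color,status)
    select * from temp_fruits;
Note that this also changes the row IDs.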
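The rebuild approach can be tried on the sample rows with a minimal sketch, again using Python's sqlite3 module as a stand-in for Oracle (an assumption). SQLite has no TRUNCATE, so the sketch uses a plain DELETE for that step, and it keeps a row when it is COMPLETE or when its group has no COMPLETE row and it holds the largest fruit_id (fruit_id = max_id). The function name is hypothetical, and window functions need SQLite 3.25 or later.

```python
import sqlite3

# Sample rows as implied by the question; SQLite is only a stand-in for Oracle.
ROWS = [
    (1, "Apple", "Red", "INITIAL"),
    (2, "Apple", "Red", "INITIAL"),
    (3, "Orange", "Orange", "COMPLETE"),
    (4, "Orange", "Orange", "INITIAL"),
    (5, "Orange", "Orange", "INITIAL"),
    (6, "Banana", "Yellow", "INITIAL"),
    (7, "Banana", "Green", "INITIAL"),
]


def ids_after_rebuild():
    """Copy the rows to keep into temp_fruits, empty fruits, then copy them back."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fruits (fruit_id INTEGER PRIMARY KEY, "
        "fruit_name TEXT, color TEXT, status TEXT)"
    )
    conn.executemany("INSERT INTO fruits VALUES (?, ?, ?, ?)", ROWS)
    # Step 1: materialize the keepers (COMPLETE rows, or the max-id row of a
    # group that has no COMPLETE row).
    conn.execute(
        """
        CREATE TABLE temp_fruits AS
        SELECT fruit_id, fruit_name, color, status
        FROM (
            SELECT f.*,
                   SUM(CASE WHEN status = 'COMPLETE' THEN 1 ELSE 0 END)
                       OVER (PARTITION BY fruit_name, color) AS num_complete,
                   MAX(fruit_id) OVER (PARTITION BY fruit_name, color) AS max_id
            FROM fruits f
        ) k
        WHERE status = 'COMPLETE'
           OR (num_complete = 0 AND fruit_id = max_id)
        """
    )
    # Step 2: empty the original table (DELETE replaces Oracle's TRUNCATE here).
    conn.execute("DELETE FROM fruits")
    # Step 3: copy the keepers back, preserving their fruit_id values.
    conn.execute(
        "INSERT INTO fruits (fruit_id, fruit_name, color, status) "
        "SELECT fruit_id, fruit_name, color, status FROM temp_fruits"
    )
    ids = [row[0] for row in conn.execute("SELECT fruit_id FROM fruits ORDER BY fruit_id")]
    conn.close()
    return ids


print(ids_after_rebuild())  # prints [2, 3, 6, 7]
```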
Originally I misunderstood and thought you also wanted to delete the COMPLETE records:
delete fruits f
where exists (select 1
              from fruits f2
              where f2.fruit_name = f.fruit_name and
                    f2.color = f.color and
                    f2.status = 'COMPLETE'
             ) or
      f.fruit_id < (select max(f2.fruit_id)
                    from fruits f2
                    where f2.fruit_name = f.fruit_name and
                          f2.color = f.color
                   );
,
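You can use the ROW_NUMBER() analytic function:
DELETE fruits
WHERE fruit_id IN
( WITH del AS
  (
   SELECT f.*,ROW_NUMBER() OVER
          (PARTITION BY fruit_name,color
           ORDER BY CASE WHEN f.status = 'COMPLETE' THEN 0 ELSE 1 END, fruit_id DESC)
          AS rn
   FROM fruits f
  )
  SELECT fruit_id
  FROM del
  WHERE status = 'INITIAL'
  AND rn > 1
);
Within each (fruit_name, color) group, COMPLETE rows sort first and INITIAL rows follow in descending fruit_id order. So rn > 1 together with status = 'INITIAL' selects every INITIAL row except the one with the largest fruit_id, and every INITIAL row when the group has a COMPLETE row.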
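To see how the row numbering plays out on the sample rows, here is a minimal sketch using Python's sqlite3 module as a stand-in for Oracle (an assumption; it needs SQLite 3.25 or later for window functions). The ordering it uses puts COMPLETE rows first and then INITIAL rows by descending fruit_id, so that rn = 1 is the row to keep in each group. The function name is hypothetical.

```python
import sqlite3

# Sample rows as implied by the question; SQLite is only a stand-in for Oracle.
ROWS = [
    (1, "Apple", "Red", "INITIAL"),
    (2, "Apple", "Red", "INITIAL"),
    (3, "Orange", "Orange", "COMPLETE"),
    (4, "Orange", "Orange", "INITIAL"),
    (5, "Orange", "Orange", "INITIAL"),
    (6, "Banana", "Yellow", "INITIAL"),
    (7, "Banana", "Green", "INITIAL"),
]


def ids_after_row_number_delete():
    """Number the rows per (fruit_name, color) and delete INITIAL rows with rn > 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fruits (fruit_id INTEGER PRIMARY KEY, "
        "fruit_name TEXT, color TEXT, status TEXT)"
    )
    conn.executemany("INSERT INTO fruits VALUES (?, ?, ?, ?)", ROWS)
    conn.execute(
        """
        DELETE FROM fruits
        WHERE fruit_id IN (
            SELECT fruit_id
            FROM (
                SELECT fruit_id, status,
                       ROW_NUMBER() OVER (
                           PARTITION BY fruit_name, color
                           -- COMPLETE rows first, then INITIAL rows, newest first
                           ORDER BY CASE WHEN status = 'COMPLETE' THEN 0 ELSE 1 END,
                                    fruit_id DESC
                       ) AS rn
                FROM fruits
            ) numbered
            WHERE status = 'INITIAL' AND rn > 1
        )
        """
    )
    ids = [row[0] for row in conn.execute("SELECT fruit_id FROM fruits ORDER BY fruit_id")]
    conn.close()
    return ids


print(ids_after_row_number_delete())  # prints [2, 3, 6, 7]
```

In the Orange group the COMPLETE row 3 gets rn = 1, so both INITIAL rows (5 and 4) have rn > 1 and are deleted. In the Apple group row 2 gets rn = 1 and row 1 is deleted.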
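,
You added another column, so the logic needs to change. The NOT IN subquery looks at all fruits regardless of status; it should be restricted to fruits in INITIAL status: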
DELETE
FROM fruits
WHERE fruit_id NOT IN
(
SELECT MAX(fruit_id)
FROM fruits WHERE status = 'INITIAL'
GROUP BY fruit_name,color
)
AND status = 'INITIAL';
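Trying this variant on the sample rows suggests that it removes duplicates among the INITIAL rows but, on its own, still leaves fruit_id = 5, because nothing in it looks at COMPLETE rows. The minimal sketch below shows that, using Python's sqlite3 module as a stand-in for Oracle (an assumption). The second statement is a hypothetical follow-up step, not part of the answer above, that removes INITIAL rows sharing a name and color with a COMPLETE row.

```python
import sqlite3

# Sample rows as implied by the question; SQLite is only a stand-in for Oracle.
ROWS = [
    (1, "Apple", "Red", "INITIAL"),
    (2, "Apple", "Red", "INITIAL"),
    (3, "Orange", "Orange", "COMPLETE"),
    (4, "Orange", "Orange", "INITIAL"),
    (5, "Orange", "Orange", "INITIAL"),
    (6, "Banana", "Yellow", "INITIAL"),
    (7, "Banana", "Green", "INITIAL"),
]

# The delete from the answer: keep only the max-id INITIAL row per group.
RESTRICTED_NOT_IN = """
    DELETE FROM fruits
    WHERE fruit_id NOT IN (
        SELECT MAX(fruit_id)
        FROM fruits
        WHERE status = 'INITIAL'
        GROUP BY fruit_name, color
    )
    AND status = 'INITIAL'
"""

# Hypothetical extra step (an assumption, not from the answer): drop INITIAL
# rows that duplicate a COMPLETE row.
DROP_INITIAL_WITH_COMPLETE = """
    DELETE FROM fruits
    WHERE status = 'INITIAL'
      AND EXISTS (
          SELECT 1
          FROM fruits c
          WHERE c.fruit_name = fruits.fruit_name
            AND c.color = fruits.color
            AND c.status = 'COMPLETE'
      )
"""


def run_deletes(statements):
    """Load the sample rows, run each DELETE in order, return the surviving fruit_ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fruits (fruit_id INTEGER PRIMARY KEY, "
        "fruit_name TEXT, color TEXT, status TEXT)"
    )
    conn.executemany("INSERT INTO fruits VALUES (?, ?, ?, ?)", ROWS)
    for statement in statements:
        conn.execute(statement)
    ids = [row[0] for row in conn.execute("SELECT fruit_id FROM fruits ORDER BY fruit_id")]
    conn.close()
    return ids


print(run_deletes([RESTRICTED_NOT_IN]))  # prints [2, 3, 5, 6, 7]
print(run_deletes([RESTRICTED_NOT_IN, DROP_INITIAL_WITH_COMPLETE]))  # prints [2, 3, 6, 7]
```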
,
I think you can do the following:
DELETE
FROM fruits f
WHERE STATUS = 'INITIAL'
AND EXISTS (SELECT 1 FROM fruits
WHERE fruit_name = f.fruit_name
AND color = f.color
AND (STATUS != f.STATUS OR fruit_id > f.fruit_id))
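Rather than grouping values, you check whether another, more suitable entry exists, that is, one for which either:
- STATUS is no longer 'INITIAL', or
- FRUIT_ID is higher
,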
delete from fruits where fruit_id not in
(
select fruit_id_
from fruits
match_recognize
(
partition by fruit_name,color
order by fruit_id
measures fruit_id as fruit_id_
all rows per match
pattern ( ( a {- b* -} ) | ( c {- d* -} ) )
define a as status = 'INITIAL',b as status = a.status,c as status = 'COMPLETE',d as status = 'INITIAL'
)
)