Updating approx. 30 million records in an SCD-2 table daily

Problem description

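I have an SCD-2 table containing more than 400 million records. Every day I receive 40M records, of which about 20M are inserted and updated. I am using Talend and Oracle in this project. The problem is that updating this volume of records takes a long time (many hours). The structure of my SCD-2 target table is as follows: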

SKEY    NUMBER(38,0)
AS_OF_DATE  DATE
TSTP    DATE
COUNTRY VARCHAR2(2 BYTE)
RISK    VARCHAR2(50 BYTE)
DATE_OF_DATA    DATE
PRIMARY_COUNTRY VARCHAR2(5 BYTE)
RISK_VALUES NUMBER(8,5)
START_DATE  DATE
END_DATE    DATE

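Primary key: skey, as_of_date, primary_country. The table is partitioned (monthly) on as_of_date, which is the daily processing date. How can I improve the job's performance so that records in the target table are updated faster?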
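For concreteness, the structure described above corresponds roughly to the DDL below. This is only a sketch, not the actual table definition: the table name geo_crisks_delta is taken from the queries further down in the post, while the constraint name, the use of interval partitioning, and the initial partition boundary are all assumptions made for illustration.

```sql
-- Sketch only: reconstructed from the column list and description in the post.
CREATE TABLE geo_crisks_delta (
    skey             NUMBER(38,0)      NOT NULL,
    as_of_date       DATE              NOT NULL,
    tstp             DATE,
    country          VARCHAR2(2 BYTE),
    risk             VARCHAR2(50 BYTE),
    date_of_data     DATE,
    primary_country  VARCHAR2(5 BYTE)  NOT NULL,
    risk_values      NUMBER(8,5),
    start_date       DATE,
    end_date         DATE,
    -- Constraint name is hypothetical; the key columns come from the post.
    CONSTRAINT geo_crisks_delta_pk
        PRIMARY KEY (skey, as_of_date, primary_country)
)
-- One partition per month of as_of_date. Interval partitioning and the
-- boundary date below are assumptions, not details given in the post.
PARTITION BY RANGE (as_of_date)
INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
(
    PARTITION p_initial VALUES LESS THAN (DATE '2015-01-01')
);
```

With this layout, a predicate on as_of_date lets Oracle restrict work to the matching monthly partitions, which is relevant when judging how the update statements below behave.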

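I have tried inserting all the data to be updated into a TEMP staging table and then using MERGE to update the records in the target table. I also ran an UPDATE statement with an inner join between the staging and target tables, but performance is still poor. The target table is indexed on as_of_date, with a clustered index on primary_country and end_date.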

The queries used are:

merge into geo_crisks_delta D
using (select as_of_date, primary_country, skey, end_date
       from geo_crisks_delta_test) T
on (D.as_of_date = T.as_of_date
    and D.primary_country = T.primary_country
    and D.skey = T.skey)
when matched then
update
set D.end_date = T.end_date
where
D.primary_country in
    (select distinct country from geo_countries) and
(D.end_date = to_date('2099-12-31','yyyy-MM-dd'));

//OR

update
(select a.end_date as delta, b.end_date as stage
 from geo_crisks_delta a
 inner join geo_crisks_delta_test b
 on a.as_of_date = b.as_of_date
 and a.primary_country = b.primary_country
 and a.skey = b.skey
 where a.end_date = to_date('2099-12-31','yyyy-MM-dd')) t
set t.delta = t.stage;

I have little experience writing stored procedures. Can anyone help me with what I should do to improve this, and what am I doing wrong?
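A usual first step in diagnosing a slow MERGE like the one above is to look at its execution plan, for example to see whether partitions of the target are pruned and how the staging and target tables are joined. A minimal sketch using Oracle's EXPLAIN PLAN and DBMS_XPLAN.DISPLAY is shown below. It only displays the plan and changes no data. The statement is the MERGE from the post, unchanged apart from formatting; nothing here is confirmed to resolve the problem.

```sql
-- Store the optimizer's plan for the MERGE without executing it.
EXPLAIN PLAN FOR
MERGE INTO geo_crisks_delta d
USING (SELECT as_of_date, primary_country, skey, end_date
         FROM geo_crisks_delta_test) t
   ON (    d.as_of_date      = t.as_of_date
       AND d.primary_country = t.primary_country
       AND d.skey            = t.skey)
 WHEN MATCHED THEN
   UPDATE SET d.end_date = t.end_date
    WHERE d.primary_country IN (SELECT DISTINCT country FROM geo_countries)
      AND d.end_date = TO_DATE('2099-12-31', 'yyyy-MM-dd');

-- Show the plan that was just stored in the plan table.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

For a partitioned target, the Pstart and Pstop columns of the output indicate which partitions the statement would touch, and the operation names show the join method chosen for the two tables.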

Tags: oracle, query-optimization, scd2, sql, talend