Left outer join result becomes much larger on Hive

Problem description

Hi all, I would like to get reliable results from this query:

SELECT ..

FROM
(
SELECT 
 CO_CODE,REP.cua cua,PRD.PRODUCT_DESC,REGEXP_EXTRACT(B.rfbbn,'^(?:[^*]*\\*){2}([^*]*)',1) cllt,NVL(CCY_bbce,0) bbce,B.TYPE,A.conn_keyy

FROM 
(
SELECT conn_keyy,ext_date FROM 
(tablee.aa) A
)aaxyz 
WHERE flag = 'Y'
)A

LEFT OUTER JOIN 
tablee.B
ON A.conn_keyy = B.conn_keyy

LEFT OUTER JOIN (SELECT DISTINCT * FROM tablee.cc) CPLCUR
ON CPLCUR.conn_keyy = A.conn_keyy
AND CPLCUR.cllt = REGEXP_EXTRACT(B.rfbbn,1)
AND CPLCUR.dtdt = '1999' 

LEFT OUTER JOIN (SELECT DISTINCT * FROM tablee.dd) CPLBAL
ON CPLBAL.conn_keyy = A.conn_keyy
AND CPLBAL.SEQUENCE = CPLCUR.SEQUENCE
AND CPLBAL.dtdt = '1999' 

LEFT OUTER JOIN (SELECT DISTINCT * FROM tablee.ee) CPLCCY
ON CPLCCY.conn_keyy = A.conn_keyy
AND CPLCCY.SEQUENCE = CPLCUR.SEQUENCE
AND CPLCCY.dtdt = '1999' 

LEFT OUTER JOIN (SELECT DISTINCT * FROM tablee.ff) CPLMOV
ON CPLMOV.conn_keyy = A.conn_keyy
AND CPLMOV.SEQUENCE = CPLCUR.SEQUENCE
AND CPLMOV.dtdt = '1999' 

LEFT OUTER JOIN
 (tablee.REP)REP 
ON REP.relino = B.lnido

LEFT OUTER JOIN tablee.P PRD
ON PRD.PRODUCT_CODE = REGEXP_EXTRACT(A.conn_keyy,'[.]([^.]+)',1)
AND PRD.dtdt = '1999'

WHERE B.lnido LIKE 'PLCONS1%'
) rrvv;

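FYI, SELECT COUNT(*) on A returns about 60,000 rows.

I am just wondering why my query result grows to 1.5 billion rows. What am I missing? What goes wrong when the left outer joins are applied?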

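Solution

A join duplicates rows when the join key is not unique in the second table: every row on the left is repeated once for each matching row on the right. If the join key is not unique in both tables, the duplication multiplies, because each key produces (rows in A with that key) x (rows in B with that key) output rows.

For example: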

with 

A as (
select 1 key,'one' name
union all
select 1 key,'two' name
),B as (
select 1 key,'one' name
union all
select 1 key,'two' name
)

select *
  from A left join B on A.key=B.key

The result is four rows, even though each table contains only two rows:

a.key   a.name  b.key   b.name
1       one     1       one
1       one     1       two
1       two     1       one
1       two     1       two
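
The arithmetic behind those four rows can be checked without running the join itself. The following is a minimal sketch, not part of the original answer: it rebuilds the same toy tables, counts rows per key on each side, and sums the products. The names a_cnt, b_cnt and expected_join_rows are made up for illustration.

```sql
-- Toy tables: key 1 appears twice on each side (illustrative data only).
with
A as (
select 1 key,'one' name
union all
select 1 key,'two' name
),
B as (
select 1 key,'one' name
union all
select 1 key,'two' name
),
-- rows per key on each side
a_cnt as (select key, count(*) cnt from A group by key),
b_cnt as (select key, count(*) cnt from B group by key)

-- Each A row matches every B row with the same key, so a key contributes
-- a_cnt.cnt * b_cnt.cnt rows. A key with no match in B still yields one
-- output row per A row, hence coalesce(..., 1).
select sum(a_cnt.cnt * coalesce(b_cnt.cnt, 1)) expected_join_rows
  from a_cnt left join b_cnt on a_cnt.key = b_cnt.key
```

For the toy data this gives 2 x 2 = 4. Applied to real tables, the same pattern estimates how much one join inflates the row count before the full query is run; chained joins multiply these factors, which is how 60,000 rows can plausibly turn into billions.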

How to find duplicate keys:

select B.conn_keyy,count(*) cnt 
  from tablee.B
group by B.conn_keyy
 having count(*)>1
order by cnt desc limit 100;
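
The same check can be run against the full set of join columns of the other tables. Below is a sketch for tablee.cc, assuming the join uses conn_keyy and cllt with the dtdt = '1999' filter, as in the question. Note that SELECT DISTINCT * only removes rows that are identical in every column; it does not make the join key unique.

```sql
-- Keys that still occur more than once in tablee.cc for the joined slice.
-- Any row returned here multiplies the matching rows of the left side.
select conn_keyy, cllt, count(*) cnt
  from tablee.cc
 where dtdt = '1999'
 group by conn_keyy, cllt
having count(*) > 1
 order by cnt desc limit 100;
```

An empty result means this join cannot add rows; the same query shape applies to tablee.dd, tablee.ee and tablee.ff with conn_keyy and SEQUENCE as the key columns.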

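Check every table you are joining and decide what you can do about it: apply a filter, deduplicate (DISTINCT), or add more join keys, so that each join becomes one-to-(one or zero) or many-to-(one or zero).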
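
One way to force a "to one or zero" join is to reduce the right-hand table to a single row per key before joining. This is only a sketch under assumptions: ordering by rfbbn is an arbitrary stand-in for whatever rule decides which row to keep, and the alias B1 and column rn are made up for illustration.

```sql
-- Keep at most one row per conn_keyy from tablee.B, then join.
-- Assumption: "first row by rfbbn" is an acceptable representative;
-- replace the ORDER BY with the rule that is right for your data.
select A.conn_keyy, B1.rfbbn, B1.lnido
  from tablee.aa A
  left join (
       select conn_keyy, rfbbn, lnido
         from (
              select conn_keyy, rfbbn, lnido,
                     row_number() over (partition by conn_keyy order by rfbbn) rn
                from tablee.B
              ) s
        where s.rn = 1
       ) B1
    on A.conn_keyy = B1.conn_keyy;
```

Because B1 has at most one row per conn_keyy, this join returns exactly as many rows as A. Whether discarding the other rows is acceptable depends on the data; if all of them are needed, adding the missing join keys is the better fix.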