Problem description
Hive DBMS; two tables, A and B.
Table A
prnt_id  sub_id   ac_nm     cost   units
unknown  abc01    abc corp  34500  24
unknown  unknown  xyz corp  9800   10
856      abc03    jfk corp  9820   12
Table B
prnt_id  sub_id  ac_nm
123      abc01   abc corp
456      abc02   xyz corp
856      abc03   jfk corp
859      abc04   ops corp
Problem: I am trying to write a query that joins Table A to Table B, first on prnt_id; if prnt_id is "unknown", then on sub_id; and if sub_id is also "unknown", then on ac_nm.
Desired output:
prnt_id  sub_id  ac_nm     cost   units
123      abc01   abc corp  34500  24
456      abc02   xyz corp  9800   10
856      abc03   jfk corp  9820   12
Solution
You need to LEFT join TableB to three copies of TableA, one per join key, and filter out the rows that match on none of them:
select b.*,coalesce(a1.cost,a2.cost,a3.cost) cost,coalesce(a1.units,a2.units,a3.units) units
from TableB b
left join TableA a1 on a1.prnt_id = b.prnt_id
left join TableA a2 on a2.sub_id = b.sub_id and a1.prnt_id is null
left join TableA a3 on a3.ac_nm = b.ac_nm and a2.sub_id is null
where coalesce(a1.prnt_id,a2.sub_id,a3.ac_nm) is not null
order by b.prnt_id
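To check the query against the sample data from the question, here is a minimal sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for Hive here (an assumption, made so the example is self-contained), the column types are guessed from the sample rows, and the helper name `run_fallback_join` is hypothetical.

```python
import sqlite3

# The query from the answer, unchanged apart from whitespace.
QUERY = """
select b.*, coalesce(a1.cost, a2.cost, a3.cost) cost,
       coalesce(a1.units, a2.units, a3.units) units
from TableB b
left join TableA a1 on a1.prnt_id = b.prnt_id
left join TableA a2 on a2.sub_id = b.sub_id and a1.prnt_id is null
left join TableA a3 on a3.ac_nm = b.ac_nm and a2.sub_id is null
where coalesce(a1.prnt_id, a2.sub_id, a3.ac_nm) is not null
order by b.prnt_id
"""


def run_fallback_join():
    """Load the question's sample rows into SQLite and run the query."""
    conn = sqlite3.connect(":memory:")
    # Column types are an assumption; the question does not state them.
    conn.executescript("""
        create table TableA (prnt_id text, sub_id text, ac_nm text,
                             cost integer, units integer);
        create table TableB (prnt_id text, sub_id text, ac_nm text);
    """)
    conn.executemany(
        "insert into TableA values (?, ?, ?, ?, ?)",
        [
            ("unknown", "abc01", "abc corp", 34500, 24),
            ("unknown", "unknown", "xyz corp", 9800, 10),
            ("856", "abc03", "jfk corp", 9820, 12),
        ],
    )
    conn.executemany(
        "insert into TableB values (?, ?, ?)",
        [
            ("123", "abc01", "abc corp"),
            ("456", "abc02", "xyz corp"),
            ("856", "abc03", "jfk corp"),
            ("859", "abc04", "ops corp"),
        ],
    )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_fallback_join():
        print(row)
```

On this data the 859 row of TableB matches nothing in TableA on any of the three keys, so the where clause drops it. Each remaining row takes cost and units from the first copy of TableA that matched.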

Alternatively, you can use left join and coalesce() to pick the first match:
select a.*, coalesce(b.?, bs.?, ba.?) as new_col  -- replace ? with the column you want from b
from a left join
     b
     on b.prnt_id = a.prnt_id left join
     b bs
     on bs.sub_id = a.sub_id and b.prnt_id is null left join
     b ba
     on ba.ac_nm = a.ac_nm and bs.sub_id is null;
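As a rough illustration of this second approach, the sketch below starts from table a and fills in the missing ids from b. It again uses an in-memory SQLite database via Python's sqlite3 as a stand-in for Hive (an assumption). The `?` placeholders are replaced with prnt_id and sub_id, which is my reading of the desired output rather than something the answer specifies, and `resolve_ids` is a hypothetical helper name.

```python
import sqlite3

# Drive the join from a; each copy of b is consulted only if the
# previous key found no match. ba.ac_nm / a.ac_nm are the join columns.
QUERY = """
select coalesce(b.prnt_id, bs.prnt_id, ba.prnt_id) as prnt_id,
       coalesce(b.sub_id, bs.sub_id, ba.sub_id) as sub_id,
       a.ac_nm, a.cost, a.units
from a
left join b on b.prnt_id = a.prnt_id
left join b bs on bs.sub_id = a.sub_id and b.prnt_id is null
left join b ba on ba.ac_nm = a.ac_nm and bs.sub_id is null
order by 1
"""


def resolve_ids():
    """Return rows of a with prnt_id and sub_id resolved from b."""
    conn = sqlite3.connect(":memory:")
    # Column types are an assumption; the question does not state them.
    conn.executescript("""
        create table a (prnt_id text, sub_id text, ac_nm text,
                        cost integer, units integer);
        create table b (prnt_id text, sub_id text, ac_nm text);
    """)
    conn.executemany(
        "insert into a values (?, ?, ?, ?, ?)",
        [
            ("unknown", "abc01", "abc corp", 34500, 24),
            ("unknown", "unknown", "xyz corp", 9800, 10),
            ("856", "abc03", "jfk corp", 9820, 12),
        ],
    )
    conn.executemany(
        "insert into b values (?, ?, ?)",
        [
            ("123", "abc01", "abc corp"),
            ("456", "abc02", "xyz corp"),
            ("856", "abc03", "jfk corp"),
            ("859", "abc04", "ops corp"),
        ],
    )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in resolve_ids():
        print(row)
```

Because the join starts from a, rows of b with no counterpart in a (859 here) never appear, so no extra filter is needed. A row of a that matched nothing in b would be kept with null ids.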