Is there a best way to join multiple tables?

Problem description

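Can someone help me join/merge the tables below?

I know how to do this when the department columns (depart_1, depart_2, depart_3) are all in one table, but I can't make it work in this scenario because they are spread across different tables.

I have nearly 100 fields like department, so I'm not too concerned about performance either.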

[image: the source tables]

Solution

By using JOIN and UNION:

-- Assumes gender lives only in tab1, and tab2/tab3 carry id, name and their own depart_N columns.
-- Every branch must return the same five columns for UNION to work.
SELECT
  id, name, gender, 1 AS seq, depart_1 AS department
FROM tab1
UNION
SELECT
  id, name, gender, 2 AS seq, depart_2 AS department
FROM tab1
UNION
SELECT
  tab1.id, tab2.name, tab1.gender, 3 AS seq, tab2.depart_3 AS department
FROM tab2 JOIN tab1 ON tab2.id = tab1.id
UNION
SELECT
  tab1.id, tab2.name, tab1.gender, 4 AS seq, tab2.depart_4 AS department
FROM tab2 JOIN tab1 ON tab2.id = tab1.id
UNION
SELECT
  tab1.id, tab3.name, tab1.gender, 5 AS seq, tab3.depart_5 AS department
FROM tab3 JOIN tab1 ON tab3.id = tab1.id
UNION
SELECT
  tab1.id, tab3.name, tab1.gender, 6 AS seq, tab3.depart_6 AS department
FROM tab3 JOIN tab1 ON tab3.id = tab1.id

Each query reads one department column, so you can use a static number for the seq column in each query.
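Because every branch carries a different constant in seq, a row from one branch can never be identical to a row from another, so the de-duplication that UNION performs only matters within a single branch. As a rough sketch, assuming id is unique in tab1 (an assumption, not something stated in the question), the first two branches could use UNION ALL and skip that step:

```sql
-- Sketch only: assumes tab1 has columns id, name, gender, depart_1, depart_2
-- and that id is unique in tab1, so no branch produces duplicate rows.
SELECT id, name, gender, 1 AS seq, depart_1 AS department
FROM tab1
UNION ALL
SELECT id, name, gender, 2 AS seq, depart_2 AS department
FROM tab1
```

If a table can contain repeated ids, plain UNION is the safer choice, since UNION ALL would keep the repeated rows.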


It is better to do all the unions first and then perform the smaller join at the end.

-- Each branch returns id, name, seq, department so the UNION columns line up.
SELECT tab.id, tab.name, gen.gender, tab.seq, tab.department
FROM
    (SELECT id, name, 1 AS seq, depart_1 AS department FROM tab1
    UNION
    SELECT id, name, 2 AS seq, depart_2 AS department FROM tab1
    UNION
    SELECT id, name, 3 AS seq, depart_3 AS department FROM tab2
    UNION
    SELECT id, name, 4 AS seq, depart_4 AS department FROM tab2) tab
LEFT JOIN
    (SELECT DISTINCT id, gender FROM tab1) gen ON tab.id = gen.id

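Since you are doing this in Hive, the final join can be converted automatically into a map-side join when the gen subquery is small enough to fit in memory, which should make the query faster.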
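Whether Hive actually picks a map-side join depends on session settings and on the size of the small side. The fragment below is a sketch of the settings involved; the threshold value shown is only an illustration, and defaults differ between Hive versions, so check them on your cluster:

```sql
-- Let Hive convert a common join into a map-side join when one side is small.
SET hive.auto.convert.join=true;
-- Size limit in bytes for the "small" table; 25000000 is an illustrative value.
SET hive.mapjoin.smalltable.filesize=25000000;
```

Running EXPLAIN on the query and looking for a map join operator in the plan is one way to confirm that the conversion happened.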