Create multiple columns from existing Hive table columns

Question

How can I create multiple columns from an existing Hive table? The sample data is shown below.

[Image: sample data]

My requirement is to create 2 new columns from the existing table only when a condition is met: col1 when code=1, and col2 when code=2.

Expected output

[Image: expected output]

How can I achieve this in a Hive query?

Answer

If you aggregate the required values into arrays, you can explode them and keep only the values with matching positions.

Demo:

with 

my_table as (--use your table instead of this CTE
select stack(8,'a',1,'b',2,'c',3,'d',4,
               'a1',1,'b1',2,'c1',3,'d1',4
) as (col,code)
)

select c1.val as col1,c2.val as col2 from
(
select collect_set(case when code=1 then col else null end) as col1,collect_set(case when code=2 then col else null end) as col2 
  from my_table where code in (1,2)
)s lateral view outer posexplode(col1) c1 as pos,val  
   lateral view outer posexplode(col2) c2 as pos,val
where c1.pos=c2.pos

Result:

col1    col2
a       b
a1      b1

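This approach will not work if the arrays have different sizes: values without a matching position in the other array are dropped by the c1.pos=c2.pos filter.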

Another approach: calculate row_number and do a full join on it. This works even if col1 and col2 have different numbers of values (some values will be null):

with 

my_table as (--use your table instead of this CTE
select stack(8,'a',1,'b',2,'c',3,'d',4,
               'a1',1,'b1',2,'c1',3,'d1',4
) as (col,code)
),ordered as
(
select code,col,row_number() over(partition by code order by col) rn
  from my_table where code in (1,2)
)

select c1.col as col1,c2.col as col2
  from (select * from ordered where code=1) c1 
       full join 
       (select * from ordered where code=2) c2 on c1.rn = c2.rn

Result:

col1    col2
a       b
a1      b1
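
To illustrate the unequal-count case mentioned for the row_number approach, here is a minimal sketch using made-up sample data (three values with code=1, two with code=2). The data and the pair_no alias are assumptions for illustration only; pair_no is added to the select list just to give the output a stable order.

```sql
with

my_table as (--hypothetical data: 3 rows with code=1, 2 rows with code=2
select stack(5,'a',1,'a1',1,'a2',1,'b',2,'b1',2) as (col,code)
),ordered as
(
--number the values within each code so they can be paired by position
select code,col,row_number() over(partition by code order by col) rn
  from my_table where code in (1,2)
)

select coalesce(c1.rn,c2.rn) as pair_no,c1.col as col1,c2.col as col2
  from (select * from ordered where code=1) c1
       full join
       (select * from ordered where code=2) c2 on c1.rn = c2.rn
 order by pair_no
```

With this data the query should return three rows: (1, a, b), (2, a1, b1) and (3, a2, NULL). The full join keeps the unmatched third value of col1 and leaves col2 null for it.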