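Question
My use case is as follows:
I want to copy data from Table-A to Table-B and convert field1 from an array of structs into an array of strings, where each string is the val1 attribute of the corresponding struct in Table-A; val2 should be ignored.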
Table-A:
field1: array<struct<val1: str,val2: int>>
sample data:
[{val1: "abc",val2: 123},{val1: "def",val2: 456}],[{val1: "xyz",val2: 789}]
Table-B:
field1: array<string>
sample data:
["abc","def"],["xyz"]
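I can't figure out how to select the field1 column with this conversion using the Hive query language.
What I have worked out is that I can explode the array, select val1, and then apply collect_list, but after many attempts I still can't get the syntax right.
My query looks like this: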
select collect_list(select col.val1
from explode(field1) as col) from table-A
I also want to do this strictly in HiveQL rather than with a UDF written in Python.
Thanks.
Answer
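Use LATERAL VIEW + explode to explode the original array, then collect the struct.val1 values back into an array with collect_set() or collect_list(). Note that collect_set() drops duplicate values, while collect_list() keeps them: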
with mydata as ( -- This is your data example; use your table instead of this CTE
  select stack(2,
    array(named_struct("val1","abc","val2",123), named_struct("val1","def","val2",456)),
    array(named_struct("val1","xyz","val2",789))
  ) as myarray
)
select t.myarray as original_array, collect_set(s.val1) as result_array
from mydata t
lateral view explode(myarray) e as s -- s is one struct per exploded row
group by t.myarray
Result:
original_array result_array
[{"val1":"abc","val2":123},{"val1":"def","val2":456}] ["abc","def"]
[{"val1":"xyz","val2":789}] ["xyz"]
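To tie this back to the copy in the question, here is a minimal sketch of the same technique written as an INSERT. The names table_a, table_b and the id column are assumptions for illustration (the hyphenated names Table-A and Table-B would need quoting, and the question does not say whether a unique key exists). It uses collect_list() so that repeated val1 values are kept, and groups by the assumed key instead of by the array itself.

```sql
-- Assumed schema (hypothetical names, not from the question):
--   table_a(id BIGINT, field1 ARRAY<STRUCT<val1:STRING, val2:INT>>)
--   table_b(field1 ARRAY<STRING>)
INSERT INTO TABLE table_b
SELECT collect_list(s.val1) AS field1      -- keeps duplicates, unlike collect_set
FROM table_a t
LATERAL VIEW explode(t.field1) e AS s      -- s is one struct per exploded row
GROUP BY t.id;                             -- id is an assumed unique row key
```

Two caveats: a plain LATERAL VIEW produces no rows for an empty or NULL field1, so such rows would be missing from table_b, and collect_list() makes no formal promise about element order, so check the output if order matters to you.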
Your elements could also be declared as a map rather than a struct. In that case, use s['val1'] instead of s.val1 to get the map element.
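A small sketch of that map variant, under the assumption that the column is an array<map<string,string>> (all values in a Hive map share one type, so val2 is a string here). The id column and the sample data are made up for illustration.

```sql
WITH mydata AS ( -- hypothetical sample data; use your own table instead
  SELECT stack(2,
    1, array(map('val1','abc','val2','123'), map('val1','def','val2','456')),
    2, array(map('val1','xyz','val2','789'))
  ) AS (id, myarray)
)
SELECT t.id, collect_list(s['val1']) AS result_array
FROM mydata t
LATERAL VIEW explode(t.myarray) e AS s     -- s is one map per exploded row
GROUP BY t.id;
```

The only change from the struct version is the bracket lookup s['val1']; the explode and collect steps stay the same.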
Hive has a bit of magic in how it handles arrays that lets you do this:
select t.myarray as original_array,t.myarray.val1 from mydata t
That is, selecting the struct field val1 from an array of structs returns an array of the val1 values.
Source: http://thornydev.blogspot.com/2013/07/querying-json-records-via-hive.html
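Applied to the question, the field-access shortcut above might look like the sketch below. The names table_a and table_b are assumptions standing in for Table-A and Table-B; it is worth running the SELECT on its own first to confirm that your Hive version returns an array<string> here.

```sql
-- Assumed schema (hypothetical names, not from the question):
--   table_a(field1 ARRAY<STRUCT<val1:STRING, val2:INT>>)
--   table_b(field1 ARRAY<STRING>)
INSERT INTO TABLE table_b
SELECT t.field1.val1 AS field1             -- field access on the array yields an array of val1
FROM table_a t;
```

Compared with the explode + collect approach, this avoids the GROUP BY entirely, so it needs no unique key and leaves each row's elements in their original positions.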