Hive query to convert an array of structs to an array of strings

Problem description

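My use case is as follows:

I want to copy data from Table A to Table B and convert field1 from an array of structs to an array of strings, where each string is the val1 attribute of the struct in Table A and val2 is ignored.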

Table-A:
field1: array<struct<val1: string,val2: int>>
sample data:
[{val1: "abc",val2: 123},{val1: "def",val2: 456}],[{val1: "xyz",val2: 789}]

Table-B:
field1: array<string>
sample data:
["abc","def"],["xyz"]

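I can't figure out how to select the field1 column with this conversion in Hive Query Language.

What I have figured out is that I can explode the array, select val1, and then apply collect_list, but after many attempts I still can't get the syntax right.

My query looks like this: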

select collect_list(select col.val1 
  from explode(field1) as col) from table-A

I also want to do this strictly in HiveQL rather than through a UDF written in Python.

Thanks.

Solution

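Use LATERAL VIEW + explode to explode the original array, then use collect_set() or collect_list() to gather the struct.val1 values into an array (collect_set() removes duplicate values, collect_list() keeps them):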

with mydata as (--This is your data example, use your table instead of this CTE
select stack (2,
              array(named_struct("val1","abc","val2",123),named_struct("val1","def","val2",456)),
              array(named_struct("val1","xyz","val2",789))
             ) as myarray
)

select t.myarray as original_array,collect_set(s.val1) as result_array
  from mydata t
       lateral view explode(myarray) e as s --struct
group by t.myarray 

Result:

original_array                                          result_array
[{"val1":"abc","val2":123},{"val1":"def","val2":456}]   ["abc","def"]
[{"val1":"xyz","val2":789}]                             ["xyz"]
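To actually load Table B, the same pattern can feed an INSERT. The following is only a sketch under stated assumptions: the table names table_a and table_b and the unique key column id are hypothetical (the question does not show them). It groups by a key rather than by the array itself so that two rows with identical arrays are not merged, and it uses collect_list() so that repeated val1 values are kept.

```sql
-- Sketch only: table_a, table_b and the key column id are assumed names.
-- table_a(id, field1 array<struct<val1:string,val2:int>>)
-- table_b(field1 array<string>)
INSERT INTO TABLE table_b
SELECT collect_list(s.val1) AS field1        -- keep duplicates; collect_set would drop them
  FROM table_a t
       LATERAL VIEW explode(t.field1) e AS s -- one output row per struct
 GROUP BY t.id;                              -- one output array per source row
```

Note that a plain LATERAL VIEW drops source rows whose array is empty or NULL, so those rows would not reach table_b; LATERAL VIEW OUTER keeps them. It is also safer not to rely on the order of the elements that collect_list() returns.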

Your structure could also be declared as a map instead of a struct. In that case, use s['val1'] instead of s.val1 to get the map element.
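As a rough illustration of the map variant, here is a minimal sketch with made-up sample data. Because all values in a Hive map share one type, val2 is written as a string here, which is an assumption and not part of the question. The query aggregates a single row without GROUP BY, so it does not depend on grouping by a complex column.

```sql
-- Sketch only: a single sample row holding an array of maps.
with mydata as (
select array(map('val1','abc','val2','123'),
             map('val1','def','val2','456')) as myarray
)

select collect_list(s['val1']) as result_array  -- map lookup instead of s.val1
  from mydata t
       lateral view explode(t.myarray) e as s;  -- s is a map<string,string>
```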


There is some magic in how Hive handles arrays that lets you do this:

select t.myarray as original_array,t.myarray.val1 from mydata t

That is, selecting the struct field val1 from an array of structs returns an array of the val1 values.

Source: http://thornydev.blogspot.com/2013/07/querying-json-records-via-hive.html
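Applied to the original question, the field projection above could be used to fill Table B in one statement, without explode or a GROUP BY. This is a sketch that assumes hypothetical table names table_a and table_b, and that your Hive version supports projecting a field over an array of structs as described in the linked post.

```sql
-- Sketch only: table_a and table_b are assumed names.
-- field1.val1 on an array<struct<val1:string,val2:int>> yields an array<string>.
INSERT INTO TABLE table_b
SELECT t.field1.val1 AS field1
  FROM table_a t;
```

Since no rows are exploded and re-aggregated, each source row maps to exactly one output row.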
