Concatenating arrays in a Hive query

Problem description

I have a Hive table:

col1   col2
1     ["apple","orange"]
1     ["orange","banana"]
1     ["mango"]
2     ["apple"]
2     ["apple","orange"]

with the data types:

col1 int
col2 array<string>

I want to run a query along these lines:

select col1,concat(col2) from table group by col1;

The output should be:

1    ["apple","orange","banana","mango"]
2    ["apple","orange"]

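Is there a function in Hive that does this?

I am also writing this data to a CSV file, and when I read it back as a DataFrame, col2 has dtype object. Is there a way to have it come out as an array?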

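Solution

Try exploding the array, then group by col1 and apply the collect_set function. collect_set drops duplicate elements, and the order of elements in the resulting array is not guaranteed.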

Example:

Input:

select * from table;
OK
dd.col1 dd.col2
1       ["apple","orange"]
1       ["mango"]
1       ["orange","banana"]

-- explode col2 into one row per element, then regroup by col1 and deduplicate
select col1, collect_set(tt1) as col2
from (
   select * from table lateral view explode(col2) tt as tt1
) cc
group by col1;

Output:

col1    col2
1       ["apple","orange","mango","banana"]
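
To make the explode-then-collect_set logic easier to follow, here is a minimal Python sketch of the same idea on the question's sample rows. It is only an illustration of the semantics, not how Hive implements it: the helper names explode and collect_set_by_key are hypothetical, and the sketch keeps elements in first-seen order, whereas Hive makes no ordering promise.

```python
from collections import defaultdict

# Sample rows mirroring the table in the question: (col1, col2)
rows = [
    (1, ["apple", "orange"]),
    (1, ["orange", "banana"]),
    (1, ["mango"]),
    (2, ["apple"]),
    (2, ["apple", "orange"]),
]


def explode(rows):
    """Yield one (col1, element) pair per array element,
    roughly what LATERAL VIEW explode(col2) produces."""
    for col1, col2 in rows:
        for item in col2:
            yield col1, item


def collect_set_by_key(pairs):
    """Group pairs by key and keep each distinct value once,
    roughly what collect_set with GROUP BY col1 produces.
    (Hypothetical helper; first-seen order is an artifact of this sketch.)"""
    groups = defaultdict(list)
    for key, value in pairs:
        if value not in groups[key]:
            groups[key].append(value)
    return dict(groups)


result = collect_set_by_key(explode(rows))
for col1, col2 in sorted(result.items()):
    print(col1, col2)
```

Each key ends up with the distinct union of its arrays, which is the result the question asks for.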
