Multiple matches in a regular expression in Hive

Question

When I run

select regexp_extract("hosts: 192.168.1.1 192.168.1.2 host",'((25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\\.){3}(25[0-5]|2[0-4]\\d|[01]?\\d\\d?)',0);

I get 192.168.1.1, but what I want is 192.168.1.1,192.168.1.2.

What should I do: change the regex or create a UDF?

Solution

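regexp_extract() returns only the first match, so split the string instead: explode the parts, check each part against the regex, and collect the matching parts into an array. If you need a string rather than an array, use concat_ws() to join the array: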

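-- Sample input as a one-row table
with your_data as(
select stack(1,'hosts: 192.168.1.1 192.168.1.2 host' ) as hosts
)

-- Keep only the space-separated parts that contain an IPv4-like match;
-- collect_set() skips the NULLs from non-matching parts and removes duplicates
select collect_set(case when part rlike '((25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\\.){3}(25[0-5]|2[0-4]\\d|[01]?\\d\\d?)' 
                        then part 
                  else null end )
from your_data d
     lateral view explode(split(hosts,' +')) s as part;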

Result:

["192.168.1.1","192.168.1.2"]
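
If a comma-separated string is wanted rather than an array, as in the question, the same query can be wrapped in concat_ws(). The following is a minimal sketch reusing the sample row and regex from above; the column alias ips is only an illustrative name, and collect_set() does not guarantee the order of the collected elements:

```sql
-- Sample input as a one-row table (same data as above)
with your_data as (
select stack(1,'hosts: 192.168.1.1 192.168.1.2 host') as hosts
)

-- concat_ws(separator, array<string>) joins the collected matches into one string.
-- "ips" is a hypothetical alias chosen for this example.
select concat_ws(',',
                 collect_set(case when part rlike '((25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\\.){3}(25[0-5]|2[0-4]\\d|[01]?\\d\\d?)'
                                  then part
                             end)) as ips
from your_data d
     lateral view explode(split(hosts,' +')) s as part;
```

For the sample row this should give a single string containing both addresses separated by a comma. If duplicate addresses must be kept, collect_list() can be used in place of collect_set().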