Regex: extracting multiple URL strings from a cell containing an array

Problem

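What is a clean regex pattern that matches a URL string and stops at the first comma? I am trying to extract values from an array of arrays stored in a Google Sheets cell.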

Cell A1

{https://www.myshop.com/shop/the_first_shop,marcus. White's. Shop.,ACTIVE,US};{https://www.myshop.com/shop/a-second-shop,The first! Shop,CLOSED,UK};{EMPTY,ClosedShop,IN}

Desired output (cell B1)

https://www.myshop.com/shop/the_first_shop,https://www.myshop.com/shop/a-second-shop

I figured out how to get a clean array of the matched values in the desired output cell using:

=trim(regexreplace(regexreplace(regexreplace(REGEXREPLACE(A2,"/(https?:\/\/[^ ]*)/"," "),";","}","{"," "))

But I cannot find a regex pattern that stops at the comma. For example, this pattern:

"/(https?:\/\/[^ ]*)/" 

matches the first URL, but it also gives me:

https://www.myshop.com/shop/the_first_shop,US https://www.myshop.com/shop/a-second-shop,UK EMPTY,IN

Solution

I would go with REGEXREPLACE:

=REGEXREPLACE(A1,".*?(?:(https.*?,)|$)","$1")

That leaves just a trailing comma to deal with:

=REGEXREPLACE(REGEXREPLACE(A1,".*?(?:(https.*?(,))|$)","$1"),",$","")
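
To see why the nested replace works, here is a rough sketch of the same two steps in Python. This is only an illustration outside Google Sheets: it assumes Python's `re` module behaves like the Sheets regex engine for this pattern, and the variable names are made up for the example.

```python
import re

# Sample value of cell A1, copied from the question.
cell_a1 = (
    "{https://www.myshop.com/shop/the_first_shop,marcus. White's. Shop.,ACTIVE,US};"
    "{https://www.myshop.com/shop/a-second-shop,The first! Shop,CLOSED,UK};"
    "{EMPTY,ClosedShop,IN}"
)

# Inner step: lazily skip text until a URL, keep the URL plus its comma
# (group 1), and let the "$" branch swallow whatever is left at the end.
with_trailing_comma = re.sub(r".*?(?:(https.*?,)|$)", r"\1", cell_a1)

# Outer step: remove the single comma left at the end of the string.
urls = re.sub(r",$", "", with_trailing_comma)

print(urls)
```

The `|$` branch is what clears the tail of the string (the `{EMPTY,ClosedShop,IN}` part), since it matches with an empty capture group.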

A longer alternative to REGEXREPLACE might be:

=TEXTJOIN(",",TRUE,QUERY(TRANSPOSE(SPLIT(SUBSTITUTE(SUBSTITUTE(A1,"{",","),"}",","),",")),"Select Col1 where Col1 like 'http%'",0))
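
The split-and-filter idea behind this alternative can be sketched in Python as follows. It is an analogue rather than a translation of the formula: it splits on the delimiter characters directly instead of substituting them first, and the names are hypothetical.

```python
import re

# Sample value of cell A1, copied from the question.
cell_a1 = (
    "{https://www.myshop.com/shop/the_first_shop,marcus. White's. Shop.,ACTIVE,US};"
    "{https://www.myshop.com/shop/a-second-shop,The first! Shop,CLOSED,UK};"
    "{EMPTY,ClosedShop,IN}"
)

# Break the cell into fields on braces, commas and semicolons,
# dropping the empty strings produced by adjacent delimiters.
fields = [field for field in re.split(r"[{},;]", cell_a1) if field]

# Keep only the fields that look like URLs and join them with commas.
urls = ",".join(field for field in fields if field.startswith("http"))

print(urls)
```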

A regex pattern that stops at the comma (note that REGEXEXTRACT returns only the first match):

=REGEXEXTRACT(A1,"(https?:\/\/[^,]*)")
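
The difference between the question's `[^ ]*` and the `[^,]*` used here can be checked with a small Python sketch. It assumes Python's `re` module as a stand-in for the Sheets regex engine, and the variable names are illustrative only.

```python
import re

# Sample value of cell A1, copied from the question.
cell_a1 = (
    "{https://www.myshop.com/shop/the_first_shop,marcus. White's. Shop.,ACTIVE,US};"
    "{https://www.myshop.com/shop/a-second-shop,The first! Shop,CLOSED,UK};"
    "{EMPTY,ClosedShop,IN}"
)

# "[^ ]*" runs until the next space, so it overshoots past the comma.
until_space = re.findall(r"https?://[^ ]*", cell_a1)

# "[^,]*" stops at the first comma, which is the end of each URL here.
until_comma = re.findall(r"https?://[^,]*", cell_a1)

# A single search mirrors REGEXEXTRACT, which yields only the first match.
first_only = re.search(r"https?://[^,]*", cell_a1).group(0)

print(until_space[0])
print(until_comma)
print(first_only)
```

Because a single extract stops after the first URL, getting every URL in the cell still needs a replace-based or split-based approach.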
