Regex - regular expressions in Hive

I am learning simple regular expressions in Hive. I am following a tutorial, and this simple HQL statement produces an error:

select REGEXP_EXTRACT( 'Hello,my name is Ben. Please visit','Ben' )

This is the error message I get:

Wrong arguments 'Ben': org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public java.lang.String org.apache.hadoop.hive.ql.udf.UDFRegExpExtract.evaluate(java.lang.String,java.lang.String) on object org.apache.hadoop.hive.ql.udf.UDFRegExpExtract@ec0c06f of class org.apache.hadoop.hive.ql.udf.UDFRegExpExtract with arguments {Hello,my name is Ben. Please visit:java.lang.String, Ben:java.lang.String} of size 2

The same pattern works in other languages, but I want to learn how to do it in Hive. Any help would be appreciated.

Solution

You must supply a third argument: the index of the group to extract.

To extract the full match, use 0:

select REGEXP_EXTRACT( 'Hello,my name is Ben. Please visit','Ben',0)

To extract the value of a capture group, use that group's index, for example:

select REGEXP_EXTRACT( 'Hello,my name is Ben. Please visit','name is (\\w+)',1)

This extracts Ben.

See this reference:

regexp_extract(string subject, string pattern, int index)
Returns the string extracted using the pattern. For example, regexp_extract('foothebar', 'foo(.*?)(bar)', 2) returns 'bar'. Note that some care is necessary in using predefined character classes: using '\s' as the second argument will match the letter s; '\\s' is necessary to match whitespace, etc. The 'index' parameter is the Java regex Matcher group() method index. See docs/api/java/util/regex/Matcher.html for more information on the 'index' or Java regex group() method.
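
As a rough illustration of the index parameter and the escaping note in the reference above, the queries below run the documented 'foothebar' example with each index, then match whitespace with a doubled backslash. This is only a sketch: the sample string 'my name is Ben' and the column aliases are made up for illustration, and it assumes a Hive version that allows SELECT without a FROM clause, as the queries in this thread do.

```sql
-- Index 0 returns the whole match; 1 and 2 return the first and second capture groups.
SELECT
  regexp_extract('foothebar', 'foo(.*?)(bar)', 0) AS full_match,    -- 'foothebar'
  regexp_extract('foothebar', 'foo(.*?)(bar)', 1) AS first_group,   -- 'the'
  regexp_extract('foothebar', 'foo(.*?)(bar)', 2) AS second_group;  -- 'bar'

-- Backslashes are doubled in the string literal so that the regex engine
-- receives \s (whitespace) and \w (word character).
SELECT regexp_extract('my name is Ben', 'name\\sis\\s(\\w+)', 1) AS first_name;  -- 'Ben'
```

The index follows Java's Matcher.group() numbering, so groups are counted by the position of their opening parenthesis, starting at 1.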
