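Problem description
In the Hive query below, I also need to read empty "string" tags from the XML content. Currently, XPATH() only includes the non-empty "string" tags in the list it returns.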
with your_data as (
select '<ParentArray>
<ParentFieldArray>
<Name>ABCD</Name>
<Value>
<string>111</string>
<string></string>
<string>222</string>
</Value>
</ParentFieldArray>
<ParentFieldArray>
<Name>EFGH</Name>
<Value>
<string/>
<string>444</string>
<string></string>
<string>555</string>
</Value>
</ParentFieldArray>
</ParentArray>' as xmlinfo
)
select Name,Value
from your_data d
lateral view outer explode(XPATH(xmlinfo,'ParentArray/ParentFieldArray/Name/text()')) pf as Name
lateral view outer explode(XPATH(xmlinfo,concat('ParentArray/ParentFieldArray[Name="',pf.Name,'"]/Value/string/text()'))) vl as Value;
Expected output:
Name Value
ABCD 111
ABCD
ABCD 222
EFGH
EFGH 444
EFGH
EFGH 555
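Solution
The problem is that XPATH() returns a NodeList, and the .../string/text() step selects only text nodes. An empty element such as <string></string> or <string/> has no text-node child, so nothing for it ends up in the list.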
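The behaviour can be reproduced outside Hive with the JDK's javax.xml.xpath API, the same engine that appears in the stack trace below. This is a minimal sketch, assuming a trimmed copy of the EFGH block from the question; the class name EmptyTextNodes and the helper count() are hypothetical. It counts how many nodes each expression selects.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: shows that text() skips empty elements.
public class EmptyTextNodes {
    // Trimmed copy of the EFGH block from the question (assumption: no whitespace between tags).
    static final String XML =
        "<ParentArray><ParentFieldArray><Name>EFGH</Name><Value>"
        + "<string/><string>444</string><string></string><string>555</string>"
        + "</Value></ParentFieldArray></ParentArray>";

    // Evaluates an XPath expression as a node-set and returns how many nodes matched.
    static int count(String expr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // All four <string> elements are selected...
        System.out.println(count("ParentArray/ParentFieldArray/Value/string"));        // 4
        // ...but only two of them have a text-node child.
        System.out.println(count("ParentArray/ParentFieldArray/Value/string/text()")); // 2
    }
}
```

The two empty elements exist in the document; they simply have no text node for text() to select.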
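Concatenating a string inside the XPath expression, e.g. concat(/Value/string/text()," "), does not help either. concat() returns a single string, while XPATH() needs the expression to evaluate to a node-set, so it fails with: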
Caused by: javax.xml.xpath.XPathExpressionException: com.sun.org.apache.xpath.internal.XPathException: Can not convert #STRING to a NodeList!
    at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(XPathExpressionImpl.java:195)
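The same failure can be triggered directly with javax.xml.xpath. This sketch assumes that XPATH() requests a node-set result, which is what the "Can not convert #STRING to a NodeList!" message indicates; the class ConcatNotNodeSet and its methods are hypothetical. Asking for a node-set throws, while asking for a string succeeds but yields only one value, because XPath 1.0 converts a node-set argument of concat() to the string value of its first node.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class: concat() produces a string, not a node-set.
public class ConcatNotNodeSet {
    static final String XML =
        "<ParentArray><ParentFieldArray><Name>EFGH</Name><Value>"
        + "<string/><string>444</string></Value></ParentFieldArray></ParentArray>";

    static final String EXPR =
        "concat(ParentArray/ParentFieldArray/Value/string/text(), ' ')";

    // Returns true if evaluating EXPR as a node-set fails (the case reported from Hive).
    static boolean failsAsNodeSet() {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            xpath.evaluate(EXPR, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return false;
        } catch (XPathExpressionException e) {
            return true; // wraps "Can not convert #STRING to a NodeList!"
        }
    }

    // Evaluating as a string works, but collapses the node-set to its first text node.
    static String asString() throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(EXPR, new InputSource(new StringReader(XML)));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(failsAsNodeSet());          // true
        System.out.println("[" + asString() + "]");    // [444 ]
    }
}
```

Even where string results are allowed, concat() cannot produce one value per element, so it is not a way to keep the empty tags.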
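A simple workaround is to replace <string></string> and <string/> with <string>NULL</string> before calling XPATH(), so that every element has a text node. Afterwards, convert the 'NULL' string back to a real null.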
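The two steps of this workaround can be sketched in Java as well, mirroring the regexp_replace() and CASE WHEN of the Hive demo. This is an illustration under assumptions, not the Hive implementation: the class NullPlaceholder and method values() are hypothetical, and the XPath is hard-wired to the EFGH block.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch of the placeholder workaround.
public class NullPlaceholder {
    static final String XML =
        "<ParentArray><ParentFieldArray><Name>EFGH</Name><Value>"
        + "<string/><string>444</string><string></string><string>555</string>"
        + "</Value></ParentFieldArray></ParentArray>";

    // Same pattern as the regexp_replace() call in the Hive demo.
    static final String EMPTY_STRING_TAG = "<string></string>|<string/>";

    static List<String> values(String xml) throws Exception {
        // Step 1: give every empty <string> element a placeholder text node.
        String patched = xml.replaceAll(EMPTY_STRING_TAG, "<string>NULL</string>");
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            "ParentArray/ParentFieldArray[Name='EFGH']/Value/string/text()",
            new InputSource(new StringReader(patched)), XPathConstants.NODESET);
        // Step 2: map the placeholder back to a real null (the CASE WHEN in the query).
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            String v = nodes.item(i).getNodeValue();
            out.add("NULL".equals(v) ? null : v);
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(values(XML)); // [null, 444, null, 555]
    }
}
```

Two caveats apply equally to the Hive query: a genuine value of NULL in the data would also be turned into null, and the pattern does not match variants such as <string /> with a space.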
Demo:
with your_data as (
select '<ParentArray>
<ParentFieldArray>
<Name>ABCD</Name>
<Value>
<string>111</string>
<string></string>
<string>222</string>
</Value>
</ParentFieldArray>
<ParentFieldArray>
<Name>EFGH</Name>
<Value>
<string/>
<string>444</string>
<string></string>
<string>555</string>
</Value>
</ParentFieldArray>
</ParentArray>' as xmlinfo
)
select name,case when value='NULL' then null else value end value
from (select regexp_replace(xmlinfo,'<string></string>|<string/>','<string>NULL</string>') xmlinfo
from your_data d
) d
lateral view outer explode(XPATH(xmlinfo,'ParentArray/ParentFieldArray/Name/text()')) pf as Name
lateral view outer explode(XPATH(xmlinfo,concat('ParentArray/ParentFieldArray[Name="',pf.Name,'"]/Value/string/text()'))) vl as value;
Result:
name value
ABCD 111
ABCD
ABCD 222
EFGH
EFGH 444
EFGH
EFGH 555