In Hive, how can explode(XPATH(..)) be used to read NULL/empty tags in XML?

Problem description

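In the Hive query below, I also need to read the null/empty "string" tags from the XML content. At the moment, only the non-empty "string" tags end up in the list returned by XPATH().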

with your_data as (
select  '<ParentArray>
    <ParentFieldArray>
        <Name>ABCD</Name>
        <Value>
            <string>111</string>
            <string></string>
            <string>222</string>
        </Value>
    </ParentFieldArray>
    <ParentFieldArray>
        <Name>EFGH</Name>
        <Value>
            <string/>
            <string>444</string>
            <string></string>
            <string>555</string>

        </Value>
    </ParentFieldArray>
</ParentArray>' as xmlinfo
)

select Name,Value 
  from your_data d
       lateral view outer explode(XPATH(xmlinfo,'ParentArray/ParentFieldArray/Name/text()')) pf as  Name
       lateral view outer explode(XPATH(xmlinfo,concat('ParentArray/ParentFieldArray[Name="',pf.Name,'"]/Value/string/text()'))) vl as Value;

Expected output of the query

Name    Value
ABCD    111
ABCD    
ABCD    222
EFGH    
EFGH    444
EFGH    
EFGH    555

Solution

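The problem here is that XPATH returns a NodeList, and empty nodes are not included in that list: the expression ends in text(), and an empty element such as <string></string> or <string/> has no text node to select.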
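The behaviour can be reproduced on a small fragment. This is a minimal sketch, assuming a Hive version that allows SELECT without FROM (0.13 or later); the inline XML is a made-up sample, not the data from the question. The first query should return only the one non-empty value, while count() over the elements still sees all three of them.

```sql
-- Made-up sample: three <string> elements, only the first has a text node.
-- text() selects text nodes, so the two empty elements contribute nothing.
select xpath('<Value><string>111</string><string></string><string/></Value>',
             'Value/string/text()') as vals;

-- count() works on the elements themselves, so the empty ones are counted too.
select xpath_int('<Value><string>111</string><string></string><string/></Value>',
                 'count(Value/string)') as n;
```

Comparing the two results is a quick way to check whether a given document has empty elements that the text() path is silently dropping.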

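Concatenating a string inside the XPath expression, e.g. concat(/Value/string/text()," "), does not work here, because the result is a string rather than a node list: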

Caused by: javax.xml.xpath.XPathExpressionException: com.sun.org.apache.xpath.internal.XPathException: Can not convert #STRING to a NodeList!
    at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(XPathExpressionImpl.java:195)

A simple solution is to replace <string></string> and <string/> with <string>NULL</string> before calling XPATH, and then convert the 'NULL' string back to null in the outer query.

Demo:

with your_data as (
select  '<ParentArray>
    <ParentFieldArray>
        <Name>ABCD</Name>
        <Value>
            <string>111</string>
            <string></string>
            <string>222</string>
        </Value>
    </ParentFieldArray>
    <ParentFieldArray>
        <Name>EFGH</Name>
        <Value>
            <string/>
            <string>444</string>
            <string></string>
            <string>555</string>
        </Value>
    </ParentFieldArray>
</ParentArray>' as xmlinfo
)

select name,case when value='NULL' then null else value end value
  from (select regexp_replace(xmlinfo,'<string></string>|<string/>','<string>NULL</string>') xmlinfo 
          from your_data d
       ) d
       lateral view outer explode(XPATH(xmlinfo,'ParentArray/ParentFieldArray/Name/text()')) pf as  Name
       lateral view outer explode(XPATH(xmlinfo,concat('ParentArray/ParentFieldArray[Name="',pf.Name,'"]/Value/string/text()'))) vl as value

Result:

name    value
ABCD    111
ABCD    
ABCD    222
EFGH    
EFGH    444
EFGH    
EFGH    555
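
The replacement pattern used in the demo matches only the exact spellings <string></string> and <string/>. If the XML may also contain variants such as <string /> or elements holding only whitespace, a slightly broader pattern might be needed. The following is a sketch under that assumption; treating whitespace-only values as empty is not something the question states, and both the sample XML and the sample_data name are made up for illustration.

```sql
-- Hypothetical sample with a spaced self-closing tag and a whitespace-only tag.
with sample_data as (
select '<Value><string /><string>  </string><string>111</string></Value>' as xmlinfo
)
select case when value = 'NULL' then null else value end as value
  from (-- \\s is written with a doubled backslash inside a Hive string literal
        select regexp_replace(xmlinfo,
                              '<string\\s*/>|<string>\\s*</string>',
                              '<string>NULL</string>') as xmlinfo
          from sample_data
       ) s
       lateral view outer explode(xpath(xmlinfo, 'Value/string/text()')) vl as value;
```

Note that with either pattern a genuine value equal to the literal text NULL would also be turned into null; if that can occur in the data, a different placeholder string that cannot appear in real values would be the safer choice.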