Parsing XML with PHP and XMLReader

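I have been trying to parse a very large XML file with PHP and XMLReader, but I can't seem to get the results I'm looking for. Basically, I'm searching through a large number of groups of information, and if a group contains a certain zip code, I want to return that piece of XML; otherwise I keep searching until that zip code is found. In essence, I'll be breaking this big file down into just a few small chunks, so instead of having to look at thousands or millions of groups of information, it might be 10 or 20.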

Here is some of the XML, annotated with what I would like to do:

//search through xml
<lineups country="USA">
//cache TX02217 as a variable
 <headend headendId="TX02217">
//cache Grande Gables at The Terrace as a variable
  <name>Grande Gables at The Terrace</name>
//cache Grande Communications as a variable
  <mso msoId="17541">Grande Communications</mso>
  <marketIds>
   <marketId type="DMA">635</marketId>
  </marketIds>
//check to see if any of the postal codes are equal to $pc variable that will be set in the PHP
  <postalCodes>
   <postalCode>11111</postalCode>
   <postalCode>22222</postalCode>
   <postalCode>33333</postalCode>
   <postalCode>78746</postalCode>
  </postalCodes>
//cache Austin to a variable
  <location>Austin</location>
  <lineup>
//cache all prgSvcID's to an array i.e. 20014,10722
   <station prgSvcId="20014">
//cache all channels to an array i.e. 002,003  
    <chan effDate="2006-01-16" tier="1">002</chan>
   </station>
   <station prgSvcId="10722">
    <chan effDate="2006-01-16" tier="1">003</chan>
   </station>
  </lineup>
  <areasServed>
   <area>
//cache community to a variable $community   
    <community>ThornDale</community>
    <county code="45331" size="D">Milam</county>
//cache state to a variable i.e. TX
    <state>TX</state>
   </area>
   <area>
    <community>Thrall</community>
    <county code="45491" size="B">Williamson</county>
    <state>TX</state>
   </area>
  </areasServed>
 </headend>

//if any of the postal codes matched $pc 
//echo back the xml from <headend> to </headend>

//if none of the postal codes matched $pc
//clear variables and move to next <headend>

 <headend>
 etc
 etc
 etc
 </headend>
 <headend>
 etc
 etc
 etc
 </headend>
 <headend>
 etc
 etc
 etc
 </headend> 
</lineups>

PHP

<?PHP
$pc = "78746";
$xmlfile="myFile.xml";
$reader = new XMLReader();
$reader->open($xmlfile); 

while ($reader->read()) { 
//search to see if groups contain $pc and echo info
}

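I know I'm probably making this harder than it should be, but I'm a bit overwhelmed trying to manipulate such a large file. Any help is appreciated.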

Solution

To get more flexibility out of XMLReader, I usually create my own iterators that are able to work on the XMLReader object and provide the steps I need.

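This starts with a simple iteration over all nodes and, optionally, an iteration over only the elements with a specific name. Let's call the latter XMLElementIterator; it takes the reader and the element name as parameters.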
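To illustrate the idea of iterating only over elements with a given name, here is a minimal sketch. The class name NamedElementIterator is hypothetical and this is not the implementation from the linked gist; it only assumes the standard XMLReader API (read(), nodeType, name).

```php
<?php
// Hypothetical sketch: NamedElementIterator is an illustrative name,
// not the XMLElementIterator class from the gist.
final class NamedElementIterator implements Iterator
{
    private XMLReader $reader;
    private string $name;
    private int $index = 0;
    private bool $valid = false;

    public function __construct(XMLReader $reader, string $name)
    {
        $this->reader = $reader;
        $this->name   = $name;
    }

    public function rewind(): void
    {
        // XMLReader is forward-only, so "rewind" just moves to the first match.
        $this->index = 0;
        $this->valid = $this->moveToNextMatch();
    }

    public function valid(): bool
    {
        return $this->valid;
    }

    public function current(): XMLReader
    {
        // The reader is positioned on the matching start tag.
        return $this->reader;
    }

    public function key(): int
    {
        return $this->index;
    }

    public function next(): void
    {
        $this->index++;
        $this->valid = $this->moveToNextMatch();
    }

    private function moveToNextMatch(): bool
    {
        // Step node by node until the next start tag with the wanted name.
        while ($this->reader->read()) {
            if ($this->reader->nodeType === XMLReader::ELEMENT
                && $this->reader->name === $this->name) {
                return true;
            }
        }
        return false;
    }
}

// Usage on a tiny in-memory document (made-up sample data).
$reader = new XMLReader();
$reader->XML('<lineups><headend headendId="A"><name>One</name></headend><headend headendId="B"/></lineups>');

$ids = [];
foreach (new NamedElementIterator($reader, 'headend') as $node) {
    $ids[] = $node->getAttribute('headendId');
}
echo implode(', ', $ids), "\n"; // prints "A, B"
```

Because the underlying reader only moves forward, an iterator like this can be traversed once per reader.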

In your scenario, I would create an iterator that returns a SimpleXMLElement for the current element and only considers <headend> elements:

require('xmlreader-iterators.php'); // https://gist.github.com/hakre/5147685

class HeadendIterator extends XMLElementIterator {
    const ELEMENT_NAME = 'headend';

    public function __construct(XMLReader $reader) {
        parent::__construct($reader,self::ELEMENT_NAME);
    }

    /**
     * @return SimpleXMLElement
     */
    public function current() {
        return simplexml_load_string($this->reader->readOuterXml());
    }
}

Equipped with this iterator, the rest of your job is mostly a piece of cake. First, open the 10-gigabyte file:

$pc      = "78746";

$xmlfile = '../data/lineups.xml';
$reader  = new XMLReader();
$reader->open($xmlfile);

Then check whether each <headend> element contains the information and, if so, display the data/XML:

foreach (new HeadendIterator($reader) as $headend) {
    /* @var $headend SimpleXMLElement */
    if (!$headend->xpath("/*/postalCodes/postalCode[. = '$pc']")) {
        continue;
    }

    echo 'Found,name: ',$headend->name,"\n";
    echo "==========================================\n";
    $headend->asXML('php://stdout');
}

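This does exactly what you are trying to achieve: iterate over the large document (in a memory-friendly way) until you find the elements you are interested in. Then you process each concrete element, which is just that piece of XML; XMLReader::readOuterXml() is a good tool for that.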
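For comparison, the same streaming search can be sketched without any helper library, using only XMLReader::read(), XMLReader::next() and XMLReader::readOuterXml(). The function name findHeadendXml and the sample data are assumptions made for illustration; the element names come from the XML in the question.

```php
<?php
/**
 * Return the outer XML of the first <headend> listing $postalCode, or null.
 * Hypothetical helper for illustration only.
 */
function findHeadendXml(XMLReader $reader, string $postalCode): ?string
{
    // Advance to the first <headend> start tag.
    $found = false;
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'headend') {
            $found = true;
            break;
        }
    }
    if (!$found) {
        return null;
    }

    do {
        // Only the current <headend> subtree is loaded into memory.
        $outerXml = $reader->readOuterXml();
        $headend  = simplexml_load_string($outerXml);
        $codes    = $headend === false
            ? []
            : ($headend->xpath('postalCodes/postalCode') ?: []);

        foreach ($codes as $code) {
            if ((string) $code === $postalCode) {
                return $outerXml;
            }
        }
        // next('headend') skips the current subtree and moves to the next <headend>.
    } while ($reader->next('headend'));

    return null;
}

// Usage on a small in-memory document (made-up sample data).
$xml = '<lineups>'
     . '<headend headendId="X1"><postalCodes><postalCode>11111</postalCode></postalCodes></headend>'
     . '<headend headendId="TX02217"><postalCodes><postalCode>78746</postalCode></postalCodes></headend>'
     . '</lineups>';

$reader = new XMLReader();
$reader->XML($xml);
$hit = findHeadendXml($reader, '78746');
echo $hit === null ? "no match\n" : $hit . "\n";
```

Comparing the postal codes as PHP strings, rather than interpolating the value into an XPath expression, sidesteps quoting problems if the search value ever contains a quote character.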

Sample output

Found,name: Grande Gables at The Terrace
==========================================
<?xml version="1.0"?>
<headend headendId="TX02217">
        <name>Grande Gables at The Terrace</name>
        <mso msoId="17541">Grande Communications</mso>
        <marketIds>
            <marketId type="DMA">635</marketId>
        </marketIds>
        <postalCodes>
            <postalCode>11111</postalCode>
            <postalCode>22222</postalCode>
            <postalCode>33333</postalCode>
            <postalCode>78746</postalCode>
        </postalCodes>
        <location>Austin</location>
        <lineup>
            <station prgSvcId="20014">
                <chan effDate="2006-01-16" tier="1">002</chan>
            </station>
            <station prgSvcId="10722">
                <chan effDate="2006-01-16" tier="1">003</chan>
            </station>
        </lineup>
        <areasServed>
            <area>
                <community>ThornDale</community>
                <county code="45331" size="D">Milam</county>
                <state>TX</state>
            </area>
            <area>
                <community>Thrall</community>
                <county code="45491" size="B">Williamson</county>
                <state>TX</state>
            </area>
        </areasServed>
    </headend>
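Once a matching <headend> is available as a SimpleXMLElement, the values the question wants to cache (headend ID, name, MSO, location, station IDs, channels, areas) can be pulled out with ordinary SimpleXML access. The following is a sketch under that assumption; summarizeHeadend and the array keys are hypothetical names, and the sample document is a shortened copy of the XML above.

```php
<?php
/**
 * Collect the fields of interest from one <headend> element.
 * Hypothetical helper; the array keys are chosen for illustration.
 */
function summarizeHeadend(SimpleXMLElement $headend): array
{
    $stations = [];
    $channels = [];
    foreach ($headend->xpath('lineup/station') ?: [] as $station) {
        // Attributes are read with array syntax, child elements as properties.
        $stations[] = (string) $station['prgSvcId'];
        foreach ($station->chan as $chan) {
            $channels[] = (string) $chan; // kept as strings so "002" stays "002"
        }
    }

    $areas = [];
    foreach ($headend->xpath('areasServed/area') ?: [] as $area) {
        $areas[] = [
            'community' => (string) $area->community,
            'state'     => (string) $area->state,
        ];
    }

    return [
        'headendId' => (string) $headend['headendId'],
        'name'      => (string) $headend->name,
        'mso'       => (string) $headend->mso,
        'location'  => (string) $headend->location,
        'stations'  => $stations,
        'channels'  => $channels,
        'areas'     => $areas,
    ];
}

// Usage with a shortened copy of the sample element.
$headend = simplexml_load_string(
    '<headend headendId="TX02217">'
    . '<name>Grande Gables at The Terrace</name>'
    . '<mso msoId="17541">Grande Communications</mso>'
    . '<location>Austin</location>'
    . '<lineup>'
    . '<station prgSvcId="20014"><chan effDate="2006-01-16" tier="1">002</chan></station>'
    . '<station prgSvcId="10722"><chan effDate="2006-01-16" tier="1">003</chan></station>'
    . '</lineup>'
    . '<areasServed><area><community>ThornDale</community><state>TX</state></area></areasServed>'
    . '</headend>'
);
$summary = summarizeHeadend($headend);
print_r($summary);
```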
