Extracting some XML tags from a string using PHP

I have the following function:

function translate($params) {
    $xmldata = '<?xml version="1.0" encoding="UTF-8" ?><root>' . html_entity_decode($params['data']) . '</root>';
    $lang = ucfirst(strtolower($params['lang']));
    if (simplexml_load_string($xmldata) === FALSE) {
        return $params['data'];
    } else {
        $langxmlobj = new SimpleXMLElement($xmldata);

        if ($langxmlobj -> $lang) {
            return ($langxmlobj -> $lang);
        } else {
            return $params['data'];
        }
    }
}

It works for the following string:

$params['data'] = '<English>Hello</English><french>Bonjour</french>';
$params['lang'] = 'English';
print translate($params);

Output:

Hello

But when the string contains any other tags:

$params['data'] = '<English><h1>Hello</h1></English><french><h1>Bonjour</h1></french>';
$params['lang'] = 'English';

it outputs nothing.

I want it to output

<h1>Hello</h1> or any other tag within the <LanguageQuotes>

I'm pulling my hair out. Any ideas?

VERSION2:

It does not work when the string looks like this:

$data = '<french><li><span class="pull-right">25 GB</span>Espace disque</french><English><li><span class="pull-right">25 GB</span>disk Space</English>
<french><li><span class="pull-right">YES</span>PHP 5, MysqL 5</french><English><li><span class="pull-right">YES</span>PHP 5, MysqL 5</English>
<french><li><span class="pull-right">100</span>Bases de données</french><English><li><span class="pull-right">100</span>Databases</English>
<french><li><span class="pull-right">∞</span>E-Mails</french><English><li><span class="pull-right">∞</span>E-mails</English>';

Solution:

Your question has two parts.

1. Loading the fragment with the tags into an XML document
2. Getting the data from the XML

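Loading the data into XML

The main problem here is that the input is not a valid XML fragment; it is an HTML fragment mixed with some custom tags. Fortunately, DOMDocument can load (and repair) HTML. By default it does not load the data as UTF-8, so you need to add a meta tag that specifies the encoding.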

$data = '<french><li><span class="pull-right">25 GB</span>Espace disque</french><English><li><span class="pull-right">25 GB</span>disk Space</English>
<french><li><span class="pull-right">YES</span>PHP 5, MysqL 5</french><English><li><span class="pull-right">YES</span>PHP 5, MysqL 5</English>
<french><li><span class="pull-right">100</span>Bases de données</french><English><li><span class="pull-right">100</span>Databases</English>
<french><li><span class="pull-right">∞</span>E-Mails</french><English><li><span class="pull-right">∞</span>E-mails</English>';    

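// Wrap the fragment in a minimal HTML document; the meta tag makes the parser read it as UTF-8.
$html_data =
  '<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>
   <body>'.$data.'</body>';

// Collect parser warnings (unknown tags, unclosed <li>) instead of printing them.
libxml_use_internal_errors(TRUE);
$dom = new DOMDocument();
$dom->loadHTML($html_data);
$dom->formatOutput = TRUE;

echo $dom->saveXML();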

Output:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
  <body>
    <french>
      <li><span class="pull-right">25 GB</span>Espace disque</li>
    </french>
    <english>
      <li><span class="pull-right">25 GB</span>disk Space</li>
    </english>
    <french>
      <li><span class="pull-right">YES</span>PHP 5, MysqL 5</li>
    </french>
    <english>
      <li><span class="pull-right">YES</span>PHP 5, MysqL 5</li>
    </english>
    ...
  </body>
</html>

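As you can see, it keeps the language-name elements but converts all element names to lowercase. It also always adds the html and body elements if they are missing, but that is not a problem.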
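The lowercasing matters when you look the elements up later, because XPath name tests are case-sensitive. A minimal sketch of the effect, assuming a reduced sample string (not the data from the question):

```php
<?php
// Assumed sample input, reduced from the question's data.
$data = '<English><h1>Hello</h1></English>';

libxml_use_internal_errors(true); // unknown tags such as <English> raise parser warnings
$dom = new DOMDocument();
$dom->loadHTML('<body>' . $data . '</body>');

$xpath = new DOMXPath($dom);

// The HTML parser stored the element as "english", so this name test finds it.
$lowercaseMatches = $xpath->evaluate('count(//english)');

// The original spelling no longer exists in the tree, so this finds nothing.
$originalCaseMatches = $xpath->evaluate('count(//English)');

var_dump($lowercaseMatches, $originalCaseMatches);
```

This is why a lookup built with ucfirst(), as in the function from the question, would need to switch to strtolower() once the data goes through loadHTML().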

Getting the data from XML

Now that you have a DOM, you can use XPath to fetch nodes.

One possibility is to fetch the body element and import it into SimpleXML:

$xpath = new DOMXPath($dom);
$root = simplexml_import_dom($xpath->evaluate('/html/body')->item(0));
var_dump($root);

Output:

object(SimpleXMLElement)#4 (2) {
  ["french"]=>
  array(4) {
    [0]=>
    object(SimpleXMLElement)#3 (1) {
      ["li"]=>
      object(SimpleXMLElement)#12 (1) {
        ["span"]=>
        string(5) "25 GB"
      }
    }
    ...
  }
  ["english"]=>
  array(4) {
    [0]=>
    object(SimpleXMLElement)#5 (1) {
      ["li"]=>
      object(SimpleXMLElement)#12 (1) {
        ["span"]=>
        string(5) "25 GB"
      }
    }
    ...
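Once the body is imported, the language elements can be read as SimpleXML properties. The following is a rough sketch, assuming a shortened sample string with properly closed tags; the variable names $lang and $fragment are made up for illustration:

```php
<?php
// Assumed sample input, shortened from the question's data.
$data = '<french><li>Espace disque</li></french><English><li>Disk space</li></English>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML(
    '<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
    . '<body>' . $data . '</body>'
);

$xpath = new DOMXPath($dom);
$root = simplexml_import_dom($xpath->evaluate('/html/body')->item(0));

// Element names were lowercased by the HTML parser, so lowercase the lookup key too.
$lang = strtolower('English');

$fragment = '';
foreach ($root->{$lang} as $element) {
    // children() yields child elements only; asXML() serializes each one as XML.
    foreach ($element->children() as $child) {
        $fragment .= $child->asXML();
    }
}
echo $fragment;
```

Two caveats with this route: asXML() produces XML serialization rather than HTML, and children() skips text that sits directly inside the language element.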

Or fetch the nodes directly and save them as an HTML fragment:

$xpath = new DOMXPath($dom);
$string = '';
foreach ($xpath->evaluate('/html/body/*[name() = "english"]/*') as $node) {
  $string .= $dom->saveHtml($node);
}
echo $string;

Output:

<li>
<span class="pull-right">25 GB</span>disk Space</li><li>
<span class="pull-right">YES</span>PHP 5, MysqL 5</li><li>
<span class="pull-right">100</span>Databases</li><li>
<span class="pull-right">∞</span>E-mails</li>
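Putting both parts together, the translate() function from the question could be reworked along these lines. This is only a sketch under assumptions: the name extract_language() and its signature are hypothetical, it selects node() instead of * so that plain text inside a language element is kept too, and it returns an empty string where the original function fell back to the raw input.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the inner HTML
// of every element named $lang found in the mixed HTML fragment $data.
function extract_language(string $data, string $lang): string
{
    // The HTML parser lowercases element names, so normalize the lookup name.
    $name = strtolower($lang);
    // Only allow simple names, since the value is embedded in an XPath expression.
    if (!preg_match('/^[a-z][a-z0-9_-]*$/', $name)) {
        return '';
    }

    // The meta tag tells the parser to treat the input as UTF-8.
    $html = '<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
          . '<body>' . $data . '</body>';

    $previous = libxml_use_internal_errors(true); // collect warnings about unknown tags
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $result = '';
    // node() selects child elements and text nodes of each matching language element.
    foreach ($xpath->evaluate('/html/body/*[name() = "' . $name . '"]/node()') as $node) {
        $result .= $dom->saveHTML($node);
    }
    return $result;
}

echo extract_language(
    '<English><h1>Hello</h1></English><french><h1>Bonjour</h1></french>',
    'English'
);
```

Restoring the previous libxml error setting keeps the helper from changing error handling for the rest of the script.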
