Stripping tags with a class in PHP

Problem description

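So I need to strip the span tags with class "tip". That means <span class="tip"> and its matching </span>, plus everything inside them... I suspect I need a regular expression, but I really hate those. Lol...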

<?php
$string = 'April 15,2003';
$pattern = '/(\w+) (\d+),(\d+)/i';
$replacement = '${1}1,$3';
echo preg_replace($pattern,$replacement,$string);
?>

This runs without errors... but

<?php
$str = preg_replace('<span class="tip">.+</span>',"",'<span class="RSS-title"></span><span class="RSS-link">linkylink</span><span class="RSS-id"></span><span class="RSS-content"></span><span class="RSS-newpost"></span>');
echo $str;
?>

gives me this error:

Warning: preg_replace() [function.preg-replace]: Unknown modifier '.' in <A FILE> on line 4

Earlier, the error was reported at the ); on the second line, but now...

Solution

Here is the "proper" way to do it (adapted from this answer). Input:

<?php
$str = '<div>lol wut <span class="tip">remove!</span><span>don\'t remove!</span></div>';
?>

Code:

<?php
// Walk the children of $parent and remove every <span class="tip">
// element, together with everything inside it.
function recurse(DOMNode $parent) {
   if (!$parent->hasChildNodes())
      return;

   for ($i = 0; $i < $parent->childNodes->length; ) {
      $elm = $parent->childNodes->item($i);
      // getAttribute() returns '' when the attribute is missing, so spans
      // without a class attribute no longer dereference a null node.
      if ($elm instanceof DOMElement && $elm->nodeName == "span"
            && $elm->getAttribute("class") == "tip") {
         $parent->removeChild($elm);
         // Don't advance $i: the next sibling has moved into position $i.
         continue;
      }

      recurse($elm);
      $i++;
   }
}

// Load in the DOM (remembering that XML requires one root node)
$doc = new DOMDocument();
$doc->loadXML("<document>" . $str . "</document>");

// Iterate the DOM
recurse($doc->documentElement);

// Output the children of the wrapper element
foreach ($doc->documentElement->childNodes as $node) {
   echo $doc->saveXML($node);
}
?>

Output:

<div>lol wut <span>don't remove!</span></div>
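The same DOM idea can be written more compactly with DOMXPath, which selects the tip spans directly instead of walking the tree by hand. This is a sketch rather than part of the original answer: the helper name removeTipSpans() is hypothetical, and like the code above it assumes the input is a well-formed XML fragment and that the class attribute is exactly "tip".

```php
<?php
// Hypothetical helper: remove every <span class="tip"> (and its contents).
// Assumes a well-formed XML fragment and an exact class="tip" attribute.
function removeTipSpans(string $fragment): string {
    $doc = new DOMDocument();
    // Wrap the fragment so the XML has a single root node.
    $doc->loadXML('<document>' . $fragment . '</document>');

    $xpath = new DOMXPath($doc);
    // Collect all matches first, then remove them.
    $tips = iterator_to_array($xpath->query('//span[@class="tip"]'));
    foreach ($tips as $span) {
        $span->parentNode->removeChild($span);
    }

    // Serialise the children of the wrapper element back to a string.
    $out = '';
    foreach ($doc->documentElement->childNodes as $node) {
        $out .= $doc->saveXML($node);
    }
    return $out;
}

echo removeTipSpans('<div>lol wut <span class="tip">remove!</span><span>don\'t remove!</span></div>');
```

Because the wrapper element is never printed, the caller gets back a fragment in the same shape it passed in.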

A simple regular expression such as:

<span class="tip">.+</span>

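won't work reliably. With the greedy .+ the match runs all the way to the last </span> in the string, swallowing any spans that follow the tip span; with a lazy .+? it stops at the first </span>, so if another span is opened and closed inside the tip span, the match ends at that inner closing tag instead of the tip span's own. A DOM-based tool (like the one linked in the comments) will give a much more reliable result. As mentioned in my comment below, you also need to add pattern delimiters when using regular expressions in PHP: without them, PHP takes the leading < as the opening delimiter and the next > as the closing one, so the . that follows is read as a modifier, which is exactly the "Unknown modifier '.'" warning.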
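To see the nesting problem concretely, the following sketch runs a greedy and a lazy version of the pattern against a made-up input in which the tip span contains another span and is followed by an unrelated one. The # delimiter is an arbitrary choice: any non-alphanumeric, non-backslash, non-whitespace character works, and # avoids having to escape the / in </span>.

```php
<?php
// Made-up input: a tip span with a nested span, followed by a span to keep.
$html = '<span class="tip">a <span>b</span> c</span> keep <span>me</span>';

// Greedy .+ runs to the LAST </span> in the string and also eats "keep me".
$greedy = preg_replace('#<span class="tip">.+</span>#', '', $html);

// Lazy .+? stops at the FIRST </span>, which closes the nested span,
// leaving " c</span>" behind.
$lazy = preg_replace('#<span class="tip">.+?</span>#', '', $html);

var_dump($greedy, $lazy);
```

Neither result is the intended ' keep <span>me</span>', which is why balanced nesting calls for a parser or explicit depth counting.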

<?php
// '#' is the pattern delimiter, so the '/' in </span> needs no escaping.
// The lazy .+? stops at the first </span> instead of the last one.
$str = preg_replace('#<span class="tip">.+?</span>#',"",'<span class="rss-title"></span><span class="rss-link">linkylink</span><span class="rss-id"></span><span class="rss-content"></span><span class="rss-newpost"></span>');
echo $str;
?>

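may be a bit more successful. Check the documentation page for preg_replace().

Finally, here is an approach with no regular expressions and no heavy XML parsing: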

<?php
$html = ' ... <span class="tip"> hello <span id="x"> man </span> </span> ... ';
$tag = '<span class="tip">';
$tag_close = '</span>';
$tag_family = '<span';

$tag_len = strlen($tag);

$offset = 0;
while (($p1 = strpos($html, $tag, $offset)) !== false) {
  // the tag is found, now we search for its corresponding closing tag
  $level = 1;
  $p2 = $p1;
  $continue = true;
  while ($continue) {
     $p2 = strpos($html, $tag_close, $p2 + 1);
     if ($p2 === false) {
       // error in the html contents, the analysis cannot continue
       echo "ERROR in html contents";
       $continue = false;
     } else {
       $level = $level - 1;
       // count the "<span" openings between the tip tag and this </span>
       $x = substr($html, $p1 + $tag_len, $p2 - $p1 - $tag_len);
       $n = substr_count($x, $tag_family);
       if ($level + $n <= 0) $continue = false;
     }
  }
  if ($p2 === false) break; // unbalanced html: stop scanning

  // delete the pair of tags, the farthest first so $p1 stays valid
  $html = substr_replace($html, '', $p2, strlen($tag_close));
  $html = substr_replace($html, '', $p1, $tag_len);
  // resume at $p1: a nested tip tag may now start exactly there
  $offset = $p1;
}
echo $html;
?>
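The loop above is meant to delete only the pair of tags and keep the text between them, whereas the question asks for the contents to go as well. A hypothetical helper, strip_tip_spans(), sketches that variant with the same depth-counting idea: track how many "<span" openings are still unclosed and cut from the opening tip tag through its matching </span>. It shares the loop's assumptions: balanced markup, the opening tag written exactly as <span class="tip">, and no "<span" text inside comments or attribute values.

```php
<?php
// Hypothetical helper: remove each <span class="tip">...</span> including
// its contents. Assumes balanced tags written literally (no "<span" inside
// comments or attribute values) and the exact opening tag below.
function strip_tip_spans(string $html): string {
    $open = '<span class="tip">';
    $close = '</span>';
    $offset = 0;
    while (($start = strpos($html, $open, $offset)) !== false) {
        $depth = 1;                       // the tip span itself is open
        $pos = $start + strlen($open);
        while ($depth > 0) {
            $nextOpen = strpos($html, '<span', $pos);
            $nextClose = strpos($html, $close, $pos);
            if ($nextClose === false) {
                return $html;             // unbalanced markup: stop here
            }
            if ($nextOpen !== false && $nextOpen < $nextClose) {
                $depth++;                 // a nested span opens first
                $pos = $nextOpen + strlen('<span');
            } else {
                $depth--;                 // a span closes first
                $pos = $nextClose + strlen($close);
            }
        }
        // $pos is just past the matching </span>: cut the whole element.
        $html = substr_replace($html, '', $start, $pos - $start);
        $offset = $start;
    }
    return $html;
}

echo strip_tip_spans(' ... <span class="tip"> hello <span id="x"> man </span> </span> ... ');
```

Comparing positions of the next "<span" and the next "</span>" avoids re-counting the whole substring on every step, which the substr_count() approach does.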
