php – Remove parent element, keep all inner children in DOMDocument with saveHTML

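I am manipulating a short HTML snippet with XPath. When I output the changed snippet with $doc->saveHTML(), a DOCTYPE is added and HTML / BODY tags wrap the output. I want to remove those wrappers but keep all of the children inside, using only DOMDocument functions. For example: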

$doc = new DOMDocument();
$doc->loadHTML('<p><strong>Title...</strong></p>
<a href="http://www....."><img src="http://" alt=""></a>
<p>...to be one of those crowning achievements...</p>');
// manipulation goes here
echo htmlentities( $doc->saveHTML() );

This produces:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" ...>
<html><body>
<p><strong>Title...</strong></p>
<a href="http://www....."><img src="http://" alt=""></a>
<p>...to be one of those crowning achievements...</p>
</body></html>

I have tried some simple tricks, such as:

# removes doctype
$doc->removeChild($doc->firstChild);

# <body> replaces <html>
$doc->replaceChild($doc->firstChild->firstChild, $doc->firstChild);

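So far, that only removes the DOCTYPE and replaces HTML with BODY. What remains at this point, however, is BODY containing a variable number of elements.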

How do I remove the <body> tag but keep all of its children, given that their structure is variable, in a neat and clean way using PHP's DOM manipulation?

Solution:

UPDATE

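Here is a version that does not extend DOMDocument, although I think extending is the proper approach, since you are trying to achieve functionality that is not built into the DOM API.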

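Note: I am interpreting "clean" and "without workarounds" as keeping all manipulation within the DOM API. As soon as you resort to string manipulation, you are in workaround territory.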

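Just as in the original answer, what I am doing is using DOMDocumentFragment to manipulate multiple nodes that all sit at the root level. No string manipulation is involved, which to me qualifies it as not being a workaround.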

$doc = new DOMDocument();
$doc->loadHTML('<p><strong>Title...</strong></p><a href="http://www....."><img src="http://" alt=""></a><p>...to be one of those crowning achievements...</p>');

// Remove doctype node
$doc->doctype->parentNode->removeChild($doc->doctype);

// Remove html element, preserving child nodes
$html = $doc->getElementsByTagName("html")->item(0);
$fragment = $doc->createDocumentFragment();
while ($html->childNodes->length > 0) {
    $fragment->appendChild($html->childNodes->item(0));
}
$html->parentNode->replaceChild($fragment, $html);

// Remove body element, preserving child nodes
$body = $doc->getElementsByTagName("body")->item(0);
$fragment = $doc->createDocumentFragment();
while ($body->childNodes->length > 0) {
    $fragment->appendChild($body->childNodes->item(0));
}
$body->parentNode->replaceChild($fragment, $body);

// Output results
echo htmlentities($doc->saveHTML());
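
The two while loops above follow the same pattern, so they could be folded into a single function. The following is a minimal sketch and not part of the answer's code: unwrapNode is a hypothetical helper name, and the sample markup is invented for illustration. It unwraps a nested <div> rather than the root elements, to keep the example small.

```php
<?php
// Hypothetical helper (not part of the DOM API):
// replaces $node with its own child nodes, using a DOMDocumentFragment.
function unwrapNode(DOMNode $node): void
{
    if ($node->parentNode === null) {
        return; // nothing to replace it in
    }

    $fragment = $node->ownerDocument->createDocumentFragment();

    // Appending a child to the fragment detaches it from $node,
    // so firstChild keeps advancing until $node is empty.
    while ($node->firstChild !== null) {
        $fragment->appendChild($node->firstChild);
    }

    // Inserting a fragment inserts its children, not the fragment itself.
    $node->parentNode->replaceChild($fragment, $node);
}

// Invented sample markup: a wrapper <div> around two paragraphs.
$doc = new DOMDocument();
$doc->loadHTML('<div><p>one</p><p>two</p></div>');

$div = $doc->getElementsByTagName('div')->item(0);
unwrapNode($div);

$out = $doc->saveHTML();
echo htmlentities($out);
```

With a helper like this, the two removal steps above would reduce to one call each on the html and body elements.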

Original answer

This solution is rather lengthy, but that is because it works by extending the DOM, which keeps your end code as short as possible.

sliceOutNode is where the magic happens. Let me know if you have any questions:

<?PHP

class DOMDocumentExtended extends DOMDocument
{
    public function __construct( $version = "1.0", $encoding = "UTF-8" )
    {
        parent::__construct( $version, $encoding );

        $this->registerNodeClass( "DOMElement", "DOMElementExtended" );
    }

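    // This method will need to be removed once PHP supports LIBXML_NOXMLDECL
    // On PHP 8.1+, the attribute below suppresses the return-type deprecation notice
    // for this override; older PHP versions treat the line as a comment.
    #[\ReturnTypeWillChange]
    public function saveXML( DOMNode $node = NULL, $options = 0 )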
    {
        $xml = parent::saveXML( $node, $options );

        if( $options & LIBXML_NOXMLDECL )
        {
            $xml = $this->stripXMLDeclaration( $xml );
        }

        return $xml;
    }

    public function stripXMLDeclaration( $xml )
    {
        return preg_replace( "|<\?xml(.+?)\?>[\n\r]?|i", "", $xml );
    }
}

class DOMElementExtended extends DOMElement
{
    public function sliceOutNode()
    {
        $nodeList = new DOMNodeListExtended( $this->childNodes );
        $this->replaceNodeWithNode( $nodeList->toFragment( $this->ownerDocument ) );
    }

    public function replaceNodeWithNode( DOMNode $node )
    {
        return $this->parentNode->replaceChild( $node, $this );
    }
}

class DOMNodeListExtended extends ArrayObject
{
    public function __construct( $mixednodeList )
    {
        parent::__construct( array() );

        $this->setNodeList( $mixednodeList );
    }

    private function setNodeList( $mixednodeList )
    {
        if( $mixednodeList instanceof DOMNodeList )
        {
            $this->exchangeArray( array() );

            foreach( $mixednodeList as $node )
            {
                $this->append( $node );
            }
        }
        elseif( is_array( $mixednodeList ) )
        {
            $this->exchangeArray( $mixednodeList );
        }
        else
        {
            throw new DOMException( "DOMNodeListExtended only supports a DOMNodeList or array as its constructor parameter." );
        }
    }

    public function toFragment( DOMDocument $contextDocument )
    {
        $fragment = $contextDocument->createDocumentFragment();

        foreach( $this as $node )
        {
            $fragment->appendChild( $contextDocument->importNode( $node, true ) );
        }

        return $fragment;
    }

    // Built-in methods of the original DOMNodeList

    public function item( $index )
    {
        return $this->offsetGet( $index );
    }

    public function __get( $name )
    {
        switch( $name )
        {
            case "length":
                return $this->count();
            break;
        }

        return false;
    }
}

// Load HTML/XML using our fancy DOMDocumentExtended class
$doc = new DOMDocumentExtended();
$doc->loadHTML('<p><strong>Title...</strong></p><a href="http://www....."><img src="http://" alt=""></a><p>...to be one of those crowning achievements...</p>');

// Remove doctype node
$doc->doctype->parentNode->removeChild( $doc->doctype );

// Slice out html node
$html = $doc->getElementsByTagName("html")->item(0);
$html->sliceOutNode();

// Slice out body node
$body = $doc->getElementsByTagName("body")->item(0);
$body->sliceOutNode();

// Pick your poison: XML or HTML output
echo htmlentities( $doc->saveXML( NULL, LIBXML_NOXMLDECL ) );
echo htmlentities( $doc->saveHTML() );
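
If only the serialized output matters and the document itself can keep its html and body wrappers, another option that stays within the DOM API is to pass each child of body to saveHTML(), which accepts an optional node argument as of PHP 5.3.6. This is a different technique from the fragment approach above, shown here as a minimal sketch with invented sample markup:

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML('<p><strong>Title</strong></p><a href="http://example.com/">link</a><p>text</p>');

// loadHTML wraps the snippet in <html><body>; leave the tree as it is
// and serialize only the nodes inside <body>.
$body = $doc->getElementsByTagName('body')->item(0);

$html = '';
foreach ($body->childNodes as $child) {
    // saveHTML($node) serializes just that node and its descendants.
    $html .= $doc->saveHTML($child);
}

echo htmlentities($html);
```

The trade-off is that the DOMDocument still contains the DOCTYPE, html and body nodes; only the string that is produced omits them.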
