Why is my HTML document scrambled when I use the FSharp.Data HTML parser?

I'm trying to manipulate some HTML with the FSharp.Data library, and the results are confusing.

Here is the code:

let manipulateHtml (htmlDoc:HtmlDocument) =
    htmlDoc.Html().Descendants()
    |> filterFromHtml stuffToRemove 
    |> HtmlDocument.New

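When I print the resulting HTML document, the order is wrong; it looks as if the document were being rebuilt starting from random nodes. How does HtmlDocument.New(seq) rebuild the HTML document, and is there a way to rebuild it correctly, i.e. in its original order?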

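This happens because the Descendants() method returns all descendants recursively. The returned sequence therefore contains the nodes at every level of nesting (children, grandchildren, and so on), and each of those nodes still carries its own subtree.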

For example, given the document:

<html>
    <tag1>
        <tag2>
            this is the text
        </tag2>
    </tag1>
</html>

Descendants() returns a sequence of nodes like this:

<tag1>
    <tag2>
        this is the text
    </tag2>
</tag1>

<tag2>
    this is the text
</tag2>

this is the text

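HtmlDocument.New, however, builds the document flat: every node in the sequence becomes a top-level node. You therefore get the three fragments above as the document, with tag2 appearing twice and this is the text appearing three times.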

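So, to fix your problem, you need to walk the tree under htmlDoc.Html(), decide which nodes to keep, and build a new document as you go using the HtmlNode.New***() and HtmlDocument.New() methods.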
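A minimal sketch of that traversal might look like the following. It assumes the code runs as an F# script (hence the #r line) and that the HtmlElement active pattern and the Name()/Value() attribute extensions from FSharp.Data are in scope. shouldRemove is a hypothetical predicate standing in for your filterFromHtml stuffToRemove; here it simply drops script elements. Treat it as one possible shape, not the library's prescribed approach.

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

// Hypothetical predicate (not part of FSharp.Data): decides which nodes to drop.
// Here it removes every <script> element; replace it with your own rule.
let shouldRemove (node: HtmlNode) =
    match node with
    | HtmlElement ("script", _, _) -> true
    | _ -> false

// Rebuild a node recursively, keeping only the children that pass the filter.
// Child order is preserved because each child list is processed in place.
let rec rebuild (node: HtmlNode) : HtmlNode =
    match node with
    | HtmlElement (name, attributes, children) ->
        let keptChildren =
            children
            |> List.filter (shouldRemove >> not)
            |> List.map rebuild
        let attributePairs =
            attributes |> List.map (fun a -> a.Name(), a.Value())
        HtmlNode.NewElement(name, attributePairs, keptChildren)
    | other -> other // text, comment and CDATA nodes have no children

let manipulateHtml (htmlDoc: HtmlDocument) =
    // Start from the top-level nodes only, not from Descendants(),
    // so that each node ends up in the new document exactly once.
    let topLevel =
        htmlDoc.Elements()
        |> List.filter (shouldRemove >> not)
        |> List.map rebuild
    HtmlDocument.New(topLevel)
```

Because the new document is assembled from the top-level nodes and each element is recreated with its filtered children, nothing is duplicated and the original order is kept. Note that this overload of HtmlDocument.New does not take a doctype, so the doctype of the original document is not carried over in this sketch.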