Extracting all data from list <li> tags with HTML Agility Pack in C#

Problem description

I am using HTML Agility Pack to extract data. I want to extract all the list items from this source:

<div id="feature-bullets" class="a-section a-spacing-medium a-spacing-top-small">

<ul class="a-unordered-list a-vertical a-spacing-mini">

<li><span class="a-list-item">
some data 1

</span></li>

<li><span class="a-list-item">
some data 2

</span></li>

<li><span class="a-list-item">
some data 3

</span></li>

<li><span class="a-list-item">
some data 4

</span></li>

</ul>

</div>

My code so far:

string source = someSource; // the page's HTML as a string

var htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.LoadHtml(source);

How can I extract all the list items to get a result like this:

List value 1 is: some data 1
List value 2 is: some data 2
List value 3 is: some data 3
List value 4 is: some data 4

Solution

The source I'm using: amazon.co.uk/dp/B07VD9F419. I'm trying to extract the data in the bullet points.

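Install the additional NuGet package Fizzler.Systems.HtmlAgilityPack (for example with: dotnet add package Fizzler.Systems.HtmlAgilityPack) to get the QuerySelector and QuerySelectorAll extension methods. They take CSS selectors, the same syntax you would pass to document.querySelectorAll in JavaScript.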

Consider the following example.

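using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;
using Fizzler.Systems.HtmlAgilityPack; // provides the QuerySelector/QuerySelectorAll extension methods

class Program
{
    // Reuse a single HttpClient instance for the lifetime of the app.
    private static readonly HttpClient client = new HttpClient();

    static async Task Main(string[] args)
    {
        // Note: Amazon may serve a captcha or block page to automated requests,
        // in which case the selector below matches nothing.
        string source = await client.GetStringAsync("https://www.amazon.co.uk/dp/B07VD9F419");

        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(source);

        // Every span.a-list-item inside an <li> of a list under div#feature-bullets.
        IEnumerable<HtmlNode> nodes = htmlDoc.DocumentNode.QuerySelectorAll("div#feature-bullets ul li span.a-list-item");

        foreach (HtmlNode node in nodes)
        {
            // Print a separator line, then the bullet text without surrounding whitespace.
            Console.WriteLine(new string('-', 20) + Environment.NewLine + node.InnerText.Trim());
        }

        Console.ReadKey();
    }
}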

Console output

--------------------
In addition to body weight,it also gives you a realistic picture of your health and fitness with 13 data points,such as body composition,muscle volume etc.
--------------------
High precision With a series of algori thms complexes and advanced bioelectric Impedance Analysis (BIA),provides accurate state dose.
--------------------
Weighs from 100g to 150kg so it can also weigh fruits and vegetables in addition to adults and children.
--------------------
Stores up to 16 profiles
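If you would rather not add a second package, HtmlAgilityPack's own SelectNodes method accepts XPath. The following is a minimal sketch, assuming the page keeps the feature-bullets id and a-list-item class shown in the question; the FeatureBullets.Extract helper name is hypothetical. It runs against the question's sample markup (no network access) and prints in the "List value N is:" format the question asked for.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

static class FeatureBullets
{
    // Hypothetical helper: returns the trimmed, entity-decoded text of every
    // span.a-list-item under div#feature-bullets (assumes the page's id/class names).
    public static List<string> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // XPath equivalent of the CSS selector "div#feature-bullets ul li span.a-list-item".
        // contains(concat(...)) matches one class name inside a space-separated class list.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(
            "//div[@id='feature-bullets']//ul//li//span[contains(concat(' ', normalize-space(@class), ' '), ' a-list-item ')]");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (nodes == null)
            return new List<string>();

        // DeEntitize turns entities such as &amp; into plain characters.
        return nodes
            .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim())
            .ToList();
    }
}

class Program
{
    static void Main()
    {
        // Sample markup from the question.
        string source = @"<div id=""feature-bullets"" class=""a-section a-spacing-medium a-spacing-top-small"">
<ul class=""a-unordered-list a-vertical a-spacing-mini"">
<li><span class=""a-list-item"">
some data 1
</span></li>
<li><span class=""a-list-item"">
some data 2
</span></li>
<li><span class=""a-list-item"">
some data 3
</span></li>
<li><span class=""a-list-item"">
some data 4
</span></li>
</ul>
</div>";

        List<string> items = FeatureBullets.Extract(source);
        for (int i = 0; i < items.Count; i++)
            Console.WriteLine($"List value {i + 1} is: {items[i]}");
    }
}
```

The contains(concat(...)) test is the usual XPath 1.0 idiom for class matching; a plain @class='a-list-item' comparison would miss spans that carry additional classes.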
