Problem description
I'm using HTML Agility Pack to extract data. I want to extract all the list items from this source:
<div id="feature-bullets" class="a-section a-spacing-medium a-spacing-top-small">
<ul class="a-unordered-list a-vertical a-spacing-mini">
<li><span class="a-list-item">
some data 1
</span></li>
<li><span class="a-list-item">
some data 2
</span></li>
<li><span class="a-list-item">
some data 3
</span></li>
<li><span class="a-list-item">
some data 4
</span></li>
</ul>
My code so far:
string source = someSource; // the page HTML, obtained elsewhere
var htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.LoadHtml(source);
How can I extract all the list items to get a result like this:
List value 1 is: some data 1
List value 2 is: some data 2
List value 3 is: some data 3
List value 4 is: some data 4
Here is the source I'm using: amazon.co.uk/dp/B07VD9F419. I'm trying to extract the data in the bullet points.
Solution
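Install the additional NuGet package Fizzler.Systems.HtmlAgilityPack. It adds the QuerySelector and QuerySelectorAll extension methods to HtmlNode, which take CSS selectors, the same syntax as document.querySelector in JavaScript.
Consider the following example.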
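using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;
using Fizzler.Systems.HtmlAgilityPack;

class Program
{
    private static readonly HttpClient client = new HttpClient();

    static async Task Main(string[] args)
    {
        // Download the product page
        string source = await client.GetStringAsync("https://www.amazon.co.uk/dp/B07VD9F419");

        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(source);

        // CSS selector: every span.a-list-item inside a list item of div#feature-bullets
        IEnumerable<HtmlNode> nodes = htmlDoc.DocumentNode.QuerySelectorAll("div#feature-bullets ul li span.a-list-item");

        foreach (HtmlNode node in nodes)
        {
            // Print a separator line, then the bullet's text without surrounding whitespace
            Console.WriteLine(new string('-', 20) + Environment.NewLine + node.InnerText.Trim());
        }

        Console.ReadKey();
    }
}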
Console output
--------------------
In addition to body weight,it also gives you a realistic picture of your health and fitness with 13 data points,such as body composition,muscle volume etc.
--------------------
High precision With a series of algori thms complexes and advanced bioelectric Impedance Analysis (BIA),provides accurate state dose.
--------------------
Weighs from 100g to 150kg so it can also weigh fruits and vegetables in addition to adults and children.
--------------------
Stores up to 16 profiles
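If you'd rather not add another package, plain HtmlAgilityPack can make a similar selection with XPath through SelectNodes. The sketch below assumes only the HtmlAgilityPack NuGet package is installed. It parses the sample markup from the question instead of downloading the page, and it prints the numbered format the question asks for. The class name BulletExtractor and the method ExtractItems are hypothetical names used for illustration. The XPath expression is based on the question's snippet, so Amazon's live markup may need a different path.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper for illustration; assumes the HtmlAgilityPack NuGet package.
public static class BulletExtractor
{
    // Returns the trimmed text of every span with class "a-list-item"
    // that sits directly inside an <li> under div#feature-bullets.
    public static List<string> ExtractItems(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // The concat/normalize-space idiom matches the class token exactly,
        // even when the class attribute lists several classes.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(
            "//div[@id='feature-bullets']//li/span[contains(concat(' ', normalize-space(@class), ' '), ' a-list-item ')]");

        // SelectNodes returns null, not an empty collection, when nothing matches.
        if (nodes == null)
            return new List<string>();

        // DeEntitize turns entities such as &amp; back into plain characters.
        return nodes.Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim()).ToList();
    }

    public static void Main()
    {
        // Sample markup from the question, with the outer div closed.
        string html = @"<div id=""feature-bullets"" class=""a-section a-spacing-medium a-spacing-top-small"">
<ul class=""a-unordered-list a-vertical a-spacing-mini"">
<li><span class=""a-list-item"">
some data 1
</span></li>
<li><span class=""a-list-item"">
some data 2
</span></li>
<li><span class=""a-list-item"">
some data 3
</span></li>
<li><span class=""a-list-item"">
some data 4
</span></li>
</ul>
</div>";

        List<string> items = ExtractItems(html);
        for (int i = 0; i < items.Count; i++)
        {
            // Numbering starts at 1 to match the desired output
            Console.WriteLine($"List value {i + 1} is: {items[i]}");
        }
    }
}
```

Run against the sample, this prints the four "List value N is: some data N" lines from the question. The XPath route avoids the extra dependency at the cost of a wordier selector. The Fizzler version above is easier to read if you already know CSS selectors.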