HtmlAgilityPack cannot load a web page's HTML

Problem description

I am trying to crawl https://www.adecco.ch/en-us/job-results, but I cannot load the HTML from this page: nothing usable ends up in the HTML document.

Solution

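As mentioned in my comment, the site sends its content back compressed, and the response is not being decompressed before you try to load it, so you are basically loading gibberish. This code should work: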

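using System.Linq;
using System.Net.Http;
using HtmlAgilityPack;

var url = "https://www.adecco.ch/en-us/job-results";
var handler = new HttpClientHandler();
// This is the important bit: let the handler decompress gzip/deflate/Brotli responses.
// DecompressionMethods.All needs .NET Core 3.0 or later; older frameworks can use GZip | Deflate.
handler.AutomaticDecompression = System.Net.DecompressionMethods.All;
var httpClient = new HttpClient(handler);
var html = await httpClient.GetStringAsync(url);
var htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml(html);
// Note: this collects every descendant node; pass "div" to Descendants to keep only <div> elements.
var divs = htmlDocument.DocumentNode.Descendants().ToList();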
