Problem description
https://pcb.inc.hp.com/webapp/#/nl-nl/contents/33128146?type=I&hierarchy=F&status=L&status=O
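I'm using Scrapy, which usually works fine, but I currently can't get this page's HTML with requests, Scrapy, or any other module. Does anyone know what might be going wrong?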
Solution
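Some websites load their data dynamically with JavaScript. For these cases we use Scrapy Splash (the scrapy-splash package), which renders the page in a headless browser for you. See the scrapy-splash documentation for details.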
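As a rough sketch of what enabling Scrapy Splash involves, the fragment below shows the entries that would go into a Scrapy project's settings.py. The middleware names and priorities follow the scrapy-splash README. The Splash address http://localhost:8050 is an assumption (a locally running Splash instance) and should be adjusted to your setup.

```python
# settings.py fragment for scrapy-splash (sketch, not a complete settings file)

# ASSUMPTION: a Splash instance is running locally on its default port.
SPLASH_URL = "http://localhost:8050"

# Route requests through Splash and keep cookies and compression working.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

# Avoid storing duplicate Splash arguments for every queued request.
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Duplicate filter and HTTP cache storage that understand Splash requests.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
HTTPCACHE_STORAGE = "scrapy_splash.SplashAwareFSCacheStorage"
```

With these settings in place, a spider would issue `scrapy_splash.SplashRequest(url, callback, args={"wait": 1})` instead of `scrapy.Request`, so that the callback receives the rendered HTML. The wait value here is only illustrative.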
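This website uses AngularJS to generate its content dynamically on load, so you cannot scrape the content directly from the page's HTML. I suggest using something like Selenium with Python to scrape the data.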
Alternatively, depending on your needs, you can open the Network tab in Chrome DevTools to see which requests the page makes, and scrape the data from those URLs instead.
For example:
Request URL: https://pcb.inc.hp.com/api/catalogs/nl-nl/nodes/0/children?status[]=O&status[]=L&hierParadigm=F
Response: {"baseProdname":"ROOT_NODE","oid":0,"level":0,"status":["O","L"],"cultureCode":"nl-nl","children":[{"baseProdname":"Solutions","oid":8176594,"level":1,"status":["L","O"],"cultureCode":"nl-nl"},{"baseProdname":"Scanners/Copiers/Faxes","oid":15179,{"baseProdname":"Software","oid":8133386,{"baseProdname":"Ink/Toner/Paper/Printer Supplies","oid":12771,{"baseProdname":"Laptops and Hybrids","oid":321957,{"baseProdname":"Printers and Multifunction","oid":18972,{"baseProdname":"Point of Sale Systems","oid":7491307,{"baseProdname":"Desktops & Workstations","oid":12454,{"baseProdname":"Monitors","oid":382087,{"baseProdname":"Services","oid":8362107,{"baseProdname":"Accessories","oid":8386448,{"baseProdname":"3D Materials and Consumables","oid":20063457,{"baseProdname":"Handhelds and Calculators","oid":215348,{"baseProdname":"Industries","oid":20008722,"status":["L"],{"baseProdname":"Tablets","oid":5169094,"status":["O"],{"baseProdname":"Projectors","oid":3338965,{"baseProdname":"Digital Cameras and Photo Studios","oid":382085,"cultureCode":"nl-nl"}]}
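Calling that endpoint directly could look roughly like the sketch below, which uses only the Python standard library. The helper names (`build_children_url`, `extract_children`, `fetch_children`) are hypothetical. The sketch assumes the endpoint returns JSON shaped like the response above and accepts plain GET requests without extra headers or cookies, which may not hold in practice.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base path taken from the request URL seen in the Network tab.
BASE = "https://pcb.inc.hp.com/api/catalogs"


def build_children_url(culture, oid, statuses, hier_paradigm="F"):
    """Build the 'children' URL for a catalog node (hypothetical helper)."""
    query = [("status[]", s) for s in statuses]
    query.append(("hierParadigm", hier_paradigm))
    # safe="[]" keeps the brackets literal, matching the URL the page sends.
    return f"{BASE}/{culture}/nodes/{oid}/children?" + urlencode(query, safe="[]")


def extract_children(payload):
    """Return (oid, baseProdname) pairs for each child node in the payload."""
    return [(c["oid"], c["baseProdname"]) for c in payload.get("children", [])]


def fetch_children(culture, oid, statuses, timeout=10):
    """Fetch one node's children. ASSUMPTION: no auth or special headers needed."""
    url = build_children_url(culture, oid, statuses)
    with urlopen(url, timeout=timeout) as resp:
        return extract_children(json.load(resp))


# Example usage (performs a network request, so it is left commented out):
# for oid, name in fetch_children("nl-nl", 0, ["O", "L"]):
#     print(oid, name)
```

Each returned `oid` can be passed back into `fetch_children` to walk further down the catalog tree, assuming deeper nodes expose the same `children` structure.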