Why do I get an empty list when I try to scrape content with Splash?

Problem description

This is the website I'm trying to scrape: https://people.sap.com/tim.sheppard

Specifically, I'm trying to scrape the first post, which appears at this location in the browser inspector:

<div class="dm-content-item__text">We have migrated an application from PB 7 to PB 12.5.2 build 5006. After the migration we're having problems with some computed fields in datawindows. The app has many datawindows that have computed fields which include a date() function using Syntax...</div>

My spider is as follows:

import scrapy
from scrapy_splash import SplashRequest

class RedditSpider(scrapy.Spider):
    name = 'quotes'
    allowed_domains = ['people.sap.com']
    start_urls = ['http://people.sap.com/tim.sheppard']

    def start_requests(self):
        for url in self.start_urls:
            yield SplashRequest(url=url, callback=self.parse, endpoint='render.html')

    def parse(self, response):
        quote = response.xpath('//*[@class="dm-content-item__text"]/text()').extract()
        yield {"quote": quote}
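As a sanity check, the XPath can be tested against the static HTML snippet copied from the inspector. A minimal sketch, assuming lxml is available (it is a dependency of Scrapy):

```python
from lxml import html

# The snippet copied from the browser inspector (shortened for the test).
snippet = '<div class="dm-content-item__text">We have migrated an application from PB 7 to PB 12.5.2 build 5006.</div>'

doc = html.fromstring(snippet)
# Same XPath as in the spider's parse() method.
texts = doc.xpath('//*[@class="dm-content-item__text"]/text()')
print(texts)
```

If this prints the text, the selector itself is fine, and the element is simply absent from the HTML that Splash returned.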

This is the response I get:

2021-06-10 17:33:22 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://people.sap.com/tim.sheppard via http://localhost:8050/render.html> (referer: None)
2021-06-10 17:33:22 [scrapy.core.scraper] DEBUG: Scraped from <200 http://people.sap.com/tim.sheppard>
{'quote': []}
2021-06-10 17:33:22 [scrapy.core.engine] INFO: Closing spider (finished)
2021-06-10 17:33:22 [scrapy.statscollectors] INFO: Dumping Scrapy stats:

I don't understand what I'm doing wrong...
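One common cause, assuming the post text is injected by JavaScript after the initial page load, is that Splash snapshots the page before rendering finishes. A hedged sketch of the same spider with a `wait` argument passed to the `render.html` endpoint (the 2-second value is a guess, not a confirmed fix):

```python
import scrapy
from scrapy_splash import SplashRequest  # assumes scrapy-splash is installed


class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    allowed_domains = ['people.sap.com']
    start_urls = ['http://people.sap.com/tim.sheppard']

    def start_requests(self):
        for url in self.start_urls:
            # 'wait' (in seconds) tells Splash to pause before returning the
            # rendered HTML, giving client-side JavaScript time to run.
            yield SplashRequest(url=url, callback=self.parse,
                                endpoint='render.html', args={'wait': 2})

    def parse(self, response):
        quote = response.xpath('//*[@class="dm-content-item__text"]/text()').getall()
        yield {"quote": quote}
```

If the element still does not appear, inspecting `response.text` for the class name will show whether the content is rendered at all or fetched from a separate API endpoint.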
