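Problem description
This is the site I want to scrape: https://people.sap.com/tim.sheppard
Specifically, I am trying to scrape the first post, which appears in the browser's inspector as this element: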
<div class="dm-content-item__text">We have migrated an application from PB 7 to PB 12.5.2 build 5006. After the migration we're having problems with some computed fields in datawindows. The app has many datawindows that have computed fields which include a date() function using Syntax...</div>
My spider is as follows:
import scrapy
from scrapy_splash import SplashRequest


class RedditSpider(scrapy.Spider):
    name = 'quotes'
    allowed_domains = ['people.sap.com']
    start_urls = ['http://people.sap.com/tim.sheppard']

    def start_requests(self):
        # Route every start URL through Splash so the page is rendered first.
        for url in self.start_urls:
            yield SplashRequest(url=url, callback=self.parse, endpoint='render.html')

    def parse(self, response):
        # Collect the text nodes of every element with the target class.
        quote = response.xpath('//*[@class="dm-content-item__text"]/text()').extract()
        yield {"quote": quote}
This is the output I get:
2021-06-10 17:33:22 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://people.sap.com/tim.sheppard via http://localhost:8050/render.html> (referer: None)
2021-06-10 17:33:22 [scrapy.core.scraper] DEBUG: Scraped from <200 http://people.sap.com/tim.sheppard>
{'quote': []}
2021-06-10 17:33:22 [scrapy.core.engine] INFO: Closing spider (finished)
2021-06-10 17:33:22 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
I don't understand what I'm doing wrong...
Solution
No working solution has been confirmed for this problem yet.
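As a starting point for diagnosis rather than a confirmed fix: the log shows a 200 response and an empty list, so the XPath matched nothing in the HTML that Splash returned. One plausible cause (an assumption, not verified against this site) is that the posts are inserted by JavaScript after the snapshot was taken. The sketch below uses only the standard library to check whether a given class appears in a piece of HTML at all. `ClassTextCollector` and `texts_for_class` are hypothetical helper names, and the two sample HTML strings are made up for illustration.

```python
from html.parser import HTMLParser


class ClassTextCollector(HTMLParser):
    """Collect the text inside every element carrying a given CSS class."""

    # Void elements never get a closing tag, so they must not change depth.
    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"}

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0    # > 0 while the parser is inside a matching element
        self.texts = []   # one accumulated string per matching element

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1           # nested tag inside a match
        elif self.css_class in classes:
            self.depth = 1            # entering a new matching element
            self.texts.append("")

    def handle_endtag(self, tag):
        if self.depth and tag not in self.VOID:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.texts[-1] += data


def texts_for_class(html, css_class):
    """Return the stripped text of each element that has css_class."""
    parser = ClassTextCollector(css_class)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.texts]


# Illustrative inputs only: one page where the element exists, one where
# the content has not been rendered yet.
rendered = '<div class="dm-content-item__text">We have migrated an application</div>'
unrendered = '<div id="app"></div>'
print(texts_for_class(rendered, "dm-content-item__text"))
print(texts_for_class(unrendered, "dm-content-item__text"))
```

Inside the spider, the same check could be run on `response.text` in `parse`. If the class is missing there, the selector is not the problem and the rendering step is. In that case, scrapy-splash accepts a `wait` value in the `args` dict of `SplashRequest` (for example `args={'wait': 2}`), which delays the snapshot; whether that is enough for this particular page is untested. The helper assumes reasonably well-formed HTML, since unclosed tags would throw off its depth counter.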