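Problem description
I am trying to use scrapy-splash to scrape some data from dermstore.com.
As a first step, I want to collect the link URLs of all the brands listed on dermstore.com.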
I am trying to scrape the href URLs of the individual brands and write them to test.json, but the output is only an empty JSON file.
Target URL: https://www.dermstore.com/all_Brands_100.htm
My spider code:
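import scrapy
from scrapy_splash import SplashRequest
from scrapy.utils.response import open_in_browser
from scrapy.http.response.html import HtmlResponse


class SpiderdermSpider(scrapy.Spider):
    name = 'spiderDerm'

    # Lua script for Splash's 'execute' endpoint: loads the page, clicks the
    # "next" pagination link, and returns cookies, HTML and the final URL.
    # Note: the request below uses 'render.html', so this script is not run.
    script = """
    function main(splash)
        splash:init_cookies(splash.args.cookies)
        assert(splash:go(splash.args.url))
        splash:wait(0.5)
        local element = splash:select('li.next a')
        local bounds = element:bounds()
        element:mouse_click{x=bounds.width/2, y=bounds.height/2}
        assert(splash:wait(5.0))
        return {
            cookies = splash:get_cookies(),
            html = splash:html(),
            url = splash:url()
        }
    end
    """

    url = 'https://www.dermstore.com/all_Brands_100.htm'

    def start_requests(self):
        # Render the page through Splash and wait 0.5 s for JavaScript to run.
        yield SplashRequest(
            self.url,
            callback=self.parse,
            endpoint='render.html',
            args={'wait': 0.5},
        )

    def parse(self, response):
        # Debugging aid: open the rendered HTML in a local browser.
        # ht = HtmlResponse(url=response.url, body=response.body, encoding="utf-8", request=response.request)
        # open_in_browser(ht)
        # return None

        # .getall() returns the href strings; iterating over the SelectorList
        # directly yields Selector objects, which are not JSON-serializable.
        for brand_url in response.css('a.col-xs-6::attr(href)').getall():
            yield {'url': brand_url}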
Solution