Problem scraping data with scrapy-splash

Problem description

I am trying to scrape some data from the website dermstore.com using scrapy-splash.
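As background (a hedged sketch, not part of the original question): scrapy-splash forwards each SplashRequest to a separately running Splash HTTP service. With endpoint='render.html', Splash loads the page, runs its JavaScript, waits, and returns the rendered HTML. Splash's render.html endpoint also accepts its arguments as GET query parameters, so roughly the same request can be reproduced by hand to see what HTML the spider actually receives. The sketch assumes Splash is reachable at http://localhost:8050; the helper name splash_render_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: Splash runs locally on its default port 8050
# (for example started with: docker run -p 8050:8050 scrapinghub/splash).
SPLASH_URL = "http://localhost:8050"


def splash_render_url(target_url: str, wait: float = 0.5) -> str:
    """Build a GET URL for Splash's render.html endpoint (hypothetical helper).

    'url' and 'wait' mirror args={"wait": 0.5} used by the spider.
    """
    query = urlencode({"url": target_url, "wait": wait})
    return f"{SPLASH_URL}/render.html?{query}"


if __name__ == "__main__":
    print(splash_render_url("https://www.dermstore.com/all_Brands_100.htm"))
```

Opening the printed URL while Splash is running shows the rendered page; if the brand links are missing there, no CSS selector in parse() can find them.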

First, I am trying to collect the link URLs for all the different brands on dermstore.com.


I am trying to scrape the href URLs of the different brands and write them to test.json, but the output is only blank JSON.
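The extraction step can be checked offline, independent of Splash. The sketch below uses Python's standard-library html.parser instead of Scrapy selectors (a swapped-in technique, chosen only so it runs without Scrapy installed) to mimic a.col-xs-6::attr(href): it collects the href of every a element whose class list contains col-xs-6 and resolves it to an absolute URL. BrandLinkParser, extract_brand_urls and the sample HTML are hypothetical; that dermstore's rendered brand links really carry the col-xs-6 class is an assumption taken from the spider's selector.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin


class BrandLinkParser(HTMLParser):
    """Collect hrefs of <a> tags whose class list contains css_class (hypothetical helper)."""

    def __init__(self, base_url: str, css_class: str = "col-xs-6"):
        super().__init__()
        self.base_url = base_url
        self.css_class = css_class
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        href = attr_map.get("href")
        # Same condition as the CSS selector a.col-xs-6::attr(href)
        if self.css_class in classes and href:
            # Relative hrefs are resolved against the page URL
            self.urls.append(urljoin(self.base_url, href))


def extract_brand_urls(html: str, base_url: str) -> list[str]:
    parser = BrandLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls


if __name__ == "__main__":
    # Hypothetical HTML fragment, not the real dermstore markup
    sample = """
    <a class="col-xs-6 brand" href="/brand_A.htm">Brand A</a>
    <a class="other" href="/ignored.htm">Not a brand</a>
    """
    print(extract_brand_urls(sample, "https://www.dermstore.com/all_Brands_100.htm"))
```

An empty list on the real rendered HTML (for example, HTML saved from Splash's render.html output) would point at the selector, not the JSON export, as the cause of the blank file.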

My spider code:

import scrapy
from scrapy_splash import SplashRequest
from scrapy.utils.response import open_in_browser
from scrapy.http.response.html import HtmlResponse

class SpiderdermSpider(scrapy.Spider):
    name = 'spiderDerm'

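    # Note: this Lua script is never referenced below. It would only run with
    # endpoint='execute' and args={'lua_source': script}; render.html ignores it.
    script = """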
        function main(splash)
        splash:init_cookies(splash.args.cookies)
        assert(splash:go(splash.args.url))
        splash:wait(0.5)
        local element = splash:select('li.next a')
        local bounds = element:bounds()
        element:mouse_click{x=bounds.width/2,y=bounds.height/2}
        assert(splash:wait(5.0))

        return {
            cookies = splash:get_cookies(),html = splash:html(),url = splash:url()
            }
        end
    """
    
    url = 'https://www.dermstore.com/all_Brands_100.htm'

    def start_requests(self):
        yield SplashRequest(self.url, callback=self.parse, endpoint='render.html', args={'wait': 0.5})

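    def parse(self, response):
        # Debugging aid: uncomment to open the HTML returned by Splash in a browser
        #ht = HtmlResponse(url=response.url,body=response.body,encoding="utf-8",request=response.request)
        #open_in_browser(ht)
        #return None
        # ::attr(href) yields Selector objects; .getall() returns the href strings.
        # If this list is empty, the selector does not match the rendered HTML.
        for href in response.css('a.col-xs-6::attr(href)').getall():
            yield {
                'url': response.urljoin(href)
            }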
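A frequent reason a SplashRequest spider yields nothing is that scrapy-splash is not enabled in the project settings. The scrapy-splash README asks for roughly the following entries in settings.py. SPLASH_URL assumes a local Splash instance on port 8050, so adjust it to wherever Splash actually runs. This is a configuration sketch, not a confirmed fix for this particular site.

```python
# settings.py additions described in the scrapy-splash README.
# Assumption: Splash is reachable at localhost:8050.
SPLASH_URL = "http://localhost:8050"

DOWNLOADER_MIDDLEWARES = {
    # 723/725 place the Splash middlewares just before HttpProxyMiddleware (750)
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    # Per the README, HttpCompressionMiddleware priority is changed to 810
    # to allow advanced response processing
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Splash-aware duplicate filter and HTTP cache storage
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
HTTPCACHE_STORAGE = "scrapy_splash.SplashAwareFSCacheStorage"
```

With these in place, running `scrapy crawl spiderDerm -o test.json` writes whatever items parse() yields to test.json.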

Solution

No working solution has been found yet.


scrapinghub scrapy-splash