Problem description
import scrapy
class oneplus_spider(scrapy.Spider):
    name='one_plus'
    page_number=0
    start_urls=[
        'https://www.amazon.com/s?k=samsung+mobile&page=3&qid=1600763713&ref=sr_pg_3'
    ]
    def parse(self,response):
        all_links=[]
        total_links=[]
        domain='https://www.amazon.com'
        href=[]
        link_set=set()
        href=response.css('a.a-link-normal.a-text-normal').xpath('@href').extract()
        for x in href:
            link_set.add(domain+x)
        for x in link_set:
            next_page=x
            yield response.follow(next_page,callback=self.parse_page1)
    def parse_page1(self,response):
        title=response.css('span.a-size-large product-title-word-break::text').extract()
        print(title)
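When I run the code I get an error: (failed 2 times): 503 Service Unavailable. I have tried many approaches, but none of them worked. Please help. Thanks in advance!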
Solution
First, check the URL with curl, for example:
curl -I "https://www.amazon.com/s?k=samsung+mobile&page=3&qid=1600763713&ref=sr_pg_3"
You will then see a 503 response:
HTTP/2 503
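In other words, the server rejects the request as it is sent.
You need to find a form of the request that it accepts.
Chrome DevTools can help here: look at the headers the browser sends for the same URL.
I think a browser-like User-Agent header is required, for example: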
curl 'https://www.amazon.com/s?k=samsung+mobile&page=3&qid=1600763713&ref=sr_pg_3' \
-H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML,like Gecko) Chrome/85.0.4183.102 Safari/537.36' \
--compressed
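The same check can be sketched in Python with only the standard library. This is a minimal sketch rather than part of the original answer: the names BROWSER_UA, build_request and fetch_status are hypothetical, and whether a browser-like User-Agent alone is enough for Amazon to answer 200 is an assumption that may not hold.

```python
import urllib.error
import urllib.request

# Browser-like User-Agent copied from the curl command above (an assumption:
# any current browser string should serve the same purpose).
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_UA):
    """Build a GET request that carries the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url, user_agent=BROWSER_UA):
    """Send the request and return the HTTP status code, including error codes."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is on the exception.
        return error.code
```

Calling fetch_status with the search URL, once with the default and once with a non-browser string such as "python-test", shows whether the header changes the status code.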
So this may work:
import scrapy


class oneplus_spider(scrapy.Spider):
    name = 'one_plus'
    page_number = 0
    # Browser-like User-Agent; Scrapy sends this instead of its default one.
    user_agent = "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36"
    start_urls = [
        'https://www.amazon.com/s?k=samsung+mobile&page=3&qid=1600763713&ref=sr_pg_3'
    ]

    def parse(self, response):
        # Collect product links from the search results, dropping duplicates.
        hrefs = response.css('a.a-link-normal.a-text-normal').xpath('@href').extract()
        for href in set(hrefs):
            # response.follow resolves relative links against the page URL,
            # so the domain does not need to be prepended by hand.
            yield response.follow(href, callback=self.parse_page1)

    def parse_page1(self, response):
        # Both classes sit on the same span, so they are chained with dots.
        title = response.css('span.a-size-large.product-title-word-break::text').extract()
        print(title)
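The user_agent attribute applies to this one spider. The same header can also be set through Scrapy settings. Below is a minimal sketch, assuming the dict is assigned to the spider's custom_settings class attribute. The name SPIDER_SETTINGS is hypothetical, and the delay value is an assumption that was not part of the original answer.

```python
# Hypothetical settings dict; assign it to `custom_settings` on the spider class.
SPIDER_SETTINGS = {
    # Same browser-like User-Agent as in the spider above.
    "USER_AGENT": (
        "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36"
    ),
    # Assumed value: wait between requests so the crawl looks less like a burst.
    "DOWNLOAD_DELAY": 2,
}
```

USER_AGENT and DOWNLOAD_DELAY are standard Scrapy setting names. Neither guarantees that the site will stop answering 503.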