Problem description
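I am using urlparse to find a unique URI for each product.
I can understand why there would be no path, because sometimes the URL has no meaningful path (see the example URL below) and carries query parameters instead. But now the whole operation fails with an error.
Question: how can I grab the full URI slug in one go, so that both the path and the params are included (and this error goes away)?
Thanks!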
# Duplicate crawl protection should be some unique url,uri or other param - unique for the product
if self.continue_if_product_duplicate(urlparse(product_link).path): continue
The full error is:
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/usr/lib/python3.6/site-packages/scrapy/core/spidermw.py", line 84, in evaluate_iterable
    for r in iterable:
  File "/usr/lib/python3.6/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
    for x in result:
  File "/usr/lib/python3.6/site-packages/scrapy/core/spidermw.py", in evaluate_iterable
    for r in iterable:
  File "/usr/lib/python3.6/site-packages/scrapy/spidermiddlewares/referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/usr/lib/python3.6/site-packages/scrapy/core/spidermw.py", in evaluate_iterable
    for r in iterable:
  File "/usr/lib/python3.6/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/lib/python3.6/site-packages/scrapy/core/spidermw.py", in evaluate_iterable
    for r in iterable:
  File "/usr/lib/python3.6/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/var/www/html/shirts/scrapy/sohb2bcrawlers/scrapy_app/spiders/spider.py", line 1404, in parse_category_page
    if self.continue_if_product_duplicate(urlparse(product_link).path): continue
  File "/usr/lib64/python3.6/urllib/parse.py", line 367, in urlparse
    url, scheme, _coerce_result = _coerce_args(url, scheme)
  File "/usr/lib64/python3.6/urllib/parse.py", line 123, in _coerce_args
    return _decode_args(args) + (_encode_result,)
  File "/usr/lib64/python3.6/urllib/parse.py", line 107, in _decode_args
    return tuple(x.decode(encoding, errors) if x else '' for x in args)
  File "/usr/lib64/python3.6/urllib/parse.py", in <genexpr>
    return tuple(x.decode(encoding, errors) if x else '' for x in args)
AttributeError: 'Selector' object has no attribute 'decode'
The URL is http://nos.emanuelberg.com/index.PHP?id_category=23&controller=category&id_lang=3
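For reference, here is a minimal sketch of how urlparse splits that URL when it is given a plain string. The helper name split_path_and_query is made up for illustration and is not part of the original spider.

```python
from urllib.parse import urlparse

# The example URL from the question, as a plain str.
EXAMPLE_URL = (
    "http://nos.emanuelberg.com/index.PHP"
    "?id_category=23&controller=category&id_lang=3"
)


def split_path_and_query(url):
    """Return (path, query) for a URL string (hypothetical helper)."""
    parts = urlparse(url)
    return parts.path, parts.query


path, query = split_path_and_query(EXAMPLE_URL)
print(path)   # /index.PHP
print(query)  # id_category=23&controller=category&id_lang=3
```

The path alone is the same for every product on such a site, so it is the query string that distinguishes one page from another here.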
Solution
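One possible reading of the traceback: the last line says a 'Selector' object has no 'decode' attribute, which suggests product_link is a Scrapy Selector rather than a string. urlparse treats any non-str argument as bytes and calls .decode on it, hence the error. The sketch below is not a confirmed fix for this spider. It assumes the selector exposes .get() (older Scrapy versions use .extract() instead) and that it matched the href attribute itself rather than the whole element. The names link_to_str and duplicate_key are hypothetical.

```python
from urllib.parse import urlparse


def link_to_str(link):
    """Return a plain str for a link that may be a str or a selector-like object."""
    if isinstance(link, str):
        return link
    # Assumption: selector-like objects expose .get(), which returns the
    # matched text as a str. On older Scrapy versions use .extract().
    return link.get()


def duplicate_key(link):
    """Build a key from path plus query string (hypothetical helper)."""
    parts = urlparse(link_to_str(link))
    if parts.query:
        return parts.path + "?" + parts.query
    return parts.path
```

With a helper like this, the check in the spider could become self.continue_if_product_duplicate(duplicate_key(product_link)), so that URLs which differ only in their parameters are no longer treated as the same product.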