Using urlparse(product_link).path as a unique key, but sometimes there is no path and the program stops with an error - grabbing the full URI slug

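Problem description

I am using urlparse to find a unique URI for each product.

I can understand why there might be no path: sometimes the URL has no path (see the example below) and carries query parameters instead. But right now the whole operation fails with an error.

Question: how can I grab the full URI slug in one go, so that the path and the params are both covered and this error no longer occurs?

Thanks!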

    # Duplicate crawl protection should be some unique url,uri or other param - unique for the product
    if self.continue_if_product_duplicate(urlparse(product_link).path): continue

Full error

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/usr/lib/python3.6/site-packages/scrapy/core/spidermw.py", line 84, in evaluate_iterable
    for r in iterable:
  File "/usr/lib/python3.6/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
    for x in result:
  File "/usr/lib/python3.6/site-packages/scrapy/core/spidermw.py", in evaluate_iterable
    for r in iterable:
  File "/usr/lib/python3.6/site-packages/scrapy/spidermiddlewares/referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/usr/lib/python3.6/site-packages/scrapy/core/spidermw.py", in evaluate_iterable
    for r in iterable:
  File "/usr/lib/python3.6/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/lib/python3.6/site-packages/scrapy/core/spidermw.py", in evaluate_iterable
    for r in iterable:
  File "/usr/lib/python3.6/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/var/www/html/shirts/scrapy/sohb2bcrawlers/scrapy_app/spiders/spider.py", line 1404, in parse_category_page
    if self.continue_if_product_duplicate(urlparse(product_link).path): continue
  File "/usr/lib64/python3.6/urllib/parse.py", line 367, in urlparse
    url, scheme, _coerce_result = _coerce_args(url, scheme)
  File "/usr/lib64/python3.6/urllib/parse.py", line 123, in _coerce_args
    return _decode_args(args) + (_encode_result,)
  File "/usr/lib64/python3.6/urllib/parse.py", line 107, in _decode_args
    return tuple(x.decode(encoding, errors) if x else '' for x in args)
  File "/usr/lib64/python3.6/urllib/parse.py", in <genexpr>
    return tuple(x.decode(encoding, errors) if x else '' for x in args)
AttributeError: 'Selector' object has no attribute 'decode'
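
Reading the last frames of this traceback, the failure seems to happen inside urlparse itself: it was handed a Scrapy Selector instead of a string, so it never got as far as looking at the path. Below is a minimal sketch of converting the link to a plain string first. The helper name `link_to_str` and the `FakeSelector` stand-in are made up for illustration (the stand-in only exists so the snippet runs without Scrapy). It assumes the selector has a `.get()` method, as current Scrapy selectors do; on older Scrapy versions `.extract()` may be the call to use instead.

```python
from urllib.parse import urlparse


class FakeSelector:
    """Stand-in for a Scrapy Selector, so this sketch runs without Scrapy."""

    def __init__(self, value):
        self._value = value

    def get(self):
        # Mimics Selector.get(): returns the selected text, or None.
        return self._value


def link_to_str(link):
    """Hypothetical helper: return the link as a plain string.

    Strings pass through unchanged; anything else is assumed to be a
    selector-like object exposing .get().
    """
    if isinstance(link, str):
        return link
    return link.get() or ""


product_link = FakeSelector("http://example.com/shop/item?id=7")
parts = urlparse(link_to_str(product_link))
print(parts.path, parts.query)
```

In the spider this would probably mean calling `.get()` on whatever selector produced `product_link` (for example an `@href` XPath) before the duplicate check.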

The URL is http://nos.emanuelberg.com/index.PHP?id_category=23&controller=category&id_lang=3
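
For a URL like this one, urlparse returns a path (`/index.PHP`), and the part that identifies the page sits in the query string. One possible way to get a key that covers both is sketched below. `unique_key` is a hypothetical helper name, and sorting the parameters is my own assumption, made so that the same parameters in a different order give the same key.

```python
from urllib.parse import parse_qsl, urlencode, urlparse

URL = ("http://nos.emanuelberg.com/index.PHP"
       "?id_category=23&controller=category&id_lang=3")


def unique_key(url):
    """Hypothetical helper: build a dedup key from path plus query string."""
    parts = urlparse(url)
    # Sort the parameters so that their order in the URL does not matter.
    pairs = sorted(parse_qsl(parts.query, keep_blank_values=True))
    query = urlencode(pairs)
    if query:
        return parts.path + "?" + query
    return parts.path


parts = urlparse(URL)
print(parts.path)        # path component only
print(parts.query)       # raw query string
print(unique_key(URL))   # path plus normalized query
```

If the host should also be part of the key (for example when crawling several shops), `parts.netloc` could be prepended in the same way.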
