Scrapy spider stops working in Django after adding WebSockets with Channels: "You cannot call this from an async context"

Problem description

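I am opening a new question because I have run into a problem with Scrapy and Channels in my Django application, and I would appreciate it if someone could point me in the right direction.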

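I am using Channels because I want to retrieve the crawl status from the Scrapyd API in real time, without polling through setInterval all the time. This is meant to become a SaaS service that may be used by many users.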

I have implemented Channels correctly. If I run:

python manage.py runserver

I can see that the system is now using ASGI:

System check identified no issues (0 silenced).
September 01, 2020 - 15:12:33
Django version 3.0.7, using settings 'SEOtoolkit.settings'
Starting ASGI/Channels version 2.4.0 development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.

In addition, the client and server connect correctly over the WebSocket:

WebSocket HANDSHAKING /crawler/22/ [127.0.0.1:50264]
connected {'type': 'websocket.connect'}
WebSocket CONNECT /crawler/22/ [127.0.0.1:50264]

So far so good. The problem appears when I run Scrapy through the Scrapyd API:

2020-09-01 15:31:25 [scrapy.core.scraper] ERROR: Error processing {'url': 'https://www.example.com'}
Traceback (most recent call last):
  File "/Users/Andrea/anaconda3/envs/DjangoScrape/lib/python3.6/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/Users/Andrea/anaconda3/envs/DjangoScrape/lib/python3.6/site-packages/scrapy/utils/defer.py", line 157, in f
    return deferred_from_coro(coro_f(*coro_args, **coro_kwargs))
  File "/private/var/folders/qz/ytk7wml54zd6RSSxygt512hc0000gn/T/crawler-1597767314-spxv81dy.egg/webspider/pipelines.py", line 67, in process_item
  File "/Users/Andrea/anaconda3/envs/DjangoScrape/lib/python3.6/site-packages/django/db/models/manager.py", line 82, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/Users/Andrea/anaconda3/envs/DjangoScrape/lib/python3.6/site-packages/django/db/models/query.py", line 411, in get
    num = len(clone)
  File "/Users/Andrea/anaconda3/envs/DjangoScrape/lib/python3.6/site-packages/django/db/models/query.py", line 258, in __len__
    self._fetch_all()
  File "/Users/Andrea/anaconda3/envs/DjangoScrape/lib/python3.6/site-packages/django/db/models/query.py", line 1261, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/Users/Andrea/anaconda3/envs/DjangoScrape/lib/python3.6/site-packages/django/db/models/query.py", line 57, in __iter__
    results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
  File "/Users/Andrea/anaconda3/envs/DjangoScrape/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 1150, in execute_sql
    cursor = self.connection.cursor()
  File "/Users/Andrea/anaconda3/envs/DjangoScrape/lib/python3.6/site-packages/django/utils/asyncio.py", line 24, in inner
    raise SynchronousOnlyOperation(message)
django.core.exceptions.SynchronousOnlyOperation: You cannot call this from an async context - use a thread or sync_to_async.

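I think the error message is clear enough: "You cannot call this from an async context - use a thread or sync_to_async." My guess is that enabling ASGI conflicts with the Scrapy library and prevents it from working properly.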

Unfortunately, I do not understand the reason behind it, and I do not know where to apply the suggested "thread or sync_to_async".

Note that the WebSockets are used only to check the crawl status and nothing else.

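Can anyone explain the reason for this incompatibility and give me some hints on how to get past it? I have spent a lot of time searching for an answer without finding one.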

Thank you very much.

Solution

You can fix this error in your pipelines.py file. Import sync_to_async from asgiref.sync:

from asgiref.sync import sync_to_async

After importing sync_to_async, use it as a decorator on the function that stores data in the database.

For example:

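from itemadapter import ItemAdapter
from crawler.models import Movie
from asgiref.sync import sync_to_async


class MovieSpiderPipeline:
    # sync_to_async runs the wrapped method in a worker thread, so the
    # Django ORM call below no longer executes on the event-loop thread.
    @sync_to_async
    def process_item(self, item, spider):
        movie = Movie(**item)
        movie.save()  # blocking database write
        return item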
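
As for why the decorator helps: as far as I can tell, Django guards its database calls with a check that roughly amounts to "is an event loop running in this thread?", and raises SynchronousOnlyOperation if so. Moving the call to a worker thread sidesteps that check. Below is a minimal standard-library sketch of the idea, not Django's or asgiref's actual code. The names fake_db_save, process_item_unsafe, process_item_threaded and run_demo are made up for illustration, and the exception class is only a stand-in for the real one.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


class SynchronousOnlyOperation(Exception):
    """Stand-in for django.core.exceptions.SynchronousOnlyOperation."""


def fake_db_save(item):
    """Hypothetical blocking call that refuses to run on an event-loop thread."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No event loop is running in this thread, so blocking work is allowed.
        return {"saved": item}
    raise SynchronousOnlyOperation("You cannot call this from an async context")


async def process_item_unsafe(item):
    # Called directly inside a coroutine: the guard trips.
    return fake_db_save(item)


async def process_item_threaded(item, executor):
    # Hand the blocking call to a worker thread, where no loop is running.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, fake_db_save, item)


def run_demo():
    """Run both variants and collect what each one produced."""
    results = {}
    with ThreadPoolExecutor(max_workers=1) as executor:
        try:
            asyncio.run(process_item_unsafe("a"))
        except SynchronousOnlyOperation as exc:
            results["unsafe"] = str(exc)
        results["threaded"] = asyncio.run(process_item_threaded("a", executor))
    return results
```

Calling run_demo() shows the direct call failing and the threaded call succeeding. In the real pipeline, sync_to_async plays the role that run_in_executor plays in this sketch.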

django django-channels scrapy scrapyd websocket