Problem description
I am currently running Scrapy v2.5 and I want to run it in an infinite loop. My code:
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

class main():
    def bucle(self, array_spider, process):
        mongo = mongodb(setting)  # mongodb is a custom helper class
        for spider_name in array_spider:
            process.crawl(spider_name, params={"mongo": mongo, "spider_name": spider_name})
        process.start()
        process.stop()
        mongo.close_mongo()

if __name__ == "__main__":
    setting = get_project_settings()
    while True:
        process = CrawlerProcess(setting)
        array_spider = process.spider_loader.list()
        class_main = main()
        class_main.bucle(array_spider, process)
But this produces the following error:
Traceback (most recent call last):
  File "run_scrapy.py", line 92, in <module>
    process.start()
  File "/usr/local/lib/python3.8/dist-packages/scrapy/crawler.py", line 327, in start
    reactor.run(installSignalHandlers=False)  # blocking call
  File "/usr/local/lib/python3.8/dist-packages/twisted/internet/base.py", line 1422, in run
    self.startRunning(installSignalHandlers=installSignalHandlers)
  File "/usr/local/lib/python3.8/dist-packages/twisted/internet/base.py", line 1404, in startRunning
    ReactorBase.startRunning(cast(ReactorBase, self))
  File "/usr/local/lib/python3.8/dist-packages/twisted/internet/base.py", line 843, in startRunning
    raise error.ReactorNotRestartable()
twisted.internet.error.ReactorNotRestartable
Can anyone help me?
Solution
AFAIK there is no simple way to restart a spider, but there is an alternative: never let the spider close. To do that, you can use the spider_idle signal.
According to the documentation:

Sent when a spider has gone idle, which means the spider has no further:
* requests waiting to be downloaded
* requests scheduled
* items being processed in the item pipeline
You can also find an example of using Signals in the official documentation.