Scrapy pausing and resuming crawls: where is the results directory?

Problem description

I have finished a scraping run using resume mode, but I don't know where the results are.

scrapy crawl somespider -s JOBDIR=crawls/somespider-1

I looked at https://docs.scrapy.org/en/latest/topics/jobs.html, but it doesn't say anything about this.

Where is the file with the results?

2020-09-10 23:31:31 [scrapy.core.engine] INFO: Closing spider (finished)
2020-09-10 23:31:31 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'bans/error/scrapy.core.downloader.handlers.http11.TunnelError': 22,
 'bans/error/twisted.internet.error.ConnectionRefusedError': 2,
 'bans/error/twisted.internet.error.TimeoutError': 6891,
 'bans/error/twisted.web._newclient.ResponseNeverReceived': 8424,
 'bans/status/500': 9598,
 'bans/status/503': 56,
 'downloader/exception_count': 15339,
 'downloader/exception_type_count/scrapy.core.downloader.handlers.http11.TunnelError': 22,
 'downloader/exception_type_count/twisted.internet.error.ConnectionRefusedError': 2,
 'downloader/exception_type_count/twisted.internet.error.TimeoutError': 6891,
 'downloader/exception_type_count/twisted.web._newclient.ResponseNeverReceived': 8424,
 'downloader/request_bytes': 9530,
 'downloader/request_count': 172,
 'downloader/request_method_count/GET': 172,
 'downloader/response_bytes': 1848,
 'downloader/response_count': 170,
 'downloader/response_status_count/200': 169,
 'downloader/response_status_count/500': 9,
 'downloader/response_status_count/503': 56,
 'elapsed_time_seconds': 1717,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2015, 9, 11, 2, 31, 32),
 'httperror/response_ignored_count': 67,
 'httperror/response_ignored_status_count/500': 67,
 'item_scraped_count': 120,
 'log_count/DEBUG': 357,
 'log_count/ERROR': 119,
 'log_count/INFO': 1764,
 'log_count/WARNING': 240,
 'proxies/dead': 1,
 'proxies/good': 1,
 'proxies/mean_backoff': 0.0,
 'proxies/reanimated': 0,
 'proxies/unchecked': 0,
 'response_received_count': 169,
 'retry/count': 1019,
 'retry/max_reached': 93,
 'retry/reason_count/500 Internal Server Error': 867,
 'retry/reason_count/twisted.internet.error.TimeoutError': 80,
 'retry/reason_count/twisted.web._newclient.ResponseNeverReceived': 72,
 'scheduler/dequeued': 1722,
 'scheduler/dequeued/disk': 1722,
 'scheduler/enqueued': 1722,
 'scheduler/enqueued/disk': 1722,
 'start_time': datetime.datetime(2015,48,56,908)}
2020-09-10 23:31:31 [scrapy.core.engine] INFO: Spider closed (finished)


Solution

Your command,

scrapy crawl somespider -s JOBDIR=crawls/somespider-1

does not specify an output file path.

Therefore, your results are not written anywhere: JOBDIR only stores the crawl state (the scheduler queues and the set of seen requests) so the job can be paused and resumed, not the scraped items.

Use the -o command-line switch to specify an output path.
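For example, to resume the same job and also export the scraped items (the filename results.jsonl here is just an illustration):

```shell
# Resume the paused job (same JOBDIR) and export scraped items as JSON Lines.
# -o appends to the file; in Scrapy >= 2.0, -O overwrites it instead.
scrapy crawl somespider -s JOBDIR=crawls/somespider-1 -o results.jsonl
```

Appending (-o) is usually what you want with a resumed job, so items from earlier runs are kept.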

See also the Scrapy tutorial, which covers this, or run scrapy crawl --help.
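Alternatively, the output can be configured once in the project's settings.py via the FEEDS setting (available since Scrapy 2.1), so every run exports items without needing -o on the command line. A minimal sketch, with an illustrative filename:

```python
# settings.py -- minimal feed-export sketch (requires Scrapy >= 2.1).
# Each scraped item is appended to results.jsonl in JSON Lines format.
FEEDS = {
    "results.jsonl": {
        "format": "jsonlines",
        "overwrite": False,  # keep items from earlier (resumed) runs
    },
}
```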