Unable to scrape data using requests and bs4

Problem description

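I wrote a script that extracts data from an e-commerce website, using bs4 to scrape the page content and requests to fetch the data. When I run the script locally on my computer, everything works fine. Listing the data takes 3-4 seconds, but that is acceptable. The problems started when I deployed the script to Heroku. After pushing it to Heroku the script still works, but it runs slowly, and the most annoying part is that it crashes often. It scrapes the data about 6-7 times and then throws a large number of errors. As a beginner, I can't make sense of them. Here is the full traceback from the Heroku logs: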

2020-09-11T18:39:48.896959+00:00 app[worker.1]: Traceback (most recent call last):
2020-09-11T18:39:48.897027+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.8/site-packages/urllib3/connection.py",line 159,in _new_conn
2020-09-11T18:39:48.897328+00:00 app[worker.1]: conn = connection.create_connection(
2020-09-11T18:39:48.897333+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.8/site-packages/urllib3/util/connection.py",line 84,in create_connection
2020-09-11T18:39:48.897547+00:00 app[worker.1]: raise err
2020-09-11T18:39:48.897569+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.8/site-packages/urllib3/util/connection.py",line 74,in create_connection
2020-09-11T18:39:48.897793+00:00 app[worker.1]: sock.connect(sa)
2020-09-11T18:39:48.897834+00:00 app[worker.1]: OSError: [Errno 113] No route to host
2020-09-11T18:39:48.897835+00:00 app[worker.1]: 
2020-09-11T18:39:48.897891+00:00 app[worker.1]: During handling of the above exception,another exception occurred:
2020-09-11T18:39:48.897892+00:00 app[worker.1]: 
2020-09-11T18:39:48.897898+00:00 app[worker.1]: Traceback (most recent call last):
2020-09-11T18:39:48.897898+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.8/site-packages/urllib3/connectionpool.py",line 670,in urlopen
2020-09-11T18:39:48.898299+00:00 app[worker.1]: httplib_response = self._make_request(
2020-09-11T18:39:48.898322+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.8/site-packages/urllib3/connectionpool.py",line 381,in _make_request
2020-09-11T18:39:48.898652+00:00 app[worker.1]: self._validate_conn(conn)
2020-09-11T18:39:48.898672+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.8/site-packages/urllib3/connectionpool.py",line 978,in _validate_conn
2020-09-11T18:39:48.899235+00:00 app[worker.1]: conn.connect()
2020-09-11T18:39:48.899238+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.8/site-packages/urllib3/connection.py",line 309,in connect
2020-09-11T18:39:48.899483+00:00 app[worker.1]: conn = self._new_conn()
2020-09-11T18:39:48.899488+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.8/site-packages/urllib3/connection.py",line 171,in _new_conn
2020-09-11T18:39:48.899630+00:00 app[worker.1]: raise NewConnectionError(
2020-09-11T18:39:48.899656+00:00 app[worker.1]: urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7fd5906c0250>: Failed to establish a new connection: [Errno 113] No route to host
2020-09-11T18:39:48.899658+00:00 app[worker.1]: 
2020-09-11T18:39:48.899658+00:00 app[worker.1]: During handling of the above exception,another exception occurred:
2020-09-11T18:39:48.899659+00:00 app[worker.1]: 
2020-09-11T18:39:48.899661+00:00 app[worker.1]: Traceback (most recent call last):
2020-09-11T18:39:48.899678+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.8/site-packages/requests/adapters.py",line 439,in send
2020-09-11T18:39:48.899896+00:00 app[worker.1]: resp = conn.urlopen(
2020-09-11T18:39:48.899899+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.8/site-packages/urllib3/connectionpool.py",line 726,in urlopen
2020-09-11T18:39:48.900165+00:00 app[worker.1]: retries = retries.increment(
2020-09-11T18:39:48.900180+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.8/site-packages/urllib3/util/retry.py",in increment
2020-09-11T18:39:48.900369+00:00 app[worker.1]: raise MaxRetryError(_pool,url,error or ResponseError(cause))
2020-09-11T18:39:48.900409+00:00 app[worker.1]: urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.flipkart.com',port=443): Max retries exceeded with url: /search?q=shoes&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fd5906c0250>: Failed to establish a new connection: [Errno 113] No route to host'))
2020-09-11T18:39:48.900411+00:00 app[worker.1]: 
2020-09-11T18:39:48.900411+00:00 app[worker.1]: During handling of the above exception,another exception occurred:
2020-09-11T18:39:48.900412+00:00 app[worker.1]: 
2020-09-11T18:39:48.900412+00:00 app[worker.1]: Traceback (most recent call last):
2020-09-11T18:39:48.900414+00:00 app[worker.1]: File "server.py",line 103,in <module>
2020-09-11T18:39:48.900542+00:00 app[worker.1]: reply= bot.flipkart(product= message_type)
2020-09-11T18:39:48.900567+00:00 app[worker.1]: File "/app/bot.py",line 86,in flipkart
2020-09-11T18:39:48.900823+00:00 app[worker.1]: datas= Test.scrape(product)
2020-09-11T18:39:48.900828+00:00 app[worker.1]: File "/app/Test.py",line 7,in __init__
2020-09-11T18:39:48.901017+00:00 app[worker.1]: self.source= requests.get('https://www.flipkart.com/search?q={}&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off'.format(search_query)).content
2020-09-11T18:39:48.901049+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.8/site-packages/requests/api.py",line 76,in get
2020-09-11T18:39:48.901257+00:00 app[worker.1]: return request('get',params=params,**kwargs)
2020-09-11T18:39:48.901262+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.8/site-packages/requests/api.py",line 61,in request
2020-09-11T18:39:48.901466+00:00 app[worker.1]: return session.request(method=method,url=url,**kwargs)
2020-09-11T18:39:48.901471+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.8/site-packages/requests/sessions.py",line 530,in request
2020-09-11T18:39:48.901887+00:00 app[worker.1]: resp = self.send(prep,**send_kwargs)
2020-09-11T18:39:48.901891+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.8/site-packages/requests/sessions.py",line 643,in send
2020-09-11T18:39:48.902410+00:00 app[worker.1]: r = adapter.send(request,**kwargs)
2020-09-11T18:39:48.902413+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.8/site-packages/requests/adapters.py",line 516,in send
2020-09-11T18:39:48.902823+00:00 app[worker.1]: raise ConnectionError(e,request=request)
2020-09-11T18:39:48.902882+00:00 app[worker.1]: requests.exceptions.ConnectionError: HTTPSConnectionPool(host='www.flipkart.com',port=443): Max retries exceeded with url: /search?q=shoes&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fd5906c0250>: Failed to establish a new connection: [Errno 113] No route to host'))
2020-09-11T18:39:48.991351+00:00 heroku[worker.1]: Process exited with status 1
2020-09-11T18:39:49.047690+00:00 heroku[worker.1]: State changed from up to crashed

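I'm sorry for not sharing the whole code. I would have, but it is spread across two or three linked files, so I can't post all of it here. I have tried hard but can't understand the error, so any help is greatly appreciated!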

Solution

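The error you posted ([Errno 113] No route to host, which requests reports as a ConnectionError) means the worker could not open a network connection to www.flipkart.com. This is usually caused by a missing or slow internet connection. Check that the machine running the script has a working connection; if it does not, restart your current Python environment.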
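
Since the traceback shows the ConnectionError propagating out of requests.get in Test.py and crashing the worker, one way to make the script tolerate a flaky connection is to retry the request a few times and return None instead of raising. The following is only a minimal sketch, not the asker's actual code: the helper name fetch_with_retries and its parameters are hypothetical. It catches OSError, which works here because requests' exceptions (including ConnectionError) derive from IOError/OSError.

```python
import time


def fetch_with_retries(fetch, attempts=3, delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out.

    Returns whatever fetch() returns, or None if every attempt raised
    OSError (the base class of requests.exceptions.ConnectionError).
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except OSError as exc:
            print(f"attempt {attempt}/{attempts} failed: {exc}")
            if attempt < attempts:
                # Wait a little longer after each failed attempt.
                sleep(delay * attempt)
    return None


# Hypothetical usage inside Test.py (needs the requests package installed):
#
#   import requests
#   response = fetch_with_retries(lambda: requests.get(url, timeout=10))
#   if response is None:
#       ...  # report "try again later" instead of letting the worker crash
```

Passing timeout to requests.get is worth considering as well, because without it a request can wait indefinitely on a slow connection. This does not fix the underlying network problem; it only keeps a single failed request from taking the whole worker down.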