Problem description
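To scrape binance.com, I use the pyppeteer library (through requests_html) to render the page and get the final HTML rather than the JavaScript source.
My problem: the first run on a remote Ubuntu 20.04 server works fine, but when I run the code again I get pyppeteer.errors.PageError: Page crashed! or pyppeteer.errors.TimeoutError: Navigation Timeout Exceeded: 100000 ms exceeded. The same code works fine when I run it from PyCharm on my main Windows machine; the problem occurs only on Ubuntu.
I think the problem is related to orphaned pyppeteer sessions, but I'm not sure.
Here is my code: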
from requests_html import HTMLSession
from bs4 import BeautifulSoup
import time
from datetime import datetime
from sql import *

if __name__ == "__main__":
    while True:
        session = HTMLSession()
        r = session.get('https://www.binance.com/ru/Trade/ETH_BTC')
        # Render the page in headless Chromium (timeout is in seconds).
        r.html.render(sleep=1, keep_page=True, scrolldown=1, timeout=1000)
        soup = BeautifulSoup(r.html.html, "lxml")
        # Match the first div whose class starts with "showPrice".
        price = soup.find("div", class_=lambda value: value and value.startswith("showPrice"))
        now = datetime.now()
        dt_string = now.strftime("%d/%m/%Y %H:%M:%S")
        sql(dt_string, price.text)
        print(dt_string + " ETH/BTC: " + price.text)
        r.close()
        session.close()
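One thing worth noting about the loop above: r.close() and session.close() are reached only if every earlier line in the iteration succeeds. If render() raises, both are skipped, which may leave the browser behind. Below is a minimal sketch of guaranteed cleanup using contextlib.closing. FakeSession is a hypothetical stand-in (it is not part of requests_html) so that the snippet runs without a browser; the assumption is that the real session's close() shuts its browser down.

```python
from contextlib import closing


class FakeSession:
    """Hypothetical stand-in for a session object that owns a browser."""

    def __init__(self):
        self.closed = False

    def render(self):
        # Simulate the kind of failure described in the question.
        raise TimeoutError("Navigation Timeout Exceeded")

    def close(self):
        self.closed = True


def scrape_once(session):
    """Run one iteration; close() is called even if render() raises."""
    with closing(session):
        session.render()


session = FakeSession()
try:
    scrape_once(session)
except TimeoutError as exc:
    print("render failed:", exc)

print("session closed:", session.closed)
```

With the real library the same shape would be `with closing(HTMLSession()) as session:` wrapped around the body of the loop, since closing() works with any object that has a close() method.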
Here is the crash log:
Traceback (most recent call last):
  File "binance.py", line 13, in <module>
    r.html.render(sleep = 1, timeout=1000)
  File "/usr/local/lib/python3.8/dist-packages/requests_html.py", line 598, in render
    content, result, page = self.session.loop.run_until_complete(self._async_render(url=self.url, script=script, sleep=sleep, wait=wait, content=self.html, reload=reload, scrolldown=scrolldown, timeout=timeout, keep_page=keep_page))
  File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.8/dist-packages/requests_html.py", line 512, in _async_render
    await page.goto(url, options={'timeout': int(timeout * 1000)})
  File "/usr/local/lib/python3.8/dist-packages/pyppeteer/page.py", line 885, in goto
    raise error
pyppeteer.errors.TimeoutError: Navigation Timeout Exceeded: 1000000 ms exceeded.
[E:pyppeteer.connection] connection unexpectedly closed
Task exception was never retrieved
future: <Task finished name='Task-105' coro=<Connection._async_send() done, defined at /usr/local/lib/python3.8/dist-packages/pyppeteer/connection.py:69> exception=InvalidStateError('invalid state')>
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/websockets/protocol.py", line 827, in transfer_data
    message = await self.read_message()
  File "/usr/local/lib/python3.8/dist-packages/websockets/protocol.py", line 895, in read_message
    frame = await self.read_data_frame(max_size=self.max_size)
  File "/usr/local/lib/python3.8/dist-packages/websockets/protocol.py", line 971, in read_data_frame
    frame = await self.read_frame(max_size)
  File "/usr/local/lib/python3.8/dist-packages/websockets/protocol.py", line 1047, in read_frame
    frame = await Frame.read(
  File "/usr/local/lib/python3.8/dist-packages/websockets/framing.py", line 105, in read
    data = await reader(2)
  File "/usr/lib/python3.8/asyncio/streams.py", line 721, in readexactly
    raise exceptions.IncompleteReadError(incomplete, n)
asyncio.exceptions.IncompleteReadError: 0 bytes read on a total of 2 expected bytes

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/pyppeteer/connection.py", line 73, in _async_send
    await self.connection.send(msg)
  File "/usr/local/lib/python3.8/dist-packages/websockets/protocol.py", line 555, in send
    await self.ensure_open()
  File "/usr/local/lib/python3.8/dist-packages/websockets/protocol.py", line 803, in ensure_open
    raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedError: code = 1006 (connection closed abnormally [internal]), no reason

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/pyppeteer/connection.py", line 79, in _async_send
    await self.dispose()
  File "/usr/local/lib/python3.8/dist-packages/pyppeteer/connection.py", line 170, in dispose
    await self._on_close()
  File "/usr/local/lib/python3.8/dist-packages/pyppeteer/connection.py", line 151, in _on_close
    cb.set_exception(_rewriteError(
asyncio.exceptions.InvalidStateError: invalid state
Task exception was never retrieved
future: <Task finished name='Task-2' coro=<Connection._recv_loop() done, defined at /usr/local/lib/python3.8/dist-packages/pyppeteer/connection.py:53> exception=PageError('Page crashed!')>
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/pyppeteer/connection.py", line 61, in _recv_loop
    await self._on_message(resp)
  File "/usr/local/lib/python3.8/dist-packages/pyppeteer/connection.py", line 143, in _on_message
    self._on_query(msg)
  File "/usr/local/lib/python3.8/dist-packages/pyppeteer/connection.py", line 123, in _on_query
    session._on_message(params.get('message'))
  File "/usr/local/lib/python3.8/dist-packages/pyppeteer/connection.py", line 276, in _on_message
    self.emit(obj.get('method'), obj.get('params'))
  File "/usr/local/lib/python3.8/dist-packages/pyee/_base.py", line 108, in emit
    handled = self._call_handlers(event, args, kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pyee/_base.py", line 91, in _call_handlers
    self._emit_run(f, kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pyee/_compat.py", line 49, in _emit_run
    coro = f(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pyppeteer/page.py", line 205, in <lambda>
    lambda event: self._onTargetCrashed())
  File "/usr/local/lib/python3.8/dist-packages/pyppeteer/page.py", line 228, in _onTargetCrashed
    self.emit('error', PageError('Page crashed!'))
  File "/usr/local/lib/python3.8/dist-packages/pyee/_base.py", line 111, in emit
    self._emit_handle_potential_error(event, args[0] if args else None)
  File "/usr/local/lib/python3.8/dist-packages/pyee/_base.py", line 83, in _emit_handle_potential_error
    raise error
pyppeteer.errors.PageError: Page crashed!
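The first traceback in the log shows where the 1000000 ms figure comes from: requests_html passes int(timeout * 1000) to page.goto, so the timeout given to render() is in seconds. A small worked conversion, as a sketch (the helper name goto_timeout_ms is made up for illustration):

```python
def goto_timeout_ms(render_timeout_s):
    """Mirror the conversion seen in the traceback: seconds to milliseconds."""
    return int(render_timeout_s * 1000)


ms = goto_timeout_ms(1000)   # the render() timeout used in the failing run
print(ms)                    # 1000000
print(round(ms / 60000, 1))  # the same wait expressed in minutes: 16.7
```

So with timeout=1000, a navigation that hangs blocks the loop for roughly 16.7 minutes before TimeoutError is raised.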
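Solution
If you kill the Python processes before starting again, for example with killall -9 python3, the script runs fine on the second, third, and later runs. Problem solved!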
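Killing by hand can also be automated. The following is a rough sketch, not a tested fix for this particular scraper: it assumes the failures come from processes left over from an earlier run, and it assumes a POSIX system such as the Ubuntu server in the question. The helper name run_with_cleanup is hypothetical. It starts a command in its own process group and, if the command does not finish in time, sends SIGKILL (the signal that killall -9 sends) to the whole group, so that any child processes the command spawned are killed along with it.

```python
import os
import signal
import subprocess
import sys


def run_with_cleanup(cmd, timeout_s):
    """Run cmd in its own process group; kill the whole group on timeout.

    Returns the exit code, or None if the command had to be killed.
    """
    # start_new_session=True makes the child the leader of a new process
    # group, so processes it spawns belong to the same group by default.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The child's PID is also its process group ID here.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()  # reap the child so it does not linger as a zombie
        return None


if __name__ == "__main__":
    # A command that exits quickly returns its exit code.
    print(run_with_cleanup([sys.executable, "-c", "pass"], 10))
    # A command that hangs is killed after the timeout.
    print(run_with_cleanup([sys.executable, "-c", "import time; time.sleep(30)"], 1))
```

To use this with the scraper, the script would first have to be changed to perform a single fetch and exit instead of looping forever; a small supervisor could then call run_with_cleanup([sys.executable, "binance.py"], timeout_s) in its own loop, with a timeout chosen to be longer than a normal fetch.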