tensorflow-data-validation does not work on large datasets with the Apache Beam Direct Runner because of gRPC timeouts

Problem description

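When I run TensorFlow Data Validation on the Direct Runner, it fails to generate statistics for some large datasets (over 400 GB). All workers appear to stop after logging the error "Keepalive watchdog fired. Closing transport.", which looks like a gRPC keepalive timeout.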

E0804 17:49:07.419950276   44806 chttp2_transport.cc:2881]   ipv6:[::1]:40823: Keepalive watchdog fired. Closing transport.
2020-08-04 17:49:07  local_job_service.py : INFO  Worker: severity: ERROR timestamp {   seconds: 1596563347   nanos: 420487403 } message: "Python sdk harness Failed: \nTraceback (most recent call last):\n  File \"/home/ec2-user/lib64/python3.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py\", line 158, in main\n    sdk_pipeline_options.view_as(ProfilingOptions))).run()\n  File \"/home/ec2-user/lib64/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py\", line 213, in run\n    for work_request in self._control_stub.Control(get_responses()):\n  File \"/home/ec2-user/lib64/python3.7/site-packages/grpc/_channel.py\", line 416, in __next__\n    return self._next()\n  File \"/home/ec2-user/lib64/python3.7/site-packages/grpc/_channel.py\", line 706, in _next\n    raise self\ngrpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:\n\tstatus = StatusCode.UNAVAILABLE\n\tdetails = \"keepalive watchdog timeout\"\n\tdebug_error_string = \"{\"created\":\"@1596563347.420024732\",\"description\":\"Error received from peer ipv6:[::1]:40823\",\"file\":\"src/core/lib/surface/call.cc\",\"file_line\":1055,\"grpc_message\":\"keepalive watchdog timeout\",\"grpc_status\":14}\"\n>" trace: "Traceback (most recent call last):\n  File \"/home/ec2-user/lib64/python3.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py\",\"grpc_status\":14}\"\n>\n" log_location: "/home/ec2-user/lib64/python3.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py:161" thread: "MainThread"
Traceback (most recent call last):
  File "/usr/lib64/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ec2-user/lib64/python3.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 248, in <module>
    main(sys.argv)
  File "/home/ec2-user/lib64/python3.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py", in main
    sdk_pipeline_options.view_as(ProfilingOptions))).run()
  File "/home/ec2-user/lib64/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", in run
    for work_request in self._control_stub.Control(get_responses()):
  File "/home/ec2-user/lib64/python3.7/site-packages/grpc/_channel.py", in __next__
    return self._next()
  File "/home/ec2-user/lib64/python3.7/site-packages/grpc/_channel.py", in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
        status = StatusCode.UNAVAILABLE
        details = "keepalive watchdog timeout"
        debug_error_string = "{"created":"@1596563347.420024732","description":"Error received from peer ipv6:[::1]:40823","file":"src/core/lib/surface/call.cc","file_line":1055,"grpc_message":"keepalive watchdog timeout","grpc_status":14}"
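The "keepalive watchdog timeout" in the log above comes from gRPC's HTTP/2 keepalive mechanism: the sender periodically pings its peer, and if a ping is not acknowledged within a timeout, the watchdog fires and the transport is closed. The sketch below is a simplified, hypothetical model of that rule in plain Python, not gRPC's actual implementation (which lives in chttp2_transport.cc). The names KeepaliveWatchdog and simulate, the one-second tick, and the 30 s / 20 s values are illustrative assumptions that only loosely mirror the real channel arguments grpc.keepalive_time_ms and grpc.keepalive_timeout_ms. It illustrates one plausible reading of the failure, which the log alone does not confirm: a peer that is too busy to answer a ping in time gets its connection dropped even though nothing else is wrong.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeepaliveWatchdog:
    """Hypothetical, simplified model of a gRPC-style keepalive check.

    keepalive_time_s:    idle time after which the sender pings its peer
    keepalive_timeout_s: how long the sender waits for the PING ACK before
                         it gives up and closes the transport
    """
    keepalive_time_s: float
    keepalive_timeout_s: float
    last_activity: float = 0.0
    ping_sent_at: Optional[float] = None
    closed: bool = False

    def on_activity(self, now: float) -> None:
        # Called when the peer answers; in this model only the PING ACK is simulated.
        self.last_activity = now
        self.ping_sent_at = None

    def tick(self, now: float) -> None:
        if self.closed:
            return
        if self.ping_sent_at is None:
            if now - self.last_activity >= self.keepalive_time_s:
                self.ping_sent_at = now  # send a PING and arm the watchdog
        elif now - self.ping_sent_at >= self.keepalive_timeout_s:
            # No ACK in time: the "Keepalive watchdog fired. Closing transport." case.
            self.closed = True


def simulate(ack_delay_s: float, keepalive_time_s: float = 30.0,
             keepalive_timeout_s: float = 20.0, duration_s: int = 300) -> bool:
    """Return True if the simulated transport gets closed.

    ack_delay_s models how long a (possibly overloaded) peer takes to ACK each PING.
    """
    wd = KeepaliveWatchdog(keepalive_time_s, keepalive_timeout_s)
    for now in range(duration_s + 1):  # one tick per simulated second
        if wd.ping_sent_at is not None and now - wd.ping_sent_at >= ack_delay_s:
            wd.on_activity(now)  # the peer finally answers the PING
        wd.tick(now)
        if wd.closed:
            return True
    return False


print(simulate(ack_delay_s=1))   # responsive peer
print(simulate(ack_delay_s=25))  # peer slower than the 20 s timeout
```

With a responsive peer (1 s ACK delay) the simulated transport stays open and the first call prints False; once each ACK takes longer than the 20 s timeout, the watchdog closes the transport and the second call prints True.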
