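Getting a row count raises a "Premature EOF from inputStream" error

Problem description

I'm having some trouble getting a row count from a temporary Hive table. I'm not sure what actually causes this error, because when I run the same set of queries against a smaller test cluster I get the expected result. I only see it when running against the large Hive cluster.

The code looks something like this: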

with hive.connect() as conn:
    conn.execute(f"CREATE TEMPORARY TABLE new_users (uuid String)")
    # <some conditions> is a placeholder for the real WHERE clause
    conn.execute(f"""INSERT INTO new_users (uuid)
                             SELECT uuid FROM big_user_table WHERE <some conditions> """)
    resp = conn.execute(f"""SELECT COUNT(*) FROM
                        (SELECT DISTINCT uuid FROM new_users) new_usrs""").fetchone()

I've tried a few variations to get the count, but it is really the .fetchone() call that raises the error.
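
To double-check that claim, one option is to split the execute and fetch steps and log which one fails. This is only a rough sketch, not a fix: `count_rows` is a hypothetical helper name, and `conn` is assumed to be any object whose `execute(sql)` returns something with a `fetchone()` method, as the connection in the snippet above appears to.

```python
import logging

logger = logging.getLogger(__name__)


def count_rows(conn, sql):
    """Run a COUNT query and report which phase (execute or fetch) failed.

    Assumption: conn.execute(sql) returns an object exposing fetchone().
    """
    try:
        # Phase 1: submit the statement to the server.
        result = conn.execute(sql)
    except Exception:
        logger.exception("execute phase failed for: %s", sql)
        raise
    try:
        # Phase 2: pull the single result row back to the client.
        row = result.fetchone()
    except Exception:
        logger.exception("fetch phase failed for: %s", sql)
        raise
    # A COUNT query returns one row with one column; guard against no row.
    return None if row is None else row[0]
```

Called as `count_rows(conn, "SELECT COUNT(*) FROM (SELECT DISTINCT uuid FROM new_users) new_usrs")`, it re-raises the original exception unchanged, so the traceback stays the same and only the log line is extra.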

I can add the whole Hive stack trace if anyone wants it, but for now here is just the Python side:

File "/home/ec2-user/myproject/report.py",line 88,in run_metrics
    (SELECT DISTINCT uuid FROM new_users) new_usrs""").fetchone()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/sqlalchemy/engine/result.py",line 1276,in fetchone
    e,None,self.cursor,self.context
  File "/home/ec2-user/.local/lib/python3.7/site-packages/sqlalchemy/engine/base.py",line 1466,in _handle_dbapi_exception
    util.raise_from_cause(sqlalchemy_exception,exc_info)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/sqlalchemy/util/compat.py",line 383,in raise_from_cause
    reraise(type(exception),exception,tb=exc_tb,cause=cause)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/sqlalchemy/util/compat.py",line 128,in reraise
    raise value.with_traceback(tb)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/sqlalchemy/engine/result.py",line 1268,in fetchone
    row = self._fetchone_impl()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/sqlalchemy/engine/result.py",line 1148,in _fetchone_impl
    return self.cursor.fetchone()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/pyhive/common.py",line 105,in fetchone
    self._fetch_while(lambda: not self._data and self._state != self._STATE_FINISHED)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/pyhive/common.py",line 45,in _fetch_while
    self._fetch_more()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/pyhive/hive.py",line 387,in _fetch_more
    _check_status(response)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/pyhive/hive.py",line 495,in _check_status
    raise OperationalError(response)

The final Hive error says something about a premature EOF: 'org.apache.hadoop.hive.ql.exec.FetchOperator:getNextRow:FetchOperator.java:459'],sqlState=None,errorCode=0,errorMessage='java.io.IOException: java.io.EOFException: Premature EOF from inputStream'),hasMoreRows=None,results=None)

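Given the number of large SELECT / INSERT queries that run before this COUNT, I find it hard to believe this is a memory problem, but I have no other ideas at the moment.

Thanks.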
