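I'm having trouble getting a row count from a temporary Hive table. I'm not sure what is causing the error, because when I run the same set of queries against a smaller test cluster I get the expected results. I only see this when running against a large Hive cluster.
The code looks something like this: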
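with hive.connect() as conn:
    # Temporary table that lives only for this session
    conn.execute("CREATE TEMPORARY TABLE new_users (uuid String)")
    # Populate it from the large source table
    conn.execute(f"""INSERT INTO new_users (uuid)
        SELECT uuid FROM big_user_table WHERE <some conditions> """)
    # Count the distinct uuids; this fetchone() is the call that fails
    resp = conn.execute("""SELECT COUNT(*) FROM
        (SELECT DISTINCT uuid FROM new_users) new_usrs""").fetchone()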
I've tried a few different ways of getting the count, but it is actually the simplest one, .fetchone(), that throws the error.
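For reference, two equivalent ways of writing the distinct count are sketched below. This is only an illustration: it runs against an in-memory SQLite database from the standard library so that it is self-contained, the helper names are hypothetical, and nothing here shows whether either form avoids the error on the large Hive cluster.

```python
import sqlite3


def count_distinct_subquery(conn, table, column):
    """Count distinct values with a DISTINCT subquery (the form used in the question)."""
    sql = f"SELECT COUNT(*) FROM (SELECT DISTINCT {column} FROM {table}) t"
    return conn.execute(sql).fetchone()[0]


def count_distinct_direct(conn, table, column):
    """Count distinct values with COUNT(DISTINCT ...), without a subquery.

    Note: this form ignores NULLs, while the subquery form counts NULL
    as one distinct value, so the two can differ by one on nullable columns.
    """
    sql = f"SELECT COUNT(DISTINCT {column}) FROM {table}"
    return conn.execute(sql).fetchone()[0]


def demo():
    # SQLite stands in for Hive here purely so the sketch is runnable.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TEMPORARY TABLE new_users (uuid TEXT)")
    conn.executemany(
        "INSERT INTO new_users (uuid) VALUES (?)",
        [("a",), ("b",), ("a",)],
    )
    return (
        count_distinct_subquery(conn, "new_users", "uuid"),
        count_distinct_direct(conn, "new_users", "uuid"),
    )
```

Both helpers interpolate the table and column names directly into the SQL, which is acceptable only for trusted, hard-coded identifiers as in this sketch.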
If anyone wants the full Hive stack trace I can add it, but for now here is just the Python side:
  File "/home/ec2-user/myproject/report.py", line 88, in run_metrics
    (SELECT DISTINCT uuid FROM new_users) new_usrs""").fetchone()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/sqlalchemy/engine/result.py", line 1276, in fetchone
    e, None, None, self.cursor, self.context
  File "/home/ec2-user/.local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1466, in _handle_dbapi_exception
    util.raise_from_cause(sqlalchemy_exception, exc_info)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 383, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 128, in reraise
    raise value.with_traceback(tb)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/sqlalchemy/engine/result.py", line 1268, in fetchone
    row = self._fetchone_impl()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/sqlalchemy/engine/result.py", line 1148, in _fetchone_impl
    return self.cursor.fetchone()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/pyhive/common.py", line 105, in fetchone
    self._fetch_while(lambda: not self._data and self._state != self._STATE_FINISHED)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/pyhive/common.py", line 45, in _fetch_while
    self._fetch_more()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/pyhive/hive.py", line 387, in _fetch_more
    _check_status(response)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/pyhive/hive.py", line 495, in _check_status
    raise OperationalError(response)
The last Hive error reports a premature EOF: 'org.apache.hadoop.hive.ql.exec.FetchOperator:getNextRow:FetchOperator.java:459'], sqlState=None, errorCode=0, errorMessage='java.io.IOException: java.io.EOFException: Premature EOF from inputStream'), hasMoreRows=None, results=None)
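When comparing failures across runs, it can help to reduce a long response dump like the one above to its key fields. The following is a minimal sketch that assumes the error text has the same `name=value` shape as the fragment quoted here; `summarize_hive_error` is a hypothetical helper, not part of PyHive or SQLAlchemy, and it will not handle messages that themselves contain single quotes.

```python
import re

# Patterns for the fields visible in the quoted error fragment.
_PATTERNS = {
    "sql_state": r"sqlState=(?:'([^']*)'|None)",
    "error_code": r"errorCode=(-?\d+)",
    "error_message": r"errorMessage='([^']*)'",
}


def summarize_hive_error(text):
    """Extract sqlState, errorCode and errorMessage from an error string.

    Values are returned as strings; a field that is absent, or that is
    printed as None in the text, is returned as None.
    """
    summary = {}
    for name, pattern in _PATTERNS.items():
        match = re.search(pattern, text)
        summary[name] = match.group(1) if match else None
    return summary


# The tail of the error fragment quoted in the question.
SAMPLE = (
    "sqlState=None, errorCode=0, errorMessage='java.io.IOException: "
    "java.io.EOFException: Premature EOF from inputStream'), "
    "hasMoreRows=None, results=None)"
)
```

Calling `summarize_hive_error(str(exc))` inside an `except` block around the failing `.fetchone()` would be one way to log just these fields, assuming the exception's string form contains the same dump.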
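Given the large number of select/insert queries that run before this count, I find it hard to believe this is a memory issue, but I don't have any other ideas at the moment.
Thanks.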