Problem description
I get the following error when I try to run a Hive UPDATE statement.
2021-02-25 15:38:54,934 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1592334694783_33388_r_000007_3: Error: java.lang.RuntimeException:
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":0,"bucketid":-1,"rowid":3}},"value":{"_col0":"T","_col1":1111111,...,"_col44":""}}
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:256)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:790)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)
The update query itself is simple.
All columns in the target table are string or decimal.
I found a similar issue reported elsewhere (Cloudera link), but the difference here is that this query succeeds most of the time and only fails when run against certain data.
Update statement
UPDATE Table1 a
SET
email = MaskData(email, 1)
WHERE d_Date >= '2017-01-01'
  AND email IN (SELECT DISTINCT email FROM Table2);
Any pointers or help would be appreciated. Thanks in advance.
Solution
It turned out that the data had not been bucketed correctly when we inserted it from Spark (note the bucketid of -1 in the error row). We had to rebuild the full table, after which the update worked fine.
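One way to rebuild is to create a fresh bucketed, transactional copy and repopulate it through Hive so rows land in the correct buckets. A minimal sketch follows; the table name `Table1_fixed`, the column list, the bucketing column, and the bucket count are all placeholders, not taken from the original post:

```sql
-- Hypothetical rebuild of the target table as a proper Hive ACID table.
-- Hive UPDATE requires a bucketed, ORC-backed, transactional table.
CREATE TABLE Table1_fixed (
  email  STRING,
  d_Date STRING
  -- ... remaining columns from the original table ...
)
CLUSTERED BY (email) INTO 8 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- Repopulating via Hive (not Spark) lets Hive assign bucket IDs correctly.
INSERT INTO Table1_fixed SELECT * FROM Table1;
```

The underlying issue is that Spark's writers do not follow Hive's bucketing scheme, so rows written from Spark can end up in files Hive cannot map to a bucket, which surfaces as `bucketid:-1` and the `ArrayIndexOutOfBoundsException: -1` during the update.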