java.lang.ArrayIndexOutOfBoundsException: -1 - Hive UPDATE statement

Question

I get the following error when I try to run a Hive UPDATE statement:

    2021-02-25 15:38:54,934 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1592334694783_33388_r_000007_3: Error: java.lang.RuntimeException: 
    org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":0,"bucketid":-1,"rowid":3}},"value":{"_col0":"T","_col1":1111111,"......."_col44":""}}
        at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:256)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
    Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:790)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
        at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)

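The update query itself is simple.

All columns in the target table are either string or decimal.

I found another report of this problem (Cloudera link), but in my case the query succeeds most of the time and fails only when it runs against certain data.

Update statement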

    UPDATE Table1 a
    SET
    email = MaskData(email,1)
    WHERE  d_Date >= '2017-01-01' and
    email IN (select distinct email from Table2);

Any pointers on a way forward would be appreciated. Thanks in advance.

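Solution

It looks like the data was not bucketed correctly when we inserted it from Spark. This fits the failing row key in the log, which carries "bucketid":-1. I had to rebuild the whole table, and after that the update worked fine.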
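
For anyone who needs to do the same, here is a rough sketch of what such a rebuild could look like when it is run from Hive itself, so that Hive writes the bucket files. Everything specific in it is an assumption for illustration and not taken from the original table: the name Table1_rebuilt, the bucketing column id, the bucket count of 8, and the shortened column list. Use the real definition from SHOW CREATE TABLE Table1 instead.

```sql
-- Sketch only. Table1_rebuilt, the column list, the bucketing column (id)
-- and the bucket count (8) are hypothetical placeholders.

-- On Hive versions before 2.0, bucketing is only enforced on insert when
-- this is enabled; later versions always enforce it and dropped the setting.
-- SET hive.enforce.bucketing = true;

-- A table that accepts UPDATE must be transactional and stored as ORC
-- (and, before Hive 3, bucketed).
CREATE TABLE Table1_rebuilt (
  id     STRING,
  email  STRING,
  d_Date STRING
)
CLUSTERED BY (id) INTO 8 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- Copy the rows through Hive so each row lands in the bucket file
-- that matches the hash of its bucketing column.
INSERT INTO TABLE Table1_rebuilt
SELECT id, email, d_Date
FROM Table1;
```

Once the copy has been verified (row counts at least), the old table can be dropped and the new one renamed with ALTER TABLE Table1_rebuilt RENAME TO Table1. Future loads from Spark would need to go through a path that respects Hive bucketing, otherwise the same problem is likely to come back.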