Hive MERGE query - Error evaluating cardinality_violation(_col0,_col1)

Problem description

I am trying to run a Hive query. It fails with the following error:

Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"_col0":{"transactionid":0,"bucketid":-1,"rowid":1},"_col1":"2020-10-28"},"value":{"_col0":1}}
        at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:256)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"_col0":{"transactionid":0,"value":{"_col0":1}}
        at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
        ... 7 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating cardinality_violation(_col0,_col1)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:86)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841)
        at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:122)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841)
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1022)
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:827)
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:701)
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:767)
        at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)
        ... 7 more
Caused by: java.lang.RuntimeException: Cardinality Violation in Merge statement: [0,-1,1],2020-10-12
        at org.apache.hadoop.hive.ql.udf.generic.GenericUDFCardinalityViolation.evaluate(GenericUDFCardinalityViolation.java:56)
        at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:186)
        at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
        at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:81)
        ... 15 more

Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

Here is the query:

MERGE INTO TABLE1 A
using  (select * from TABLE2) B
ON
LOWER(TRIM(A.A)) = LOWER(TRIM(B.A)) AND
LOWER(TRIM(A.B)) = LOWER(TRIM(B.B))
WHEN MATCHED AND LOWER(TRIM(A.C)) = LOWER(TRIM(B.C))  OR TRIM(A.D)= TRIM(B.D)
THEN
UPDATE SET
A = regexp_replace(A, "[^ ']", "#"),
B = regexp_replace(B, "[^@.]", ...),
C = regexp_replace(C, "[^.-]", ...),
D = regexp_replace(D, ...),
E = regexp_replace(E, ..., "#"),
F = regexp_replace(F, "[^ .+-]", ...),
G = regexp_replace(G, ...),
H = regexp_replace(H, ...),
I = regexp_replace(I, ...),
J = regexp_replace(J, ...),
K = regexp_replace(K, ...),
L = regexp_replace(L, ...),
M = regexp_replace(M, ...),
N = regexp_replace(N, ...),
O = regexp_replace(O, ...),
P = regexp_replace(P, ...),
Q = regexp_replace(Q, ...),
R = regexp_replace(R, ...),
S = regexp_replace(S, ...),
T = regexp_replace(T, "[^ +-.]", "#");

I tried switching off the cardinality check, but the query then fails with an array out of bounds exception.

Please share any knowledge or insight you have about the cause or a solution.

I checked several Stack Overflow posts and found no leads related to this issue.

Thanks in advance.

Solution

Switching off the cardinality check (hive.merge.cardinality.check=false) will result in some data corruption, if it works at all.

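Check your data and fix the underlying problem: more than one row from TABLE2 matches the same row in TABLE1. The likely cause is duplicate values in the join key, which you can fix by filtering with row_number, using distinct, and so on. Alternatively, fix your ON clause by adding more keys so that each match is unique.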
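
As a rough sketch of the two steps above, the first query below lists the join-key values that occur more than once in TABLE2, and the second shows a deduplicated source that could stand in for (select * from TABLE2) in the USING clause. The key expressions are copied from the ON clause in the question. The aliases (key_a, key_b, cnt, t, ranked, rn) and the choice of column C as the tiebreaker are assumptions for illustration only; which duplicate to keep depends on your data.

```sql
-- Step 1: find join keys that appear more than once in the source table.
-- Any group returned here can match one TABLE1 row several times,
-- which is what triggers the cardinality violation.
SELECT LOWER(TRIM(A)) AS key_a,
       LOWER(TRIM(B)) AS key_b,
       COUNT(*)       AS cnt
FROM TABLE2
GROUP BY LOWER(TRIM(A)), LOWER(TRIM(B))
HAVING COUNT(*) > 1;

-- Step 2: keep exactly one source row per join key.
-- Assumption: ordering by C is only a placeholder tiebreaker;
-- replace it with whatever rule identifies the row you want to keep.
SELECT *
FROM (
  SELECT t.*,
         ROW_NUMBER() OVER (
           PARTITION BY LOWER(TRIM(t.A)), LOWER(TRIM(t.B))
           ORDER BY t.C
         ) AS rn
  FROM TABLE2 t
) ranked
WHERE rn = 1;
```

If the first query returns no rows, the duplicates are probably not in TABLE2 itself, and the ON clause is the next place to look. The second query adds an extra rn column to the source, which is harmless here because the UPDATE SET list names its columns explicitly.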