问题描述
我的蜂巢表结构如下
+---------------+--------------+----------------------+
| column_value | metric_name | key |
+---------------+--------------+----------------------+
| A37B | Mean | {0:"202006",1:"1"} |
| ACCOUNT_ID | Mean | {0:"202006",1:"2"} |
| ANB_200 | Mean | {0:"202006",1:"3"} |
| ANB_201 | Mean | {0:"202006",1:"4"} |
| AS82_RE | Mean | {0:"202006",1:"5"} |
| ATTR001 | Mean | {0:"202007",1:"2"} |
| ATTR001_RE | Mean | {0:"202007",1:"3"} |
| ATTR002 | Mean | {0:"202007",1:"4"} |
| ATTR002_RE | Mean | {0:"202007",1:"5"} |
| ATTR003 | Mean | {0:"202008",1:"3"} |
| ATTR004 | Mean | {0:"202008",1:"4"} |
| ATTR005 | Mean | {0:"202008",1:"5"} |
| ATTR006 | Mean | {0:"202009",1:"4"} |
| ATTR006 | Mean | {0:"202009",1:"5"} |
我需要编写一个Spark SQL查询,以基于具有NOT IN条件且两个键都被合并的Key列进行过滤。
以下查询在Beeline的HiveQL中工作正常
select * from your_data where key[0] between '202006' and '202009' and key NOT IN ( map(0,"202009",1,"5") );
由于数据类型不匹配而无法解析:map
在org.apache.spark.sql.catalyst.analysis.package $ AnalysisErrorAt.failAnalysis(package.scala:42) 在org.apache.spark.sql.catalyst.analysis.CheckAnalysis $$ anonfun $ checkAnalysis $ 1 $$ anonfun $ apply $ 3.applyOrElse(CheckAnalysis.scala:115) 在org.apache.spark.sql.catalyst.analysis.CheckAnalysis $$ anonfun $ checkAnalysis $ 1 $$ anonfun $ apply $ 3.applyOrElse(CheckAnalysis.scala:107) 在org.apache.spark.sql.catalyst.trees.TreeNode $$ anonfun $ transformUp $ 1.apply(TreeNode.scala:278) 在org.apache.spark.sql.catalyst.trees.TreeNode $$ anonfun $ transformUp $ 1.apply(TreeNode.scala:278) 在org.apache.spark.sql.catalyst.trees.CurrentOrigin $ .withOrigin(TreeNode.scala:70) 在org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:277) 在org.apache.spark.sql.catalyst.trees.TreeNode $$ anonfun $ 3.apply(TreeNode.scala:275) 在org.apache.spark.sql.catalyst.trees.TreeNode $$ anonfun $ 3.apply(TreeNode.scala:275) 在org.apache.spark.sql.catalyst.trees.TreeNode $$ anonfun $ 4.apply(TreeNode.scala:326) 在org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) 在org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:324) 在org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:275) 在org.apache.spark.sql.catalyst.trees.TreeNode $$ anonfun $ 3.apply(TreeNode.scala:275) 在org.apache.spark.sql.catalyst.trees.TreeNode $$ anonfun $ 3.apply(TreeNode.scala:275) 在org.apache.spark.sql.catalyst.trees.TreeNode $$ anonfun $ 4.apply(TreeNode.scala:326)
请帮助!
解决方法
我从之前提出的其他问题中得到了答案。该查询工作正常
select * from your_data where key[0] between 202006 and 202009 and NOT (key[0]="202009" and key[1]="5" );