DataFrame methods count() and show() do not work after a left-semi join (Spark/Scala)

Problem description

I am trying to implement an NLP pipeline with Spark/Scala.

I am now facing difficulties subtracting one collection (implemented as a DataFrame) from another. Items in both collections carry an ID, but the number of attributes associated with that ID differs between the two collections.

Example:

Entry in collection A: "_id" -> "someUniqueID", "attribute1" -> "someValue"

Entry in collection B: "_id" -> "someUniqueID", "attribute1" -> "someValue", "attribute2" -> "someValue"

I tried to do this with:

collection_A.join(collection_B, Seq("_id"), joinType = "left_semi")

After doing this, I can no longer use methods such as

.show()
.count()

However,

.printSchema()

still works, and the schema it prints is the desired one.
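
For reference, a minimal self-contained version of the failing code looks roughly like this (the object name, session setup and sample values are illustrative, not the actual pipeline):

import org.apache.spark.sql.SparkSession

object LeftSemiRepro {
  def main(args: Array[String]): Unit = {
    // local session purely for reproduction; the app name is made up
    val spark = SparkSession.builder()
      .appName("left-semi-repro")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // collection A: _id plus one attribute (schema taken from the example above)
    val collection_A = Seq(
      ("someUniqueID", "someValue"),
      ("otherID",      "someValue")
    ).toDF("_id", "attribute1")

    // collection B: _id plus two attributes
    val collection_B = Seq(
      ("someUniqueID", "someValue", "someValue")
    ).toDF("_id", "attribute1", "attribute2")

    // left_semi keeps the rows of A whose _id also occurs in B;
    // note that actually *subtracting* B from A would be joinType "left_anti"
    val joined = collection_A.join(collection_B, Seq("_id"), "left_semi")

    joined.printSchema()    // works
    joined.show()           // fails with java.lang.AbstractMethodError
    println(joined.count()) // same error

    spark.stop()
  }
}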

Calling either of the methods mentioned above produces the error log listed below:

Exception in thread "main" java.lang.AbstractMethodError
 at scala.collection.TraversableLike$class.filter(TraversableLike.scala:270)
 at org.apache.spark.sql.catalyst.expressions.ExpressionSet.filter(ExpressionSet.scala:55)
 at org.apache.spark.sql.catalyst.plans.logical.QueryPlanConstraints$class.constraints(QueryPlanConstraints.scala:36)
 at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.constraints$lzycompute(LogicalPlan.scala:29)
 at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.constraints(LogicalPlan.scala:29)
 at org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints$.org$apache$spark$sql$catalyst$optimizer$InferFiltersFromConstraints$$getAllConstraints(Optimizer.scala:805)
 at org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints$$anonfun$inferFilters$1.applyOrElse(Optimizer.scala:780)
 at org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints$$anonfun$inferFilters$1.applyOrElse(Optimizer.scala:765)
 at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:258)
 at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:258)
 at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
 at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:257)
 at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29)
 at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.transformDown(AnalysisHelper.scala:149)
 at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
 at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
 at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:263)
 at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:263)
 at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:328)
 at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:186)
 at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:326)
 at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:263)
 at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29)
 at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.transformDown(AnalysisHelper.scala:149)
 at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
 at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
 at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:263)
 at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:263)
 at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:328)
 at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:186)
 at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:326)
 at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:263)
 at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29)
 at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.transformDown(AnalysisHelper.scala:149)
 at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
 at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
 at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:263)
 at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:263)
 at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:328)
 at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:186)
 at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:326)
 at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:263)
 at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29)
 at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.transformDown(AnalysisHelper.scala:149)
 at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
 at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
 at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:247)
 at org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints$.inferFilters(Optimizer.scala:765)
 at org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints$.apply(Optimizer.scala:759)
 at org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints$.apply(Optimizer.scala:754)
 at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:87)
 at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:84)
 at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
 at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
 at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:35)
 at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:84)
 at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:76)
 at scala.collection.immutable.List.foreach(List.scala:381)
 at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:76)
 at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:67)
 at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:67)
 at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:73)
 at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:69)
 at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:78)
 at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:78)
 at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3365)
 at org.apache.spark.sql.Dataset.head(Dataset.scala:2550)
 at org.apache.spark.sql.Dataset.take(Dataset.scala:2764)
 at org.apache.spark.sql.Dataset.getRows(Dataset.scala:254)
 at org.apache.spark.sql.Dataset.showString(Dataset.scala:291)
 at org.apache.spark.sql.Dataset.show(Dataset.scala:751)
 at org.apache.spark.sql.Dataset.show(Dataset.scala:710)
 at org.apache.spark.sql.Dataset.show(Dataset.scala:719)
 at App$.main(App.scala:49)
 at App.main(App.scala)

I would very much appreciate any help or hints regarding this error.

Solution

No confirmed solution for this problem has been posted yet.

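That said, the stack trace gives a strong hint. The failing frame scala.collection.TraversableLike$class.filter uses the $class trait encoding of Scala 2.11, and a java.lang.AbstractMethodError raised inside the Catalyst optimizer is the classic symptom of a Scala binary version mismatch: for example, application code compiled against Scala 2.12 running against Spark artifacts built for Scala 2.11, or _2.11 and _2.12 jars mixed on the same classpath. A first step would be to make the Scala version and the Spark artifact suffix agree in the build, as in this minimal build.sbt sketch (the version numbers are assumptions, not taken from the question):

// build.sbt -- version numbers are illustrative assumptions
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  // %% appends the Scala binary suffix automatically,
  // so this resolves to spark-sql_2.11 and matches scalaVersion above
  "org.apache.spark" %% "spark-sql" % "2.4.3"
)

If the build already looks consistent, inspecting the resolved dependency list for a second scala-library, or for transitive artifacts carrying the wrong _2.1x suffix, would be the next thing to check.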
