Delta table: org.apache.spark.sql.catalyst.parser.ParseException: mismatched input 'FROM'

Problem Description

I am trying to run the following query on EMR / EMR Notebooks (Spark with Scala):

SELECT max(version),max(timestamp) FROM (DESCRIBE HISTORY delta.`s3://a/b/c/d`)

but I am getting the following error:

[screenshot of the error: ParseException: mismatched input 'FROM']

The same query runs fine on Databricks.

Another thing I am unsure about: why does the color (syntax highlighting) of the S3 location change after the // ?

[screenshot of the query with the S3 path highlighted in a different color after //]

So I tried breaking the query down and running only the DESCRIBE HISTORY statement. For some reason, that fails too.


Error log:

An error was encountered:
org.apache.spark.sql.AnalysisException: Table or view not found: HISTORY;
  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:47)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:835)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.resolveRelation(Analyzer.scala:787)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:817)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:810)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:71)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:89)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:86)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsUp(AnalysisHelper.scala:86)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:30)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:810)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:756)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1$$anonfun$2.apply(RuleExecutor.scala:92)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1$$anonfun$2.apply(RuleExecutor.scala:92)
  at org.apache.spark.sql.execution.QueryExecutionMetrics$.withMetrics(QueryExecutionMetrics.scala:141)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:91)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:88)
  at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
  at scala.collection.immutable.List.foldLeft(List.scala:84)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:88)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:80)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:80)
  at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:164)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$execute$1.apply(Analyzer.scala:156)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$execute$1.apply(Analyzer.scala:156)
  at org.apache.spark.sql.catalyst.analysis.AnalysisContext$.withLocalMetrics(Analyzer.scala:104)
  at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:155)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:126)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:125)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:201)
  at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:125)
  at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:76)
  at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
  at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:80)
  at org.apache.spark.sql.SparkSession.table(SparkSession.scala:630)
  at org.apache.spark.sql.execution.command.DescribeColumnCommand.run(tables.scala:714)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
  at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:196)
  at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:196)
  at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3391)
  at org.apache.spark.sql.execution.sqlExecution$.org$apache$spark$sql$execution$sqlExecution$$executeQuery$1(sqlExecution.scala:83)
  at org.apache.spark.sql.execution.sqlExecution$$anonfun$withNewExecutionId$1$$anonfun$apply$1.apply(sqlExecution.scala:94)
  at org.apache.spark.sql.execution.QueryExecutionMetrics$.withMetrics(QueryExecutionMetrics.scala:141)
  at org.apache.spark.sql.execution.sqlExecution$.org$apache$spark$sql$execution$sqlExecution$$withMetrics(sqlExecution.scala:178)
  at org.apache.spark.sql.execution.sqlExecution$$anonfun$withNewExecutionId$1.apply(sqlExecution.scala:93)
  at org.apache.spark.sql.execution.sqlExecution$.withsqlConfPropagated(sqlExecution.scala:200)
  at org.apache.spark.sql.execution.sqlExecution$.withNewExecutionId(sqlExecution.scala:92)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withAction(Dataset.scala:3390)
  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:196)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:81)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:644)
  ... 50 elided
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'history' not found in database 'default';
  at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:81)
  at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:81)
  at scala.Option.getorElse(Option.scala:121)
  at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:81)
  at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:84)
  at org.apache.spark.sql.hive.HiveExternalCatalog.getRawTable(HiveExternalCatalog.scala:141)
  at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:723)
  at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:723)
  at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:98)
  at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:722)
  at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.getTable(ExternalCatalogWithListener.scala:138)
  at org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupRelation(SessionCatalog.scala:706)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:832)

Update (Feb 18, 2021) -> What I have tried so far:

  1. Using a Spark SQL query:

spark.sql("SELECT max(version),max(timestamp) FROM (DESCRIBE HISTORY delta.s3://a/b/c/d)") 但这没有用。同样的错误

  2. Creating the Spark session with the following configuration:

spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension
spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog

but it threw the same error.
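
For reference, a minimal Scala sketch of how such a session could be built; the application name is a placeholder, and it assumes the Delta Lake package (delta-core) is already available on the cluster:

import org.apache.spark.sql.SparkSession

// Sketch of attempt #2: a session with both Delta properties set.
// Requires the delta-core package on the classpath.
val spark = SparkSession.builder()
  .appName("delta-history-check")  // placeholder name
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog",
          "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  .getOrCreate()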

Update 2 (Feb 18, 2021): Tried the approach @alex mentioned, using PySpark.

It worked partially, but not completely.

[screenshot of the partial output]

Thanks in advance.

Solution

According to the documentation, to get support for DESCRIBE HISTORY you need to configure both the Spark SQL extension and the catalog by passing two properties (see the docs):

  • spark.sql.extensions set to io.delta.sql.DeltaSparkSessionExtension
  • spark.sql.catalog.spark_catalog set to org.apache.spark.sql.delta.catalog.DeltaCatalog
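
Assuming both properties take effect on the session, the standalone DESCRIBE HISTORY statement becomes available; even where the FROM (DESCRIBE HISTORY ...) subquery form is not accepted by the parser, the max version and timestamp can still be pulled by staging the history output in a temporary view. A minimal Scala sketch (the view name is a placeholder, the path is the one from the question):

// DESCRIBE HISTORY returns a DataFrame; register it and aggregate over it.
spark.sql("DESCRIBE HISTORY delta.`s3://a/b/c/d`").createOrReplaceTempView("history_tmp")
spark.sql("SELECT max(version) AS max_version, max(timestamp) AS max_ts FROM history_tmp").show()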

Update:

For Spark 2.4.x, Delta 0.6.1 should be used instead; its documentation describes activating the extension via the spark.sql.extensions property (the same io.delta.sql.DeltaSparkSessionExtension value as above). The spark.sql.catalog.spark_catalog setting applies only to Spark 3.x.
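
If the cluster is stuck on Spark 2.4.x / Delta 0.6.1 and the SQL statement still does not work, the same numbers can also be obtained through the Delta Scala API instead of SQL. A minimal sketch, assuming delta-core 0.6.1 is on the classpath and spark is the active session (the S3 path is the question's placeholder):

import io.delta.tables.DeltaTable
import org.apache.spark.sql.functions.max

// history() returns the table's commit log as a DataFrame,
// so the latest version and timestamp can be computed directly.
val history = DeltaTable.forPath(spark, "s3://a/b/c/d").history()
history.agg(max("version"), max("timestamp")).show()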