Cartesian product query on Zeppelin/Hive/Spark

Problem description

I am trying to run the following Cartesian product query in Zeppelin, using Hive as the interpreter (with Spark as the Hive execution engine):

SELECT * FROM glue_data_catalog.tabla_x INNER JOIN glue_data_catalog.tabla_y ON 1 = 1

The query fails with the following output:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/spark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j2.properties Async: false
Warning: Map Join MAPJOIN[9][bigTable=?] in task 'Stage-1:MAPRED' is a cross product
Query ID = zeppelin_20210512193501_de76f35d-71c5-4410-9ad8-13305e19f59a
Total jobs = 2
Launching Job 1 out of 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Failed to execute spark task,with exception 'org.apache.hadoop.hive.ql.Metadata.HiveException(Failed to create spark client.)'
Failed: Execution Error,return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create spark client.
ExitValue: 1

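I have tried the following fixes, without success:

Adjusting the server connection timeout (see this).
Hive not finding the Spark path (see this).
Setting the Spark master port (see this).
The YARN managers for the nodes (see this).
Adjusting the number of bytes per reducer (see this or this).
Parameter tuning (see this or this or this).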
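
For reference, the adjustments listed above roughly correspond to Hive session settings like the ones below. This is only a sketch: the property names are standard Hive on Spark settings, but every value is an illustrative assumption (the /usr/lib/spark path is inferred from the jar locations in the log, and yarn as the master is assumed from the YARN item in the list). None of these changes made the error go away.

```sql
-- Run Hive queries on Spark (the setup described in the question).
set hive.execution.engine=spark;

-- Where Hive looks for Spark; path assumed from the jar locations in the log.
set spark.home=/usr/lib/spark;

-- Spark master; 'yarn' is an assumption.
-- A standalone cluster would use a spark://<host>:<port> URL instead.
set spark.master=yarn;

-- Timeouts for the connection and handshake between the Hive client
-- and the remote Spark driver (values are illustrative).
set hive.spark.client.connect.timeout=30000ms;
set hive.spark.client.server.connect.timeout=300000ms;

-- Average input per reducer, in bytes (64 MB here, illustrative).
set hive.exec.reducers.bytes.per.reducer=67108864;

-- The same Cartesian product, written as an explicit CROSS JOIN.
SELECT * FROM glue_data_catalog.tabla_x CROSS JOIN glue_data_catalog.tabla_y;
```

The final statement is equivalent to the INNER JOIN ... ON 1 = 1 form in the question; it only makes the cross product explicit and is not expected to change the "Failed to create spark client" error on its own.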

Solution

No working solution to this problem has been found yet.

apache-spark apache-zeppelin cartesian-product hive
