Spark 3.x on HDP 3.1 in headless mode with Hive - Hive tables not found

Problem description

How can I configure Spark 3.x on HDP 3.1 to interact with Hive, using a headless (https://spark.apache.org/docs/latest/hadoop-provided.html) build of Spark?

First, I downloaded and extracted headless Spark 3.x:

cd ~/development/software/spark-3.0.0-bin-without-hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf/
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
export SPARK_DIST_CLASSPATH=$(hadoop --config /usr/hdp/current/spark2-client/conf classpath)
 
ls /usr/hdp # note the version and replace 3.1.x.x-xxx below with it (a derivation sketch follows the command)

./bin/spark-shell --master yarn --queue myqueue \
  --conf spark.driver.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' \
  --conf spark.yarn.am.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' \
  --conf spark.hadoop.metastore.catalog.default=hive \
  --files /usr/hdp/current/hive-client/conf/hive-site.xml
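As an aside, the hdp.version value can be derived rather than copied by hand; a minimal sketch, assuming /usr/hdp holds exactly one versioned directory next to current:

HDP_VERSION=$(ls /usr/hdp | grep -v current)   # hypothetical helper, not part of the original steps
echo "${HDP_VERSION}"                          # substitute this value for 3.1.x.x-xxx

Inside the resulting spark-shell session: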

spark.sql("show databases").show
// only showing the default namespace, existing Hive tables are missing
+---------+
|namespace|
+---------+
|  default|
+---------+

spark.conf.get("spark.sql.catalogImplementation")
res2: String = in-memory // I want to see "hive" here - how? How do I get the Hive jars onto the classpath?
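The in-memory catalog is consistent with Spark's Hive support classes simply being absent from the headless build. A quick diagnostic sketch (assuming the standard jar layout of the spark-3.0.0-bin-without-hadoop tarball):

# Look for Spark's own Hive modules in the headless distribution.
ls ~/development/software/spark-3.0.0-bin-without-hadoop/jars | grep -i hive
# If no spark-hive*.jar shows up, the build ships without Hive support
# and spark.sql.catalogImplementation falls back to in-memory.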

Note

This is an updated version, for Spark 3.x and HDP 3.1, of How can I run spark in headless mode in my custom version on HDP? and custom spark does not find hive databases when running on yarn.

Furthermore: I am aware of the problems with ACID Hive tables in Spark. For now, I simply want to be able to see the existing databases.

Edit

The Hive JARs must be put on the classpath. Trying it as follows:

export SPARK_DIST_CLASSPATH="/usr/hdp/current/hive-client/lib/*:${SPARK_DIST_CLASSPATH}"

Now using spark-sql:

./bin/spark-sql --master yarn --queue myqueue \
  --conf spark.driver.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' \
  --conf spark.yarn.am.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' \
  --conf spark.hadoop.metastore.catalog.default=hive \
  --files /usr/hdp/current/hive-client/conf/hive-site.xml

This fails with:

Error: Failed to load class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.
Failed to load main class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.

I.e. the line export SPARK_DIST_CLASSPATH="/usr/hdp/current/hive-client/lib/*:${SPARK_DIST_CLASSPATH}" has no effect (the same problem occurs if it is not set at all).
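Note that org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver is a Spark class (from the spark-hive-thriftserver module), not a Hive class, so the Hive client JARs cannot supply it. A sketch for confirming this against the headless distribution (assumes unzip is installed and the tarball layout used above):

# Search the headless Spark jars for the missing class.
for j in ~/development/software/spark-3.0.0-bin-without-hadoop/jars/*.jar; do
  unzip -l "$j" 2>/dev/null | grep -q SparkSQLCLIDriver && echo "$j"
done
# No output means the class is absent, which explains the failure
# regardless of what is added to SPARK_DIST_CLASSPATH.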

Solution

As described above and in custom spark does not find hive databases when running on yarn, the Hive JARs are required. They are not provided in the headless version.

I was not able to retrofit these.

Solution: instead of worrying about it, simply use the Spark build for Hadoop 3.2 (on HDP 3.1).
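For completeness, a sketch of the working invocation with the Hadoop 3.2 build (same placeholders as above; replace 3.1.x.x-xxx with the version from ls /usr/hdp):

cd ~/development/software/spark-3.0.0-bin-hadoop3.2
export HADOOP_CONF_DIR=/etc/hadoop/conf/
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
# No SPARK_DIST_CLASSPATH needed here - this build bundles the Hadoop client jars.

./bin/spark-shell --master yarn --queue myqueue \
  --conf spark.driver.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' \
  --conf spark.yarn.am.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' \
  --conf spark.hadoop.metastore.catalog.default=hive \
  --files /usr/hdp/current/hive-client/conf/hive-site.xml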
