Hive query fails with "Unable to fetch table test_table. Invalid method name: 'get_table_req'", pyspark 3.0.0 and Hive 1.1.0

Problem description

I'm digging into a POC for Spark in a fairly new environment and checking out Spark functionality, but I'm having trouble running SQL queries from the pyspark terminal, while Hive itself is working, since we can query the metadata.

Do you know what is going on here and how to fix it?

$ pyspark --driver-class-path /etc/spark2/conf:/etc/hive/conf
>>> from pyspark.sql import SparkSession
>>> from pyspark.sql import Row
>>> spark = SparkSession \
...     .builder \
...     .appName("sample_query_test") \
...     .enableHiveSupport() \
...     .getOrCreate()
>>> spark.sql("show tables in user_tables").show(5)
20/08/18 19:57:01 WARN conf.HiveConf: HiveConf of name hive.enforce.sorting does not exist
20/08/18 19:57:01 WARN conf.HiveConf: HiveConf of name hive.enforce.bucketing does not exist
20/08/18 19:57:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
+-----------+--------------------+-----------+
|   database|           tableName|istemporary|
+-----------+--------------------+-----------+
|user_tables|              a_2019|      false|
|user_tables|abcdefgjeufjdsahh...|      false|
|user_tables|testtesttesttestt...|      false|
|user_tables|newnewnewnewnenwn...|      false|
|user_tables|blahblahblablahbl...|      false|
+-----------+--------------------+-----------+
only showing top 5 rows

>>> spark.sql("select count(*) from user_tables.test_table where date_partition='2020-08-17'").show(5)
Traceback (most recent call last):
  File "<stdin>",line 1,in <module>
  File "/opt/conda/lib/python3.7/site-packages/pyspark/sql/session.py",line 646,in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery),self._wrapped)
  File "/opt/conda/lib/python3.7/site-packages/pyspark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py",line 1305,in __call__
  File "/opt/conda/lib/python3.7/site-packages/pyspark/sql/utils.py",line 137,in deco
    raise_from(converted)
  File "<string>",line 3,in raise_from
**pyspark.sql.utils.AnalysisException: org.apache.hadoop.hive.ql.Metadata.HiveException: Unable to fetch table test_spark_cedatatransfer. Invalid method name: 'get_table_req';**

Information on the cluster:

$ hive --version
Hive 1.1.0-cdh5.13.0
Subversion file:///data/jenkins/workspace/generic-package-ubuntu64-16-04/CDH5.13.0-Packaging-Hive-2017-10-04_10-50-44/hive-1.1.0+cdh5.13.0+1269-1.cdh5.13.0.p0.34~xenial -r Unknown
Compiled by jenkins on Wed Oct 4 11:46:53 PDT 2017

$ pyspark --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.0
      /_/

Using Scala version 2.12.10, OpenJDK 64-Bit Server VM, 1.8.0_252
Branch HEAD
Compiled by user ubuntu on 2020-06-06T11:32:25Z
Revision 3fdfce3120f307147244e5eaf46d61419a723d50
Url https://gitbox.apache.org/repos/asf/spark.git

$ hadoop version
Hadoop 2.6.0-cdh5.13.0
Subversion http://github.com/cloudera/hadoop -r 42e8860b182e55321bd5f5605264da4adc8882be
Compiled by jenkins on 2017-10-04T18:50Z
Compiled with protoc 2.5.0
From source with checksum 5e84c185f8a22158e2b0e4b8f85311
This command was run using /usr/lib/hadoop/hadoop-common-2.6.0-cdh5.13.0.jar

Obviously, I have added the Hive conf to make sure the same metastore is being used, and an insert overwrite doing something simple fails in the same way!
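For reference, the metastore can also be pointed at explicitly when building the session, rather than relying only on --driver-class-path. A minimal sketch, where the thrift URI is only a placeholder for whatever hive.metastore.uris is set to in /etc/hive/conf/hive-site.xml:

from pyspark.sql import SparkSession

# Placeholder URI: use the value of hive.metastore.uris from /etc/hive/conf/hive-site.xml
spark = SparkSession \
    .builder \
    .appName("sample_query_test") \
    .config("hive.metastore.uris", "thrift://metastore-host:9083") \
    .enableHiveSupport() \
    .getOrCreate()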

Solution

I ran into the same issue, trying to use Spark 3.0.1 with HDP 2.6.

I solved the problem by removing all the hive*.jar files from the jars folder and then copying the hive*.jar files from the Spark2 that ships with the HDP distribution.
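As far as I can tell, the root cause is that Spark 3.0.x ships a built-in Hive 2.3.7 metastore client, and its get_table_req Thrift call simply does not exist in an older Hive 1.1.0 metastore; swapping the jars makes Spark speak the older protocol again. An alternative that avoids touching the jars folder is to pin the metastore client version to the cluster's Hive through Spark's own configuration. A minimal sketch, where the jars path is only a placeholder for wherever your distribution keeps the Hive client jars:

from pyspark.sql import SparkSession

# Build a metastore client that speaks the Hive 1.1.0 protocol instead of the built-in 2.3.7 one.
# The jars path is a placeholder: point it at the Hive client jars (and their Hadoop
# dependencies) of your CDH/HDP install.
spark = SparkSession \
    .builder \
    .appName("sample_query_test") \
    .config("spark.sql.hive.metastore.version", "1.1.0") \
    .config("spark.sql.hive.metastore.jars", "/path/to/hive/lib/*") \
    .enableHiveSupport() \
    .getOrCreate()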