Problem description
- Hadoop version: hadoop-2.10.1
- Hive version: hive-2.3.7
- Spark version: spark-2.4.7
- Tez engine: tez-0.9.2
I have been working with Hive using both the Spark and Tez execution engines. I first created textfile tables from .dat files, then created new Parquet tables from them, as shown below:
create table if not exists inventory(.......)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/home/hadoop/tpcds-kit/dats/inventory.dat' INTO TABLE inventory;
create table if not exists inventory_p(.....)
STORED AS PARQUET;
insert overwrite table inventory_p select * from inventory;
I can run queries against these tables with both the MR and Tez engines, but queries fail with the Spark engine. The error is below. It occurs only with the Parquet tables, never with the tables stored as textfile.
Query ID = hadoop_20201231231436_849a5c3e-6987-4fb5-be62-2a74258d77d4
Total jobs = 2
Launching Job 1 out of 2
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Spark Job = 7ce7367f-80a0-4bd3-8949-2acc1f7d6f96
Running with YARN Application = application_1609424642389_0009
Kill Command = /home/hadoop/hadoop/bin/yarn application -kill application_1609424642389_0009
Query Hive on Spark job[0] stages: [0]
Status: Running (Hive on Spark job[0])
--------------------------------------------------------------------------------------
STAGES ATTEMPT STATUS TOTAL COMPLETED RUNNING PENDING FAILED
--------------------------------------------------------------------------------------
Stage-0 0 PENDING 4 0 0 4 0
--------------------------------------------------------------------------------------
STAGES: 00/01 [>>--------------------------] 0% ELAPSED TIME: 2.04 s
--------------------------------------------------------------------------------------
Job Failed with java.lang.NoSuchMethodError: org.apache.parquet.column.values.ValuesReader.initFromPage(I[BI)V
20/12/31 23:14:52 [HiveServer2-Background-Pool: Thread-41]: ERROR SessionState: Job Failed with java.lang.NoSuchMethodError: org.apache.parquet.column.values.ValuesReader.initFromPage(I[BI)V
java.util.concurrent.ExecutionException: Exception thrown by job
at org.apache.spark.JavaFutureActionWrapper.getImpl(FutureAction.scala:337)
at org.apache.spark.JavaFutureActionWrapper.get(FutureAction.scala:342)
at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:362)
at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:323)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
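The `(I[BI)V` suffix in the `NoSuchMethodError` is a JVM method descriptor: it spells out the exact overload the calling code was compiled against, here `void initFromPage(int, byte[], int)`. As an illustrative aside (not part of the original post), such descriptors can be decoded mechanically:

```python
# Decode a JVM method descriptor such as "(I[BI)V" into readable Java types.
# Illustrative helper only; the descriptor grammar follows the JVM spec.
BASE = {"I": "int", "B": "byte", "V": "void", "J": "long", "Z": "boolean",
        "C": "char", "S": "short", "F": "float", "D": "double"}

def decode_type(desc, i):
    """Return (java_type, next_index) for the type starting at desc[i]."""
    dims = 0
    while desc[i] == "[":          # array dimensions are leading '[' chars
        dims += 1
        i += 1
    if desc[i] == "L":             # object type: Lpkg/Class;
        end = desc.index(";", i)
        name = desc[i + 1:end].replace("/", ".")
        i = end + 1
    else:                          # primitive type, single letter
        name = BASE[desc[i]]
        i += 1
    return name + "[]" * dims, i

def decode_method(descriptor):
    """Turn "(I[BI)V" into "void (int, byte[], int)"."""
    assert descriptor[0] == "("
    i, params = 1, []
    while descriptor[i] != ")":
        t, i = decode_type(descriptor, i)
        params.append(t)
    ret, _ = decode_type(descriptor, i + 1)
    return f"{ret} ({', '.join(params)})"

print(decode_method("(I[BI)V"))  # → void (int, byte[], int)
```

Decoding it confirms that the parquet jar on the runtime classpath no longer exposes the `initFromPage(int, byte[], int)` overload that the caller expects.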
Workaround
No effective workaround has been found for this problem yet.
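Although the post gives no fix, this particular `NoSuchMethodError` is commonly reported when two incompatible parquet-mr versions end up on the classpath: Hive 2.3.x is compiled against an older parquet API whose `ValuesReader.initFromPage(int, byte[], int)` overload the parquet jars bundled with Spark 2.4.x no longer provide. A hypothetical diagnostic sketch for spotting the duplicate jars follows; the directory paths are assumptions for a typical layout and must be adjusted to the actual install:

```python
import glob
import os

def parquet_jars(dirs):
    """Map each directory to the parquet-mr jars it contains."""
    return {
        d: sorted(os.path.basename(p)
                  for p in glob.glob(os.path.join(d, "parquet*.jar")))
        for d in dirs
    }

# Hypothetical locations; adjust to your actual Hive/Spark install.
for d, jars in parquet_jars(["/home/hadoop/hive/lib",
                             "/home/hadoop/spark/jars"]).items():
    print(d, "->", jars or "(none found)")
```

If the two locations list different parquet versions, aligning them (so that Hive on Spark resolves a single parquet version) is the usual direction to investigate.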