Error reading data from HBase using SHC

Problem Description

I'm new to Spark and want to read data from and write data to an HBase table. I followed this article and ran into an error while reading the data.

Versions: Spark: 2.4.7; HBase: 1.4.13; Scala: 2.11.12

Command:

spark-shell --jars /usr/lib/hbase/shc/core/target/shc-core-1.1.3-2.4-s_2.11.jar,/usr/lib/hbase/lib/htrace-core4-4.1.0-incubating.jar,/usr/lib/hbase/hbase-client.jar,/usr/lib/hbase/hbase-common.jar,/usr/lib/hbase/hbase-server.jar,/usr/lib/hbase/hbase-protocol.jar,/usr/lib/hbase/lib/htrace-core4-4.1.0-incubating.jar

Error: java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/client/TableDescriptor
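For reference, the read follows the usual SHC pattern, roughly like the sketch below (the table name and column mappings are placeholders, assuming a simple table with one column family):

import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

// Placeholder SHC catalog: swap in the real namespace, table name, and column mappings
val catalog = """{
    |"table":{"namespace":"default", "name":"mytable"},
    |"rowkey":"key",
    |"columns":{
      |"id":{"cf":"rowkey", "col":"key", "type":"string"},
      |"value":{"cf":"cf1", "col":"value", "type":"string"}
    |}
  |}""".stripMargin

// Define a DataFrame over the HBase table through the SHC data source
val df = spark.read
  .options(Map(HBaseTableCatalog.tableCatalog -> catalog))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .load()

df.show()   // the NoClassDefFoundError surfaces once the table is actually scanned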


I also tried other blogs and articles on Cloudera, but I got stuck on the same error every time.

Is there a compatibility issue between the versions I'm using, or with other versions?

Update #1

I was able to resolve the above error by upgrading the hbase-client, hbase-server, and hbase-protocol jars. I also had to include hbase-shaded-miscellaneous and hbase-protocol-shaded in the command.

Updated command:

spark-shell --jars /usr/lib/hbase/shc/core/target/shc-core-1.1.3-2.4-s_2.11.jar,/usr/lib/hbase/hbase-client-2.4.0.jar,/usr/lib/hbase/hbase-common-2.4.0.jar,/usr/lib/hbase/hbase-server-2.4.0.jar,/usr/lib/hbase/hbase-protocol-2.4.0.jar,/usr/lib/hbase/hbase-shaded-miscellaneous-2.2.1.jar,/usr/lib/hbase/hbase-protocol-shaded-2.4.0.jar
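As a sanity check, the jar a class actually resolves from can be printed inside spark-shell (a small diagnostic sketch; whereIs is just a throwaway helper):

// Print the jar on the driver classpath that a given class is loaded from
def whereIs(className: String): Unit =
  println(Class.forName(className).getProtectionDomain.getCodeSource.getLocation)

whereIs("org.apache.hadoop.hbase.client.TableDescriptor")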

Now I'm getting a different error:

java.io.IOException: java.lang.reflect.UndeclaredThrowableException
  at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:232)
  at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:128)
  at org.apache.spark.sql.execution.datasources.hbase.HBaseConnectionCache$$anonfun$getConnection$1.apply(HBaseConnectionCache.scala:144)
  at org.apache.spark.sql.execution.datasources.hbase.HBaseConnectionCache$$anonfun$getConnection$1.apply(HBaseConnectionCache.scala:144)
  at org.apache.spark.sql.execution.datasources.hbase.HBaseConnectionCache$$anonfun$1.apply(HBaseConnectionCache.scala:135)
  at org.apache.spark.sql.execution.datasources.hbase.HBaseConnectionCache$$anonfun$1.apply(HBaseConnectionCache.scala:133)
  at scala.collection.mutable.HashMap.getOrElseUpdate(HashMap.scala:79)
  at org.apache.spark.sql.execution.datasources.hbase.HBaseConnectionCache$.getConnection(HBaseConnectionCache.scala:133)
  at org.apache.spark.sql.execution.datasources.hbase.HBaseConnectionCache$.getConnection(HBaseConnectionCache.scala:144)
  at org.apache.spark.sql.execution.datasources.hbase.RegionResource.init(HBaseResources.scala:96)
  at org.apache.spark.sql.execution.datasources.hbase.ReferencedResource$class.liftedTree1$1(HBaseResources.scala:60)
  at org.apache.spark.sql.execution.datasources.hbase.ReferencedResource$class.acquire(HBaseResources.scala:57)
  at org.apache.spark.sql.execution.datasources.hbase.RegionResource.acquire(HBaseResources.scala:91)
  at org.apache.spark.sql.execution.datasources.hbase.ReferencedResource$class.releaseOnException(HBaseResources.scala:77)
  at org.apache.spark.sql.execution.datasources.hbase.RegionResource.releaseOnException(HBaseResources.scala:91)
  at org.apache.spark.sql.execution.datasources.hbase.RegionResource.<init>(HBaseResources.scala:111)
  at org.apache.spark.sql.execution.datasources.hbase.HBaseTableScanRDD.getPartitions(HBaseTableScan.scala:66)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
  at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:384)
  at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3416)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2553)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2553)
  at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3391)
  at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$executeQuery$1(SQLExecution.scala:83)
  at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1$$anonfun$apply$1.apply(SQLExecution.scala:94)
  at org.apache.spark.sql.execution.QueryExecutionMetrics$.withMetrics(QueryExecutionMetrics.scala:141)
  at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$withMetrics(SQLExecution.scala:178)
  at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:93)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:200)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:92)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withAction(Dataset.scala:3390)
  at org.apache.spark.sql.Dataset.head(Dataset.scala:2553)
  at org.apache.spark.sql.Dataset.take(Dataset.scala:2767)
  at org.apache.spark.sql.Dataset.getRows(Dataset.scala:256)
  at org.apache.spark.sql.Dataset.showString(Dataset.scala:293)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:754)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:713)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:722)
  ... 55 elided
Caused by: java.lang.reflect.UndeclaredThrowableException: java.lang.reflect.InvocationTargetException: java.lang.NoClassDefFoundError: org/apache/hbase/thirdparty/com/google/protobuf/RpcController
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1944)
  at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:347)
  at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:228)
  ... 116 more
Caused by: java.lang.reflect.InvocationTargetException: java.lang.NoClassDefFoundError: org/apache/hbase/thirdparty/com/google/protobuf/RpcController
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
  at org.apache.hadoop.hbase.client.ConnectionFactory.lambda$createConnection$0(ConnectionFactory.java:230)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1926)
  ... 118 more
Caused by: java.lang.NoClassDefFoundError: org/apache/hbase/thirdparty/com/google/protobuf/RpcController
  at java.lang.ClassLoader.defineClass1(Native Method)
  at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
  at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
  at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
  at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
  at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
  at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
  at org.apache.hadoop.hbase.client.ConnectionImplementation.<init>(ConnectionImplementation.java:286)
  ... 126 more
Caused by: java.lang.ClassNotFoundException: org.apache.hbase.thirdparty.com.google.protobuf.RpcController
  at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
  ... 138 more

Where am I going wrong?
