Problem description
The error I get is:
java.lang.RuntimeException: java.lang.NoSuchMethodException:
com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS.<init>()
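This is simple code that used to work, but recently I started getting this error when I try to read a CSV stored in a GCS bucket. I downloaded the correct jar from the Google Cloud website, but I cannot get it to run. Please tell me what I am doing wrong.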
from pyspark.sql import SparkSession
spark = SparkSession \
    .builder \
    .master('local[*]') \
    .appName('spark-gcs-demo') \
    .getOrCreate()
bucket = "testBucket"
spark.conf.set('temporaryGcsBucket', bucket)  # temporary
import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = r"<pathtoJSON>"
spark._jsc.hadoopConfiguration().set('fs.AbstractFileSystem.gs.impl', 'com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS')
spark._jsc.hadoopConfiguration().set("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
# This is required if you are using a service account; set it to true.
spark._jsc.hadoopConfiguration().set('fs.gs.auth.service.account.enable', 'true')
df = spark.read.csv("gs://bucket/iris.csv")
The full error:
Py4JJavaError: An error occurred while calling o38.csv.
: java.lang.RuntimeException: java.lang.NoSuchMethodException: com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS.<init>()
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:134)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2668)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:561)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:559)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:355)
at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary(DataSource.scala:559)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:373)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:242)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:230)
at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:638)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.NoSuchMethodException: com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS.<init>()
at java.lang.Class.getConstructor0(Unknown Source)
at java.lang.Class.getDeclaredConstructor(Unknown Source)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:128)
... 29 more
Solution
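You see this exception because the GCS connector is misconfigured.
You set the fs.gs.impl Hadoop property to com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS, but it should be set to com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem. As the ReflectionUtils.newInstance frame in the trace shows, Hadoop instantiates the fs.gs.impl class through a no-argument constructor, and GoogleHadoopFS does not have one, hence the NoSuchMethodException for <init>().
Better still, you can omit this property altogether, because Hadoop can discover the FS implementation class using ServiceLoader.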
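
The corrected configuration can be sketched as follows. This is a minimal illustration rather than a definitive setup: the property names and class names are the ones discussed in this thread, while `GCS_HADOOP_CONF` and `apply_gcs_conf` are hypothetical helper names introduced only for this example. It assumes the GCS connector jar is already on the Spark classpath.

```python
# Hypothetical helper names; the property and class names come from the thread.
GCS_HADOOP_CONF = {
    # FileSystem API, used by spark.read: must point at GoogleHadoopFileSystem.
    # This entry may also be omitted so that Hadoop discovers the class itself.
    "fs.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
    # AbstractFileSystem API: this is the property where GoogleHadoopFS belongs.
    "fs.AbstractFileSystem.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS",
}


def apply_gcs_conf(spark, conf=GCS_HADOOP_CONF):
    """Copy each property onto the session's Hadoop configuration."""
    hadoop_conf = spark._jsc.hadoopConfiguration()
    for key, value in conf.items():
        hadoop_conf.set(key, value)
    return hadoop_conf
```

With a session built as in the question, `apply_gcs_conf(spark)` would be called before `spark.read.csv(...)`, replacing the two `hadoopConfiguration().set(...)` lines that configure the `gs` scheme.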