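Streaming data to HDFS with Apache Flume

Problem description

I am currently using Apache Flume to fetch Twitter data and want to store it in Hadoop HDFS. Here is my Twitter agent configuration file: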

# Naming the components on the current agent. 
Twitteragent.sources = Twitter 
Twitteragent.channels = MemChannel
Twitteragent.sinks = HDFS


# Describing/Configuring the source 
Twitteragent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
Twitteragent.sources.Twitter.consumerKey = 
Twitteragent.sources.Twitter.consumerSecret = 
Twitteragent.sources.Twitter.accessToken = 
Twitteragent.sources.Twitter.accessTokenSecret = 
Twitteragent.sources.Twitter.keywords = hadoop

# Describing/Configuring the sink 
#Twitteragent.sinks.LoggerSink.type = logger  

# Describing/Configuring the sink 
Twitteragent.sinks.HDFS.type = hdfs 
Twitteragent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/hadoop/twitter_data/
Twitteragent.sinks.HDFS.hdfs.fileType = DataStream 
Twitteragent.sinks.HDFS.hdfs.writeFormat = Text 
Twitteragent.sinks.HDFS.hdfs.batchSize = 10000
Twitteragent.sinks.HDFS.hdfs.rollSize = 0 
Twitteragent.sinks.HDFS.hdfs.rollCount = 100000 
 
# Describing/Configuring the channel 
Twitteragent.channels.MemChannel.type = memory 
Twitteragent.channels.MemChannel.capacity = 100000 
Twitteragent.channels.MemChannel.transactionCapacity = 10000
  
# Binding the source and sink to the channel 
Twitteragent.sources.Twitter.channels = MemChannel
Twitteragent.sinks.HDFS.channel = MemChannel 

However, when I run the following command from the Flume folder to start the agent:

bin/flume-ng agent --conf conf --conf-file conf/twitter.conf --name Twitteragent -Dflume.root.logger=INFO,console

I get the following HDFS error:

2020-09-18 18:13:54,900 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:459)] process Failed
java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:1380)
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:1361)
    at org.apache.hadoop.conf.Configuration.setBoolean(Configuration.java:1703)
    at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:221)
    at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:572)
    at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:412)
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
    at java.lang.Thread.run(Thread.java:748)
Exception in thread "SinkRunner-PollingRunner-DefaultSinkProcessor" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:1380)
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:1361)
    at org.apache.hadoop.conf.Configuration.setBoolean(Configuration.java:1703)
    at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:221)
    at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:572)
    at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:412)
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
    at java.lang.Thread.run(Thread.java:748)

Can anyone offer some advice? Thanks.

Solution

No verified solution to this problem has been found yet.
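
This is not a confirmed fix, but the trace does offer a hint. The descriptor (ZLjava/lang/String;Ljava/lang/Object;)V corresponds to Preconditions.checkArgument(boolean, String, Object), an overload that older Guava releases do not have, and the call comes from Hadoop's Configuration.set. One plausible cause is therefore that the Flume classpath loads an older Guava jar than the Hadoop client classes expect. The sketch below only lists the Guava jars in the directories you pass to it so the versions can be compared. The function name list_guava_jars is made up for illustration, and the directories in the usage note are assumptions about a typical installation layout.

```shell
#!/bin/sh
# Hypothetical helper: print the Guava jars found directly under each
# directory given as an argument, one section per directory.
list_guava_jars() {
    for dir in "$@"; do
        echo "== $dir"
        # Show only the jar file names; print nothing if the directory
        # is missing or contains no Guava jar.
        find "$dir" -maxdepth 1 -name 'guava-*.jar' -exec basename {} \; 2>/dev/null | sort
    done
}
```

A possible invocation, assuming FLUME_HOME and HADOOP_HOME point at the two installations, is: list_guava_jars "$FLUME_HOME/lib" "$HADOOP_HOME/share/hadoop/common/lib". If the two directories report different Guava versions, a commonly reported workaround is to move the older jar out of Flume's lib directory and use the one shipped with Hadoop instead. Treat that as something to try, not as a verified solution.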
