Because of an error I can't get rid of, I can't use Flume to pull Twitter data into HDFS.
Command:
bin/flume-ng agent --conf ./conf/ -f conf/twitter.conf -Dflume.root.logger=DEBUG,console -n TwitterAgent
Console output:
2020-12-14 11:38:08,662 (conf-file-poller-0) [ERROR - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:154)] Unhandled error
java.lang.NoSuchMethodError: 'boolean twitter4j.conf.Configuration.isStallWarningsEnabled()'
at twitter4j.TwitterStreamImpl.<init>(TwitterStreamImpl.java:60)
at twitter4j.TwitterStreamFactory.<clinit>(TwitterStreamFactory.java:40)
at org.apache.flume.source.twitter.TwitterSource.configure(TwitterSource.java:110)
at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
at org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:325)
at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:105)
at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:145)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
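A NoSuchMethodError like this one usually means that twitter4j.TwitterStreamImpl was compiled against one version of twitter4j, while the twitter4j.conf.Configuration actually loaded at runtime comes from a different, older copy somewhere on the Flume classpath. The small Java program below is a diagnostic sketch (the class name MethodCheck is made up; it is not part of Flume or twitter4j). It prints which jar a class was loaded from and whether that class has a public method with a given name.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

// Diagnostic sketch (hypothetical helper, not part of Flume or twitter4j):
// reports where a class was loaded from and whether it has a given method.
public class MethodCheck {

    // Returns the jar/directory a class came from, "<bootstrap>" for JDK
    // classes that have no code source, or "<not found>" if it cannot load.
    public static String locationOf(String className) {
        try {
            Class<?> cls = Class.forName(className, false, MethodCheck.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "<bootstrap>";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "<not found>";
        }
    }

    // True if the class loads and exposes a public method with this name
    // (inherited methods included; parameter types are ignored).
    public static boolean hasMethod(String className, String methodName) {
        try {
            Class<?> cls = Class.forName(className, false, MethodCheck.class.getClassLoader());
            for (Method m : cls.getMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults match the failing call in the stack trace above.
        String cls = args.length > 0 ? args[0] : "twitter4j.conf.Configuration";
        String method = args.length > 1 ? args[1] : "isStallWarningsEnabled";
        System.out.println(cls + " loaded from: " + locationOf(cls));
        System.out.println(cls + " has " + method + "(): " + hasMethod(cls, method));
    }
}
```

Assuming it is compiled into the current directory, something like java -cp "/home/jb/flume/lib/*:." MethodCheck would check the jars in flume/lib. The JVM does not specify the order in which a lib/* wildcard is expanded, so this only approximates the order in which Flume itself finds classes.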
flume-env.sh (I manually added flume-sources-1.0-SNAPSHOT.jar to flume/lib):
export JAVA_HOME=/usr/lib/jvm/default-java
export JAVA_OPTS="-Xms500m -Xmx2000m -Dcom.sun.management.jmxremote"
# export JAVA_OPTS="$JAVA_OPTS -Dorg.apache.flume.log.rawdata=true -Dorg.apache.flume.log.printconfig=true "
FLUME_CLASSPATH="/home/jb/flume/lib/flume-sources-1.0-SNAPSHOT.jar"
twitter.conf:
# Naming the components on the current agent.
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
# Describing/Configuring the source
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.consumerKey = xxx
TwitterAgent.sources.Twitter.consumerSecret = xxx
TwitterAgent.sources.Twitter.accessToken = xxx
TwitterAgent.sources.Twitter.accessTokenSecret = xxx
TwitterAgent.sources.Twitter.keywords = tutorials point,java, bigdata, mapreduce, mahout, hbase, nosql
# Describing/Configuring the sink
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/Hadoop/twitter_data/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.sinks.HDFS.hdfs.minBlockReplicas = 1
# Describing/Configuring the channel
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 100
TwitterAgent.channels.MemChannel.transactionCapacity = 100
# Binding the source and sink to the channel
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel
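Independently of the NoSuchMethodError, this config pairs hdfs.batchSize = 1000 with a memory channel whose transactionCapacity is 100. As far as I know, a MemoryChannel transaction can hold at most transactionCapacity events, so an HDFS sink that tries to take 1000 events in one transaction is likely to fail with a ChannelException once tweets start flowing. The checker below is a hypothetical sketch (FlumeConfCheck is not a Flume tool). It loads an agent file with java.util.Properties, confirms that the source and sink are bound to declared channels, and flags a sink batch size larger than its channel's transaction capacity.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight check for a Flume agent properties file.
// It is NOT a Flume tool and only covers the few rules below.
public class FlumeConfCheck {

    // Parses properties text; wraps the (unlikely) IOException from StringReader.
    public static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    // Splits a whitespace-separated component list such as "MemChannel".
    private static List<String> names(Properties p, String key) {
        List<String> out = new ArrayList<>();
        String v = p.getProperty(key, "").trim();
        if (!v.isEmpty()) {
            for (String s : v.split("\\s+")) {
                out.add(s);
            }
        }
        return out;
    }

    public static List<String> check(Properties p, String agent) {
        List<String> problems = new ArrayList<>();
        List<String> channels = names(p, agent + ".channels");

        // Every source must be bound to at least one declared channel.
        for (String src : names(p, agent + ".sources")) {
            List<String> bound = names(p, agent + ".sources." + src + ".channels");
            if (bound.isEmpty()) {
                problems.add("source " + src + " is not bound to any channel");
            }
            for (String ch : bound) {
                if (!channels.contains(ch)) {
                    problems.add("source " + src + " uses undeclared channel " + ch);
                }
            }
        }

        // Every sink must use a declared channel; an HDFS batch should fit
        // in one channel transaction (only checked when both values are set).
        for (String sink : names(p, agent + ".sinks")) {
            String ch = p.getProperty(agent + ".sinks." + sink + ".channel", "").trim();
            if (!channels.contains(ch)) {
                problems.add("sink " + sink + " uses undeclared channel '" + ch + "'");
                continue;
            }
            String batch = p.getProperty(agent + ".sinks." + sink + ".hdfs.batchSize");
            String txCap = p.getProperty(agent + ".channels." + ch + ".transactionCapacity");
            if (batch != null && txCap != null
                    && Integer.parseInt(batch.trim()) > Integer.parseInt(txCap.trim())) {
                problems.add("sink " + sink + " hdfs.batchSize " + batch.trim()
                        + " exceeds transactionCapacity " + txCap.trim() + " of channel " + ch);
            }
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        String file = args.length > 0 ? args[0] : "conf/twitter.conf";
        String agent = args.length > 1 ? args[1] : "TwitterAgent";
        Properties p = new Properties();
        try (Reader r = Files.newBufferedReader(Paths.get(file))) {
            p.load(r);
        }
        List<String> problems = check(p, agent);
        if (problems.isEmpty()) {
            System.out.println("no problems found");
        }
        problems.forEach(System.out::println);
    }
}
```

Run as java FlumeConfCheck conf/twitter.conf TwitterAgent against the file above, it should report the 1000 > 100 mismatch. Raising transactionCapacity (and capacity) or lowering hdfs.batchSize would clear that warning.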
OS: Ubuntu; Flume: v1.9.0; Hadoop: v3.3.0
1 answer
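I managed to get it working. For anyone wondering, here is what I did.
First, change the Flume version. I now use Flume 1.7.0 (https://flume.apache.org/releases/1.7.0.html). A newer version might work as well, but I didn't want to break anything :)
Second, clone this repo (https://github.com/cloudera/cdh-twitter-example). It contains a flume.conf file, which I configured like this:
Then modify pom.xml (the versions):
Package it with Maven:
This creates target/flume-sources-1.0-SNAPSHOT.jar; copy it to <your_flume_home>/lib.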
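Since the original error comes from two twitter4j versions meeting on the classpath, it may be worth checking whether the rebuilt flume-sources-1.0-SNAPSHOT.jar carries its own copy of the twitter4j classes. I have not verified how the cdh-twitter-example pom packages its dependencies, so treat this as a check rather than a known fact. The sketch below (the class name JarPeek is hypothetical) uses java.util.jar.JarFile to list the .class entries under a package prefix.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: lists class files inside a jar under a given prefix,
// e.g. "twitter4j/", to see whether a jar bundles its own twitter4j copy.
public class JarPeek {

    public static List<String> entriesWithPrefix(String jarPath, String prefix) throws IOException {
        List<String> out = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> en = jar.entries();
            while (en.hasMoreElements()) {
                String name = en.nextElement().getName();
                if (name.startsWith(prefix) && name.endsWith(".class")) {
                    out.add(name);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        String jar = args.length > 0 ? args[0] : "target/flume-sources-1.0-SNAPSHOT.jar";
        List<String> hits = entriesWithPrefix(jar, "twitter4j/");
        System.out.println(hits.size() + " twitter4j classes bundled in " + jar);
        // Print a few examples; a non-zero count means this jar ships twitter4j itself.
        hits.stream().limit(10).forEach(System.out::println);
    }
}
```

If the count is non-zero, those bundled classes compete with the lib/twitter4j jars, and whichever copy the class loader finds first wins.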
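I changed the classpath in the file shown earlier (flume-env.sh):
Copy the conf/flume.conf we just wrote to <your_flume_home>/conf.
Third, verify that lib/twitter4j-core.jar, media-support.jar and stream.jar are at version 3.0.3. If they are not, download them.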
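The version check in the third step can be scripted. This sketch (Twitter4jVersions is a hypothetical helper) scans a lib directory and lists every twitter4j jar whose file name does not carry the expected version. It assumes Maven-style names such as twitter4j-core-3.0.3.jar, so a jar with no version in its name, like a bare twitter4j-core.jar, is reported too because its version cannot be read from the name.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: flags twitter4j jars whose file-name version differs
// from the expected one. Versions are read from names only, not from the jar.
public class Twitter4jVersions {

    // Assumed naming: twitter4j-<module>-<version>.jar, e.g. twitter4j-core-3.0.3.jar
    private static final Pattern NAME =
            Pattern.compile("twitter4j-[a-z-]+?-(\\d[\\w.-]*)\\.jar");

    public static List<String> mismatches(File libDir, String expected) {
        List<String> bad = new ArrayList<>();
        File[] files = libDir.listFiles();
        if (files == null) {
            return bad; // not a directory, or not readable
        }
        for (File f : files) {
            String name = f.getName();
            if (!name.startsWith("twitter4j") || !name.endsWith(".jar")) {
                continue;
            }
            Matcher m = NAME.matcher(name);
            // Unparseable names are reported, since their version is unknown.
            if (!m.matches() || !m.group(1).equals(expected)) {
                bad.add(name);
            }
        }
        Collections.sort(bad); // listFiles() order is unspecified
        return bad;
    }

    public static void main(String[] args) {
        File lib = new File(args.length > 0 ? args[0] : "lib");
        List<String> bad = mismatches(lib, "3.0.3");
        if (bad.isEmpty()) {
            System.out.println("all twitter4j jars are 3.0.3");
        }
        bad.forEach(n -> System.out.println("check: " + n));
    }
}
```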
Finally:
Hallelujah: