如何通过flume通过代理将twitterdata提供给hdfs?

aij0ehis  于 2021-06-02  发布在  Hadoop
关注(0)|答案(1)|浏览(336)

我已经安装了flume,并试图将twitter数据输入hdfs文件夹。
我的flume.conf文件如下所示:

TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <required>
TwitterAgent.sources.Twitter.consumerSecret = <required>
TwitterAgent.sources.Twitter.accessToken = <required>
TwitterAgent.sources.Twitter.accessTokenSecret = <required>
TwitterAgent.sources.Twitter.keywords = hadoop, big data, china, india.
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/flume/tweets/%Y/%m/%d/%H/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 600
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100

我遇到了以下错误:

2014-11-03 02:00:49,834 (Twitter Stream consumer-1[Establishing connection]) [DEBUG -  twitter4j.internal.logging.SLF4JLogger.debug(SLF4JLogger.java:67)] User-Agent: twitter4j http://twitter4j.org/ /2.2.6
2014-11-03 02:00:49,834 (Twitter Stream consumer-1[Establishing connection]) [DEBUG - twitter4j.internal.logging.SLF4JLogger.debug(SLF4JLogger.java:67)] Connection: close
2014-11-03 02:00:49,835 (Twitter Stream consumer-1[Establishing connection]) [DEBUG - twitter4j.internal.logging.SLF4JLogger.debug(SLF4JLogger.java:67)] X-Twitter-Client-Version: 2.2.6
2014-11-03 02:00:49,835 (Twitter Stream consumer-1[Establishing connection]) [DEBUG - twitter4j.internal.logging.SLF4JLogger.debug(SLF4JLogger.java:67)] X-Twitter-Client-URL: http://twitter4j.org/en/twitter4j-2.2.6.xml
2014-11-03 02:00:49,836 (Twitter Stream consumer-1[Establishing connection]) [DEBUG - twitter4j.internal.logging.SLF4JLogger.debug(SLF4JLogger.java:67)] Accept-Encoding: gzip
2014-11-03 02:00:49,836 (Twitter Stream consumer-1[Establishing connection]) [DEBUG - twitter4j.internal.logging.SLF4JLogger.debug(SLF4JLogger.java:67)] X-Twitter-Client: Twitter4J
2014-11-03 02:00:49,837 (Twitter Stream consumer-1[Establishing connection]) [DEBUG - twitter4j.internal.logging.SLF4JLogger.debug(SLF4JLogger.java:75)] Post Params: count=0&track=hadoop%2Cbig%20data%2Canalytics%2Cbigdata%2Ccloudera%2Cdata%20science&include_entities=true
2014-11-03 02:00:49,843 (Twitter Stream consumer-1[Establishing connection]) [INFO - twitter4j.internal.logging.SLF4JLogger.info(SLF4JLogger.java:83)] Connection refused
2014-11-03 02:00:49,843 (Twitter Stream consumer-1[Establishing connection]) [INFO - twitter4j.internal.logging.SLF4JLogger.info(SLF4JLogger.java:83)] Waiting for 2000 milliseconds
2014-11-03 02:00:49,843 (Twitter Stream consumer-1[Waiting for 2000 milliseconds]) [DEBUG - twitter4j.internal.logging.SLF4JLogger.debug(SLF4JLogger.java:67)] Twitter Stream consumer-1[Waiting for 2000 milliseconds]
2014-11-03 02:00:51,843 (Twitter Stream consumer-1[Waiting for 2000 milliseconds]) [DEBUG - twitter4j.internal.logging.SLF4JLogger.debug(SLF4JLogger.java:67)] Connection refused
2014-11-03 02:00:51,844 (Twitter Stream consumer-1[Waiting for 2000 milliseconds]) [INFO - twitter4j.internal.logging.SLF4JLogger.info(SLF4JLogger.java:83)] Establishing connection.

我的大学网络配备了代理服务器。我认为问题是由于代理服务器。
如何将代理与flume一起使用?

0ve6wy6x

0ve6wy6x1#

从中构建jarhttps://github.com/cloudera/cdh-twitter-example
解压缩,然后在内部执行(如上所述):
转到/cdh-twitter-example-master/flume-sources/src/main/java/com/cloudera/flume/source/twittersource.java
再加上这几行

cb.setHttpProxyHost("your proxy");
cb.setHttpProxyPort(8080);//port
cb.setHttpProxyUser("");
cb.setHttpProxyPassword("");

$cdFlume源
$mvn套餐
den把jar从target放到flume lib文件夹中

相关问题