I am using Flume to load Twitter data into an HDFS location. The flume-ng command runs successfully and prints the following messages:
18/06/24 22:52:33 INFO twitter.TwitterSource: Processed 17,500 docs
18/06/24 22:52:37 INFO twitter.TwitterSource: Processed 17,600 docs
18/06/24 22:52:39 INFO hdfs.BucketWriter: Closing hdfs://localhost:8020/tmp/pk/FlumeData.1529905355675.tmp
18/06/24 22:52:39 INFO hdfs.BucketWriter: Renaming hdfs://localhost:8020/tmp/pk/FlumeData.1529905355675.tmp to hdfs://localhost:8020/tmp/pk/FlumeData.1529905355675
18/06/24 22:52:39 INFO hdfs.HDFSEventSink: Writer callback called.
18/06/24 22:52:40 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
18/06/24 22:52:40 INFO hdfs.BucketWriter: Creating hdfs://localhost:8020/tmp/pk/FlumeData.1529905960074.tmp
18/06/24 22:52:40 INFO twitter.TwitterSource: Processed 17,700 docs
18/06/24 22:52:44 INFO twitter.TwitterSource: Processed 17,800 docs
18/06/24 22:52:47 INFO twitter.TwitterSource: Processed 17,900 docs
18/06/24 22:52:51 INFO twitter.TwitterSource: Processed 18,000 docs
18/06/24 22:52:51 INFO twitter.TwitterSource: Total docs indexed: 18,000, total skipped docs: 0
18/06/24 22:52:51 INFO twitter.TwitterSource: 29 docs/second
18/06/24 22:52:51 INFO twitter.TwitterSource: Run took 618 seconds and processed:
18/06/24 22:52:51 INFO twitter.TwitterSource: 0.008 MB/sec sent to index
18/06/24 22:52:51 INFO twitter.TwitterSource: 4.859 MB text sent to index
18/06/24 22:52:51 INFO twitter.TwitterSource: There were 0 exceptions ignored:
18/06/24 22:52:54 INFO twitter.TwitterSource: Processed 18,100 docs
18/06/24 22:52:57 INFO twitter.TwitterSource: Processed 18,200 docs
18/06/24 22:53:00 INFO twitter.TwitterSource: Processed 18,300 docs
18/06/24 22:53:04 INFO twitter.TwitterSource: Processed 18,400 docs
18/06/24 22:53:07 INFO twitter.TwitterSource: Processed 18,500 docs
18/06/24 22:53:10 INFO twitter.TwitterSource: Processed 18,600 docs
18/06/24 22:53:14 INFO twitter.TwitterSource: Processed 18,700 docs
18/06/24 22:53:17 INFO twitter.TwitterSource: Processed 18,800 docs
18/06/24 22:53:21 INFO twitter.TwitterSource: Processed 18,900 docs
18/06/24 22:53:24 INFO twitter.TwitterSource: Processed 19,000 docs
18/06/24 22:53:24 INFO twitter.TwitterSource: Total docs indexed: 19,000, total skipped docs: 0
18/06/24 22:53:24 INFO twitter.TwitterSource: 29 docs/second
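The summary figures in the log are internally consistent, so the source side is clearly ingesting data. A quick check using the totals reported above:

```python
# Sanity-check the throughput figures reported by TwitterSource in the log.
docs = 18_000      # "Total docs indexed: 18,000"
seconds = 618      # "Run took 618 seconds"
mb_sent = 4.859    # "4.859 MB text sent to index"

print(round(docs / seconds))        # → 29, matching "29 docs/second"
print(round(mb_sent / seconds, 3))  # → 0.008, matching "0.008 MB/sec"
```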
However, no output file is generated in the HDFS folder, and no exception is raised either. Can anyone help me? Below is the conf file:
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
# Use Cloudera Twitter Source;
# place your consumerKey and accessToken details here
# Describing/Configuring the source
# TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.consumerKey=xxx
TwitterAgent.sources.Twitter.consumerSecret=xxx
TwitterAgent.sources.Twitter.accessToken=xxx
TwitterAgent.sources.Twitter.accessTokenSecret=xxx
TwitterAgent.sources.Twitter.maxBatchSize = 1000
TwitterAgent.sources.Twitter.maxBatchDurationMillis = 1000
TwitterAgent.sources.Twitter.keywords=harry kane
# Use a channel which buffers events in memory
TwitterAgent.channels.MemChannel.type=memory
TwitterAgent.channels.MemChannel.capacity=100
TwitterAgent.channels.MemChannel.transactionCapacity=100
# Describing/Configuring the sink
TwitterAgent.sinks.HDFS.channel=MemChannel
TwitterAgent.sinks.HDFS.type=hdfs
TwitterAgent.sinks.HDFS.hdfs.path=hdfs://localhost:8020/tmp/pk
TwitterAgent.sinks.HDFS.hdfs.fileType=DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat=Text
TwitterAgent.sinks.HDFS.hdfs.batchSize=100
TwitterAgent.sinks.HDFS.hdfs.rollSize=0
TwitterAgent.sinks.HDFS.hdfs.rollCount=1000
TwitterAgent.sinks.HDFS.hdfs.rollInterval=600
# Bind the source and sink to the channel
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel