Which flume.conf parameters save tweets to a single flumedata file per hour?

ovfsdjhp posted on 2021-06-02 in Hadoop

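We are saving tweets in a date-based directory hierarchy such as /user/flume/2016/06/28/13/flumedata, but every hour Flume creates more than 100 flumedata files. I set TwitterAgent.sinks.HDFS.hdfs.rollSize = 52428800 (50 MB) and the same thing happened again. After that I also tried changing the rollCount parameter, but it did not help. How should I set the parameters to get one flumedata file per hour?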


lokaqttq1#

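What about rollInterval? Have you set it to 0? If so, the problem may lie elsewhere. Note that rollInterval, rollSize and rollCount are independent triggers: whichever threshold is reached first rolls the file, and any of them you leave unset falls back to its default (rollInterval = 30 seconds, rollSize = 1024 bytes, rollCount = 10 events). So a nonzero rollInterval or rollCount can rotate the file long before it reaches the rollSize value. Also check the HDFS block size you have set; if it is too small, that can also cause files to roll.
Try this: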
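To see how the three triggers interact, here is a rough Python model. It is a simplified sketch, not Flume's actual sink code, and `RollPolicy` and `count_files` are hypothetical names. Assumptions: events arrive in timestamp order, the path is bucketed by hour as with `%H`, a time-based roll is checked when the next event arrives (Flume itself uses a timer, which gives the same number of files), and the tweet size and rate below are made up for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class RollPolicy:
    """Simplified model of the HDFS sink roll triggers; 0 disables a trigger.

    Defaults mirror the documented defaults of hdfs.rollInterval,
    hdfs.rollSize and hdfs.rollCount.
    """
    roll_interval: int = 30  # seconds
    roll_size: int = 1024    # bytes
    roll_count: int = 10     # events


def count_files(policy: RollPolicy, events: Iterable[Tuple[int, int]]) -> int:
    """Count files written for (timestamp_seconds, body_bytes) events.

    Assumes events are sorted by timestamp and the path is bucketed by hour.
    """
    files = 0
    bucket = None   # hour bucket of the currently open file
    opened_at = 0   # timestamp at which the current file was opened
    size = 0        # bytes written to the current file
    count = 0       # events written to the current file

    for ts, nbytes in events:
        hour = ts // 3600
        need_new = (
            bucket is None
            or hour != bucket  # a new %H directory means a new file
            or (policy.roll_interval > 0 and ts - opened_at >= policy.roll_interval)
            or (policy.roll_size > 0 and size >= policy.roll_size)
            or (policy.roll_count > 0 and count >= policy.roll_count)
        )
        if need_new:
            files += 1
            bucket, opened_at, size, count = hour, ts, 0, 0
        size += nbytes
        count += 1
    return files


# Hypothetical load: one 500-byte tweet every 5 seconds for one hour.
hour_of_tweets = [(t, 500) for t in range(0, 3600, 5)]
print(count_files(RollPolicy(roll_size=52428800), hour_of_tweets))  # 120
print(count_files(RollPolicy(roll_interval=3600, roll_size=0, roll_count=0), hour_of_tweets))  # 1
```

With only rollSize raised, the 30-second default interval still rolls the file 120 times in this made-up hour; with rollSize and rollCount at 0 and rollInterval at 3600, all of the hour's events land in one file.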

    TwitterAgent.sinks.HDFS.channel = MemChannel
    TwitterAgent.sinks.HDFS.type = hdfs
    TwitterAgent.sinks.HDFS.hdfs.path = hdfs://hpc01:8020/user/flume/tweets/%Y/%m/%d/%H
    TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
    TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
    # Number of events written to the file before it is flushed to HDFS
    TwitterAgent.sinks.HDFS.hdfs.batchSize = 100
    # 0 disables size-based rolling
    TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
    # 0 disables count-based rolling
    TwitterAgent.sinks.HDFS.hdfs.rollCount = 0
    # Roll the file once every 3600 seconds (one hour)
    TwitterAgent.sinks.HDFS.hdfs.rollInterval = 3600

    TwitterAgent.channels.MemChannel.type = memory
    TwitterAgent.channels.MemChannel.capacity = 1000
    TwitterAgent.channels.MemChannel.transactionCapacity = 100
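Because any roll setting you omit falls back to its default, it can help to check which values are actually in effect. The following is a minimal sketch, assuming simple `key = value` lines and `#` comments (not the full java.util.Properties syntax); `effective_roll_settings` and `rolls_only_hourly` are hypothetical helper names, not part of Flume.

```python
# Documented defaults for the HDFS sink roll settings.
DEFAULTS = {"rollInterval": 30, "rollSize": 1024, "rollCount": 10}


def effective_roll_settings(conf_text: str, sink: str = "TwitterAgent.sinks.HDFS") -> dict:
    """Return the roll settings for `sink`, using defaults for unset keys."""
    settings = dict(DEFAULTS)
    prefix = sink + ".hdfs."
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        key, value = (part.strip() for part in line.split("=", 1))
        name = key[len(prefix):] if key.startswith(prefix) else None
        if name in settings:
            settings[name] = int(value)
    return settings


def rolls_only_hourly(settings: dict) -> bool:
    """True if only the one-hour time trigger can roll the file."""
    return (settings["rollInterval"] == 3600
            and settings["rollSize"] == 0
            and settings["rollCount"] == 0)


conf = """
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 0
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 3600
"""
print(rolls_only_hourly(effective_roll_settings(conf)))  # True
```

Setting only rollSize, as in the question, leaves rollInterval at 30 and rollCount at 10, so this check returns False for that configuration.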

cl25kdpy2#

    TwitterAgent.sinks.HDFS.channel = MemChannel
    TwitterAgent.sinks.HDFS.type = hdfs
    TwitterAgent.sinks.HDFS.hdfs.path = hdfs://hpc01:8020/user/flume/tweets/%Y/%m/%d/%H
    TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
    TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
    TwitterAgent.sinks.HDFS.hdfs.batchSize = 1
    # 0 disables size-based rolling
    TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
    # Roll the file after every 10 events
    TwitterAgent.sinks.HDFS.hdfs.rollCount = 10
    # 0 disables time-based rolling
    TwitterAgent.sinks.HDFS.hdfs.rollInterval = 0

    TwitterAgent.channels.MemChannel.type = memory
    TwitterAgent.channels.MemChannel.capacity = 10000
    TwitterAgent.channels.MemChannel.transactionCapacity = 1000

ercv8c1e3#

I solved the problem by setting rollInterval = 3600, rollCount = 0 and batchSize = 100 in flume.conf, as @vkgade suggested.
