Hadoop regular expression validation in Flume NG

olmpazwi · posted 2021-06-03 in Hadoop

I am trying to load data from a flat file (a log file) into HBase using Flume NG (1.2). The flat file has multiple columns, each separated by a colon (:), and each needs to be loaded into a separate column in HBase. While searching the forums I found that Apache ships a class that handles this (org.apache.flume.sink.hbase.RegexHbaseEventSerializer), but I could not find any configuration file or usage example on the internet. It would be very helpful if someone could help me with the configuration file.
Contents of the flat file: 1:nn 2:pp 3:mm
Thanks

Answer from y1aodyip:

RegexHbaseEventSerializer has three configuration parameters that can be set (as described in its source code); they are:

/**Regular expression used to parse groups from event data. */
public static final String REGEX_CONFIG = "regex";

/**Whether to ignore case when performing regex matches. */
public static final String IGNORE_CASE_CONFIG = "regexIgnoreCase";

/**Comma separated list of column names to place matching groups in. */
public static final String COL_NAME_CONFIG = "colNames";

An example configuration using RegexHbaseEventSerializer looks like the following (partly taken from Cloudera's Flume and HBase presentation):

host1.sources = src1
host1.sinks = sink1
host1.channels = ch1

host1.sources.src1.type = seq
host1.sources.src1.port = 25001
host1.sources.src1.bind = localhost
host1.sources.src1.channels = ch1

host1.sinks.sink1.type = org.apache.flume.sink.hbase.HBaseSink
host1.sinks.sink1.channel = ch1
host1.sinks.sink1.table = test3
host1.sinks.sink1.columnFamily = testing

host1.sinks.sink1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
host1.sinks.sink1.serializer.regex = X
host1.sinks.sink1.serializer.regexIgnoreCase = true
host1.sinks.sink1.serializer.colNames = column_1,column_2,column_3

host1.channels.ch1.type = memory
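
For the flat-file case in the question (lines like 1:nn 2:pp 3:mm), a minimal sketch of a configuration could look like the one below. It is not part of the original answer: the exec source command, file path, table name, column family and regex are assumptions based on the sample line and should be adapted to the real data.

# Tail the flat file with an exec source (path is a placeholder)
host1.sources.src1.type = exec
host1.sources.src1.command = tail -F /var/log/input.log
host1.sources.src1.channels = ch1

host1.sinks.sink1.type = org.apache.flume.sink.hbase.HBaseSink
host1.sinks.sink1.channel = ch1
host1.sinks.sink1.table = test3
host1.sinks.sink1.columnFamily = testing

host1.sinks.sink1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
# Three capture groups, one per colon-separated value in "1:nn 2:pp 3:mm";
# assumes lowercase values and single spaces between fields
host1.sinks.sink1.serializer.regex = [0-9]+:([a-z]+) [0-9]+:([a-z]+) [0-9]+:([a-z]+)
host1.sinks.sink1.serializer.colNames = column_1,column_2,column_3

host1.channels.ch1.type = memory

Each capture group in the regex is written to the corresponding name in colNames under the configured column family, so the number of groups must match the number of column names.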
