Flume folder routing

vlju58qv, posted on 2021-05-30 in Hadoop

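Using curl and Flume, I want to post CSV files to different locations on the local machine or HDFS, depending on the value of an HTTP header. For example, for the HTTP header Network-Element: GGSN, I want the file to be stored in a folder named ggsn on the local machine.
I have the following Flume setup:
an HTTP source
a memory channel
an HDFS sink that routes event files to different locations based on the HTTP header
I then post the CSV files with curl: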

find /path/files -type f -exec curl -X POST http://localhost:9043 -H "Content-Type: text/xml" -H "Network-Element: GGSN" --data-binary "@{}" -v \;

This produces the following log:


* About to connect() to localhost port 9043 (#0)
* Trying ::1... Connection refused
* Trying 127.0.0.1... connected
* Connected to localhost (127.0.0.1) port 9043 (#0)

> POST / HTTP/1.1
> User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2
> Host: localhost:9043
> Accept: */*
> Content-Type: text/xml
> Network-Element: GGSN
> Content-Length: 972660
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
< HTTP/1.1 200 OK
< Transfer-Encoding: chunked
< Server: Jetty(6.1.26)
<

* Connection #0 to host localhost left intact
* Closing connection #0

The Flume log shows the following:

2015-03-16 19:41:14,887 DEBUG org.apache.flume.sink.solr.morphline.BlobHandler: requestHeaders: {Expect=100-continue, Host=localhost:9043, Content-Length=972660, Network-Element=GGSN, User-Agent=curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2, Content-Type=text/xml, Accept=*/*}
2015-03-16 19:41:14,891 DEBUG org.apache.flume.sink.solr.morphline.BlobHandler: blobEvent: [Event headers = {Content-Type=text/xml}, body.length = 972660 ]

I am using this Flume configuration:

sa.sources  = httpsource1
sa.channels = memorychannel1
sa.sinks    = localsink1

sa.sources.httpsource1.type     = http
sa.sources.httpsource1.handler     = org.apache.flume.sink.solr.morphline.BlobHandler
sa.sources.httpsource1.port     = 9043
sa.sources.httpsource1.channels = memorychannel1

sa.channels.memorychannel1.type   = memory
sa.channels.memorychannel1.capacity   = 10000
sa.channels.memorychannel1.transactionCapacity   = 1000

sa.sinks.localsink1.type         = file_roll
sa.sinks.localsink1.channel      = memorychannel1
sa.sinks.localsink1.sink.directory   = /path/%{Network-Element}
sa.sinks.localsink1.sink.rollInterval = 36000

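For some reason, the files are not placed under this path: /path/%{Network-Element}. It looks as if the path does not exist, even though I created the ggsn folder manually and gave it full permissions.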
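
One possible explanation, offered as a guess rather than a confirmed diagnosis: the Flume User Guide documents `%{header}` escape sequences for the HDFS sink's `hdfs.path`, but not for the `file_roll` sink's `sink.directory`, which appears to be used as a literal directory name. The fragment below is a minimal sketch of the same agent with an HDFS sink instead. The sink name `hdfssink1` is hypothetical, and the roll settings are assumptions carried over from the configuration above.

```properties
# Sketch only: replaces the file_roll sink with an HDFS sink.
# The sink name hdfssink1 is hypothetical, not from the original configuration.
sa.sinks = hdfssink1

sa.sinks.hdfssink1.type              = hdfs
sa.sinks.hdfssink1.channel           = memorychannel1
# %{Network-Element} is replaced by the value of the event header with that name
sa.sinks.hdfssink1.hdfs.path         = /path/%{Network-Element}
# Write raw event bodies instead of SequenceFiles
sa.sinks.hdfssink1.hdfs.fileType     = DataStream
sa.sinks.hdfssink1.hdfs.writeFormat  = Text
# Roll on time only; 0 disables size-based and count-based rolling
sa.sinks.hdfssink1.hdfs.rollInterval = 36000
sa.sinks.hdfssink1.hdfs.rollSize     = 0
sa.sinks.hdfssink1.hdfs.rollCount    = 0
```

The escape can only resolve if the event itself carries a Network-Element header. The blobEvent log line above lists only Content-Type among the event headers. The Flume documentation describes BlobHandler as returning an event that contains the request parameters, so the value may need to be sent as a query parameter (for example `http://localhost:9043/?Network-Element=GGSN`) rather than as an HTTP header. I have not tested this against this setup.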
