Flume HDFS sink problem when ingesting from Twitter

Posted by wf82jlnq on 2021-06-04 in Flume

I currently have the following configuration in Flume:


# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'TwitterAgent'

TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = YPTxqtRamIZ1bnJXYwGW
TwitterAgent.sources.Twitter.consumerSecret = Wjyw9714OBzao7dktH0csuTByk4iLG9Zu4ddtI6s0ho
TwitterAgent.sources.Twitter.accessToken = 2340010790-KhWiNLt63GuZ6QZNYuPMJtaMVjLFpiMP4A2v
TwitterAgent.sources.Twitter.accessTokenSecret = x1pVVuyxfvaTbPoKvXqh2r5xUA6tf9einoByLIL8rar
TwitterAgent.sources.Twitter.keywords = hadoop, big data, analytics, bigdata, cloudera, data science, data scientiest, business intelligence, mapreduce, data warehouse, data warehousing, mahout, hbase, nosql, newsql, businessintelligence, cloudcomputing
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://hadoop1:8020/user/flume/tweets/%Y/%m/%d/%H/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100

The Twitter application authentication keys are correct. I keep seeing this error in the Flume log file:

ERROR org.apache.flume.SinkRunner
Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: java.lang.IllegalArgumentException: java.net.UnknownHostException: hadoop1
    at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:446)
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: hadoop1
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:414)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:164)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:129)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:448)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:410)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:128)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2310)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2344)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2326)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:353)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:194)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:227)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:221)
    at org.apache.flume.sink.hdfs.BucketWriter$8$1.run(BucketWriter.java:589)
    at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:161)
    at org.apache.flume.sink.hdfs.BucketWriter.access$800(BucketWriter.java:57)
    at org.apache.flume.sink.hdfs.BucketWriter$8.call(BucketWriter.java:586)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    ... 1 more
Caused by: java.net.UnknownHostException: hadoop1
    ... 23 more

Does anyone here know the cause and can explain it to me? Thanks in advance.

Answer #1 (ibps3vxo)

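According to the exception, the problem is that the host hadoop1 is unknown: the machine running the Flume agent cannot resolve that name.
The path you gave in the Flume configuration file,

hdfs://hadoop1:8020/user/flume/tweets/%Y/%m/%d/%H/

must be reachable from the machine running the Flume agent. A machine name cannot be used to reach an HDFS cluster that is not in the same domain, so you need to access HDFS using the IP address set in core-site.xml.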
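
As a minimal sketch of that fix, the sink path can point at the NameNode by IP address instead of by hostname. The address 192.0.2.10 below is a placeholder and does not come from the question; substitute the NameNode host and port from the fs.defaultFS property (fs.default.name on older Hadoop releases) in the cluster's core-site.xml.

```properties
# Hypothetical example: 192.0.2.10 stands in for the NameNode's real IP address.
# Take the actual address and port from core-site.xml on the cluster.
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://192.0.2.10:8020/user/flume/tweets/%Y/%m/%d/%H/
```

Keeping hadoop1 in the path should also work, provided the agent machine can resolve that name, for example through a DNS record or a hosts-file entry.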
