Incomplete records in Hive when using the Hive sink of Flume

3pvhb19x posted on 2021-06-04 in Flume

I want to use Flume to collect data into a Hive database.
The data does get stored in Hive, but it is incomplete.
I want to insert the following records:

1201,Gopal
1202,Manisha
1203,Masthanvali
1204,Kiran
1205,Kranthi

When I run Flume, HDFS contains bucket_00000 and bucket_00000_flush_length (under /user/hive/warehouse/test2.db/employee12/delta_0000501_0000600). (The database is test2 and the table is employee12.)
When I run "select * from employee12", the output is:
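The delta directory name itself carries information. A minimal sketch, assuming the usual Hive ACID naming delta_<min>_<max> (transaction IDs in Hive 2, write IDs in Hive 3, optionally followed by a statement ID); parse_delta_dir is a hypothetical helper, not a Hive API. The range 501 to 600 spans 100 IDs, which is consistent with the Flume Hive sink's default txnsPerBatchAsk of 100.

```python
import re

# Assumption: ACID delta directories are named delta_<min>_<max>[_<stmtId>].
DELTA_RE = re.compile(r"^delta_(\d+)_(\d+)(?:_\d+)?$")


def parse_delta_dir(name: str) -> tuple[int, int]:
    """Return the (min, max) ID range encoded in a delta directory name."""
    m = DELTA_RE.match(name)
    if m is None:
        raise ValueError(f"not a delta directory: {name!r}")
    return int(m.group(1)), int(m.group(2))


path = "/user/hive/warehouse/test2.db/employee12/delta_0000501_0000600"
lo, hi = parse_delta_dir(path.rsplit("/", 1)[-1])
print(lo, hi, hi - lo + 1)  # 501 600 100
```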

--------------------------------------------------------------------

hive> select * from employee12;   
OK

(the next two lines)
1201    Gopal   
1202            
Time taken: 0.802 seconds, Fetched: 1 row(s)

----------------------------------------------------------------------
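Note that Hive reports "Fetched: 1 row(s)" even though two lines are printed. One possible reading, sketched below under the assumption that the name value of that single row itself contains a newline: the Hive CLI prints each row as tab-separated columns, so one row whose string column holds "Gopal\n1202" appears on screen as two lines. The row value and the render_row helper are hypothetical, for illustration only.

```python
def render_row(row) -> str:
    """Join columns with tabs, the way the Hive CLI prints a result row."""
    return "\t".join(str(col) for col in row)


# Hypothetical row: the name column contains an embedded newline.
row = (1201, "Gopal\n1202")
rendered = render_row(row)

print(rendered)
print(f"1 row, rendered as {len(rendered.splitlines())} lines")
```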

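Can anyone help me figure out:
Why are only two lines shown?
Why does the second line show only 1202?
How do I set up the configuration correctly?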

Flume configuration:

agenthive.sources = spooldirSource
agenthive.channels = memoryChannel
agenthive.sinks = hiveSink

agenthive.sources.spooldirSource.type=spooldir
agenthive.sources.spooldirSource.deserializer=org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
agenthive.sources.spooldirSource.spoolDir=/home/flume/flume_test_home/spooldir

agenthive.sources.spooldirSource.channels=memoryChannel
agenthive.sources.spooldirSource.basenameHeader=true
agenthive.sources.spooldirSource.basenameHeaderKey=basename

agenthive.sinks.hiveSink.type=hive
agenthive.sinks.hiveSink.hive.metastore = thrift://127.0.0.1:9083
agenthive.sinks.hiveSink.hive.database = test2
agenthive.sinks.hiveSink.hive.table = employee12
agenthive.sinks.hiveSink.round = true
agenthive.sinks.hiveSink.roundValue = 10
agenthive.sinks.hiveSink.roundUnit = second
agenthive.sinks.hiveSink.serializer = DELIMITED
agenthive.sinks.hiveSink.serializer.delimiter = ","
agenthive.sinks.hiveSink.serializer.serdeSeparator = ','
agenthive.sinks.hiveSink.serializer.fieldnames =eid,name

agenthive.sinks.hiveSink.channel=memoryChannel    
agenthive.channels.memoryChannel.type=memory
agenthive.channels.memoryChannel.capacity=100
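The source uses BlobDeserializer, which (per the Flume user guide) reads one BLOB per event, typically one whole file per event, whereas the spooldir default is LINE (one event per line). The sketch below is a simplified model, not Flume or Hive source code: it assumes the DELIMITED serializer splits each event body on "," and maps fields to fieldnames (eid,name) by position, dropping any later fields. blob_events, line_events and to_row are hypothetical helpers.

```python
# Simplified model (assumption): BLOB = one event per file, LINE = one per line;
# the DELIMITED serializer maps split fields to FIELDNAMES by position.
FILE_CONTENTS = "1201,Gopal\n1202,Manisha\n1203,Masthanvali\n1204,Kiran\n1205,Kranthi\n"
FIELDNAMES = ["eid", "name"]


def blob_events(text: str) -> list[str]:
    """BLOB-style deserialization: the whole file becomes one event body."""
    return [text]


def line_events(text: str) -> list[str]:
    """LINE-style deserialization: each line becomes one event body."""
    return text.splitlines()


def to_row(event: str, delimiter: str = ",") -> dict[str, str]:
    """Split on the delimiter and keep only the first len(FIELDNAMES) fields."""
    return dict(zip(FIELDNAMES, event.split(delimiter)))


for label, events in (("blob", blob_events(FILE_CONTENTS)), ("line", line_events(FILE_CONTENTS))):
    rows = [to_row(e) for e in events]
    print(label, len(rows), repr(rows[0]["name"]))
# blob 1 'Gopal\n1202'
# line 5 'Gopal'
```

Under this model, the whole file collapses into a single row whose name is "Gopal\n1202", which would match the output above; whether switching the source back to the LINE deserializer fixes the real pipeline is a hypothesis to verify.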

Hive CREATE TABLE statement:

create table if not exists employee12 (eid int,name string)
comment 'this is comment' 
clustered by(eid) into 1 buckets 
row format delimited
fields terminated by ',' 
lines terminated by '\n'
stored as orc 
tblproperties('transactional'='true');
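With "clustered by(eid) into 1 buckets", every row goes to the same bucket, which fits the single bucket_00000 file seen in the delta directory. A minimal sketch, assuming the classic Hive rule (bucketing_version 1) of (hash & Integer.MAX_VALUE) % numBuckets, where an int's hash is its own value; newer Hive versions may use Murmur3 (bucketing_version 2), but with one bucket the result is 0 either way. bucket_for_int and bucket_file are hypothetical helpers.

```python
JAVA_INT_MAX = 2**31 - 1  # Java Integer.MAX_VALUE


def bucket_for_int(value: int, num_buckets: int) -> int:
    """Assumed classic bucketing for an int column: mask to non-negative, then mod."""
    return (value & JAVA_INT_MAX) % num_buckets


def bucket_file(bucket: int) -> str:
    """Zero-padded bucket file name, as seen in the delta directory."""
    return f"bucket_{bucket:05d}"


for eid in (1201, 1202, 1203, 1204, 1205):
    print(eid, bucket_file(bucket_for_int(eid, 1)))  # each eid -> bucket_00000
```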
Answer 1#, by kkih6yb8:

Try using an external table. I came across this post while working on a similar setup.
