I have the following sample data:
employee_name,user_id,O,C,E,A,N
Yvette Vivien Donovan,YVD0093,38,19,29,15,36
Troy Alvin Craig,TAC0118,34,40,24,15,34
Eden Jocelyn Mcclain,EJM0952,20,37,48,35,34
Alexa Emma Wood,AEW0655,25,20,18,40,38
Celeste Maris Griffith,CMG0936,36,13,18,50,29
Tanek Orson Griffin,TOG0025,40,36,24,19,26
Colton James Lowery,CJL0436,39,41,27,25,28
Baxter Flynn Mcknight,BFM0761,42,32,28,17,22
Olivia Calista Hodges,OCH0195,37,36,39,38,32
Price Zachery Maldonado,PZM0602,24,46,30,18,29
Daryl Delilah Atkinson,DDA0185,17,43,33,18,25
and the Logstash config file is:
input {
  file {
    path => "/path/psychometric_data.csv"
    start_position => "beginning"
  }
}
filter {
  csv {
    separator => ","
    autodetect_column_names => true
    autogenerate_column_names => true
  }
}
output {
  amazon_es {
    hosts => [ "https://xxx-xxx-es-xxx.xx-xx-1.es.amazonaws.com:443" ]
    ssl => true
    region => "ap-south-1"
    index => "psychometric_data"
  }
}
I expect the first row (i.e. employee_name,user_id,O,C,E,A,N) to be used as the Elasticsearch field names (header), but instead the third row (i.e. Troy Alvin Craig,TAC0118,34,40,24,15,34) is being used as the header, as shown below.
{
  "_index": "psychometric_data",
  "_type": "_doc",
  "_id": "md4hm3YB8",
  "_score": 1,
  "_source": {
    "15": "21",
    "24": "17",
    "34": "39",
    "40": "37",
    "@version": "1",
    "@timestamp": "2020-12-25T18:20:00.759Z",
    "message": "Ishmael Mannix Velazquez,IMV0086,22,37,17,21,39\r",
    "path": "/path/psychometric_data.csv",
    "Troy Alvin Craig": "Ishmael Mannix Velazquez",
    "host": "xx-ThinkPad-xx",
    "TAC0118": "IMV0086"
  }
}
What could be the reason?
1 Answer
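If you set autodetect_column_names to true, the filter interprets the first line it sees as the column names. If pipeline.workers is set to more than one, it is a race to see which thread sets the column names first. Because different workers process different lines, the line that wins may not be the first line of the file. You must set pipeline.workers to 1. In addition, the Java execution engine (enabled by default) does not always preserve the order of events. There is a setting in logstash.yml, pipeline.ordered, that controls this. In 7.9, event order is preserved if pipeline.workers is set to 1.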
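As a minimal sketch of the worker setting described above, the following line could go in logstash.yml. The file location depends on how Logstash was installed, so treat the placement as an assumption.

```yaml
# logstash.yml (sketch): run the pipeline with a single worker so that
# only one thread can ever see, and claim, the CSV header line.
pipeline.workers: 1
```

The same value can also be passed at startup with the -w (--pipeline.workers) command-line option instead of editing the file.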
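You do not say which version you are running. For any version from 7.0 (when java_execution became the default) through 7.6, the fix is to disable the Java engine, either with
pipeline.java_execution: false
in logstash.yml or with --java_execution false on the command line. For any 7.x release from 7.7 onward, make sure pipeline.ordered is set to auto or true (auto is the default in 7.x). In a future release (perhaps 8.x), pipeline.ordered will default to false.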
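The version-specific advice above might look like this in logstash.yml. This is a sketch only: which of the two settings applies depends on the Logstash version in use, which the question does not state, so the 7.0 to 7.6 variant is left commented out as an assumption.

```yaml
# logstash.yml (sketch): keep events in file order so the header row
# is the first line the csv filter sees. Requires pipeline.workers: 1.

# Logstash 7.7 and later 7.x: "auto" (the 7.x default) preserves order
# when there is a single worker; "true" enforces ordering explicitly.
pipeline.ordered: auto

# Logstash 7.0 through 7.6 only: pipeline.ordered is not available,
# so fall back to the Ruby execution engine instead.
# pipeline.java_execution: false
```

Setting pipeline.ordered explicitly, rather than relying on the default, guards against the default changing to false in a later release.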