Loading JSON into Elasticsearch via the API

Posted by pn9klfpd on 2021-06-15 in ElasticSearch

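I'm trying to add a JSON file of about 30,000 lines to Elasticsearch, and it isn't in the right format. I tried to upload it through the Bulk API, but I can't find a way to format it so that it actually works. I'm using Ubuntu 16.04 LTS.
This is the format of the JSON: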

```
{
    "rt": "2018-11-20T12:57:32.292Z",
    "source_info": { "ip": "0.0.60.50" },
    "end": "2018-11-20T12:57:32.284Z",
    "severity": "low",
    "duid": "5b8d0a48ba59941314e8a97f",
    "dhost": "004678",
    "endpoint_type": "computer",
    "endpoint_id": "8e7e2806-eaee-9436-6ab5-078361576290",
    "suser": "Katerina",
    "group": "PERIPHERALS",
    "customer_id": "a263f4c8-942f-d4f4-5938-7c37013c03be",
    "type": "Event::Endpoint::Device::AlertedOnly",
    "id": "83d63d48-f040-2485-49b9-b4ff2ac4fad4",
    "name": "Peripheral allowed: Samsung Galaxy S7 edge"
}
```

I know the Bulk API format needs `{"index":{"_id":*}}` before each JSON object in the file, like this:

```
{"index":{"_id":1}}
{
    "rt": "2018-11-20T12:57:32.292Z",
    "source_info": { "ip": "0.0.60.50" },
    "end": "2018-11-20T12:57:32.284Z",
    "severity": "low",
    "duid": "5b8d0a48ba59941314e8a97f",
    "dhost": "004678",
    "endpoint_type": "computer",
    "endpoint_id": "8e7e2806-eaee-9436-6ab5-078361576290",
    "suser": "Katerina",
    "group": "PERIPHERALS",
    "customer_id": "a263f4c8-942f-d4f4-5938-7c37013c03be",
    "type": "Event::Endpoint::Device::AlertedOnly",
    "id": "83d63d48-f040-2485-49b9-b4ff2ac4fad4",
    "name": "Peripheral allowed: Samsung Galaxy S7 edge"
}
```

If I insert the index ids manually and then run the following command, it uploads without errors:

```
curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/ivc/default/_bulk?pretty --data-binary @results.json
```
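My question is: how can I add the index action `{"index":{"_id":*}}` in front of each JSON object so the file is ready to upload? The `_id` obviously has to increase by 1 for each object. Is there a way to do this from the CLI?
Sorry if this post doesn't look the way it should. I've read millions of posts on Stack Overflow, but this is my first one. #desperate
Thanks a lot in advance!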
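For the question above, one way to do it from the command line is a small awk filter that joins each pretty-printed object onto a single line and writes an index action with an incrementing `_id` in front of it. This is a sketch under an assumption that matches the sample shown: each top-level object opens with a line containing only `{` and closes with a line containing only `}`. The function name `to_bulk` is made up for illustration.

```shell
# Sketch: turn pretty-printed JSON objects into a Bulk API body with
# _id = 1, 2, 3, ... Reads the files given as arguments, or stdin if none.
# ASSUMPTION: every top-level object starts with a line that is just "{" and
# ends with a line that is just "}" (the layout of the sample above).
to_bulk() {
  awk '
    {
      line = $0
      sub(/^[[:space:]]+/, "", line)   # drop indentation
      sub(/[[:space:]]+$/, "", line)   # drop trailing blanks and CR
    }
    # Start of a top-level object: begin collecting a new document.
    $0 ~ /^[{]/ && line == "{" { doc = "{"; inside = 1; next }
    # End of the object: print the action line, then the joined document.
    inside && $0 ~ /^[}]/ && line == "}" {
      id++
      printf "{\"index\":{\"_id\":%d}}\n%s}\n", id, doc
      inside = 0
      next
    }
    # Inside an object: append the trimmed line. Raw newlines cannot occur
    # inside JSON strings, so joining the trimmed lines keeps the JSON valid.
    inside { doc = doc line }
  ' "$@"
}
```

Usage: `to_bulk results.json > bulk.json`, then send `@bulk.json` with the curl command above. If `jq` is installed, `jq -c . results.json` prints one compact object per line without depending on that brace layout.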
Answer #1 (kt06eoxx)

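Thanks for your answers, they really helped point me in the right direction.
I've written a bash script that automatically downloads the logs, formats them and uploads them to Elasticsearch: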


```
#!/bin/bash

echo "Downloading logs from Sophos Central. Please wait."

cd /home/user/ELK/Sophos-Central-SIEM-Integration/log || exit 1

# Delete the previous batch of results (-f: no error if the file is missing)
rm -f result.json
cd ..

# Run the Sophos SIEM script to download a new batch of logs
./siem.py
cd /home/user/ELK/Sophos-Central-SIEM-Integration/log || exit 1

# Insert a bulk action line before the first document
sed -i '1 i\{"index":{}}' result.json

# Prefix lines 3, 5, 7, ... with an action line. Since no newline is added,
# this only yields a valid bulk body if the records in result.json are
# separated by blank lines (the blank lines become the action lines).
sed -i '3~2s/^/{"index":{}}/' result.json

# Send the file to Elasticsearch through the Bulk API
curl -s -H "Content-Type: application/x-ndjson" -XPOST "localhost:9200/ivc/default/_bulk?pretty" --data-binary @result.json
```

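That's how I did it. There may be simpler options, but this one did the trick for me. I hope it's useful to someone else!
Thanks again, everyone! :D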
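The two `sed` commands depend on where the records sit in the file. If `result.json` holds one JSON document per line, with or without blank lines in between, a single awk pass can add the action lines instead. This is a sketch, not part of the original script; `add_actions` is a hypothetical helper name, and because the `{"index":{}}` action has no `_id`, Elasticsearch generates the ids.

```shell
# Sketch: prefix every non-empty line with an index action. Elasticsearch
# generates the _id because the action has no "_id" field.
# ASSUMPTION: each non-empty input line is already one complete JSON document.
add_actions() {
  awk 'NF { print "{\"index\":{}}"; print }' "$@"
}
```

Usage: `add_actions result.json > bulk.json`, then pass `--data-binary @bulk.json` to curl. Blank lines are skipped, and every output line ends with a newline, which the Bulk API expects for the last line.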

Answer #2 (x8diyxa7)

Your problem is that Elasticsearch expects each document to be valid JSON on a single line, like this:

```
{"index":{"_id":1}}
{"rt":"2018-11-20T12:57:32.292Z","source_info":{"ip":"0.0.60.50"},"end":"2018-11-20T12:57:32.284Z","severity":"low","duid":"5b8d0a48ba59941314e8a97f","dhost":"004678","endpoint_type":"computer","endpoint_id":"8e7e2806-eaee-9436-6ab5-078361576290","suser":"Katerina","group":"PERIPHERALS","customer_id":"a263f4c8-942f-d4f4-5938-7c37013c03be","type":"Event::Endpoint::Device::AlertedOnly","id":"83d63d48-f040-2485-49b9-b4ff2ac4fad4","name":"Peripheral allowed: Samsung Galaxy S7 edge"}
```

You need to find a way to transform your input file so that there is one document per line; then you can use Val's solution.
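Before posting, a quick shape check can catch a file that is still pretty-printed or that lacks the final newline the Bulk API requires. This is a sketch assuming the file contains only `index` actions, so lines must alternate between an action and a document; `check_bulk` is a hypothetical name, and it checks the line layout rather than fully parsing the JSON.

```shell
# Sketch: check the line layout of a Bulk API body before POSTing it.
# ASSUMPTION: the file contains only "index" actions, so lines must alternate
# action line / one-line JSON document. This does not fully parse the JSON.
check_bulk() {
  # The Bulk API requires the body to end with a newline character.
  if [ -n "$(tail -c 1 "$1")" ]; then
    echo "$1: missing trailing newline" >&2
    return 1
  fi
  awk -v f="$1" '
    BEGIN { bad = 0 }
    # Odd lines must be index actions.
    NR % 2 == 1 && $0 !~ /^[{][[:space:]]*"index"/ {
      print f ": line " NR ": expected an index action" > "/dev/stderr"; bad = 1
    }
    # Even lines must hold a whole JSON object on one line.
    NR % 2 == 0 && $0 !~ /^[{].*[}][[:space:]]*$/ {
      print f ": line " NR ": expected a one-line JSON document" > "/dev/stderr"; bad = 1
    }
    END {
      if (NR % 2 == 1) { print f ": action line without a document" > "/dev/stderr"; bad = 1 }
      exit bad
    }
  ' "$1"
}
```

Usage: `check_bulk bulk.json && curl ... --data-binary @bulk.json` only sends the file if the layout looks right. A pretty-printed document, as in the question, fails the even-line check.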
