Elasticsearch bulk indexing for handling new data incrementally

Posted by xmd2e60i on 2021-06-14 in ElasticSearch


I have implemented bulk indexing and would like to make it more efficient.


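# Current implementation in Python

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

# INDEX_NAME, ANALYZER, and all_products() are assumed to be defined elsewhere.

def products_to_index():
    # Yield one "index" action per product. An "index" action overwrites
    # any existing document that has the same _id.
    for product in all_products():
        yield {
            "_op_type": "index",
            "_index": INDEX_NAME,
            "_id": product.id,
            "_source": {"name": product.name, "content": product.content},
        }

def main(args):
    # Connect to localhost:9200 by default.
    es = Elasticsearch()
    body = ANALYZER  # request body used when creating the index

    es.indices.create(index=INDEX_NAME, body=body)

    # Send every product to Elasticsearch in batches.
    bulk(es, products_to_index())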

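This implementation seems to just fetch all the data and index it batch by batch. I want to add an extra step that checks whether an entry has already been indexed.
I also thought about loading from the locally saved index path, but I don't know how to proceed.
I looked at the API docs but couldn't find anything.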

elasticsearch python

Source: https://stackoverflow.com/questions/64446721/elasticsearch-bulk-indexing-for-handling-new-data-incrementally

1 Answer


xuo3flqw1#

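By using index, you are telling Elasticsearch "I want to index this document, and if it already exists, update it." If you instead use create with a specific id, Elasticsearch behaves in a "put if absent" manner: the document is written only when no document with that id exists yet. When you use the bulk API, the response reports the result for each document separately, so you can tell which documents were inserted and which were not. To get this behavior, just set the op type to create (the "_op_type" key in the actions passed to the Python bulk helper).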
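The suggestion above could look roughly like the following in Python. This is a minimal sketch, not a definitive implementation: the function names `create_actions`, `split_errors`, and `index_new_products` and the `"products"` index name are hypothetical, and the `product.id`, `product.name`, and `product.content` attributes are taken from the question's code. It assumes the `bulk` helper is called with `raise_on_error=False`, so that per-document errors are returned instead of raised, and that a document whose id already exists is reported with HTTP status 409.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple

INDEX_NAME = "products"  # assumption: placeholder index name


def create_actions(
    products: Iterable[Any], index_name: str = INDEX_NAME
) -> Iterator[Dict[str, Any]]:
    """Yield bulk actions that insert a document only if its _id is new."""
    for product in products:
        yield {
            # "create" is rejected (status 409) when the _id already exists,
            # whereas "index" would overwrite the existing document.
            "_op_type": "create",
            "_index": index_name,
            "_id": product.id,
            "_source": {"name": product.name, "content": product.content},
        }


def split_errors(
    errors: List[Dict[str, Any]]
) -> Tuple[List[Any], List[Dict[str, Any]]]:
    """Separate 'already indexed' conflicts (409) from other failures."""
    skipped_ids: List[Any] = []
    failures: List[Dict[str, Any]] = []
    for item in errors:
        # Each error item is keyed by its op type, here "create".
        info = item.get("create", {})
        if info.get("status") == 409:
            skipped_ids.append(info.get("_id"))
        else:
            failures.append(item)
    return skipped_ids, failures


def index_new_products(es: Any, products: Iterable[Any]):
    """Bulk-create products and report which ids were already indexed."""
    # Imported lazily so the helpers above work without the client installed.
    from elasticsearch.helpers import bulk

    # raise_on_error=False makes bulk() return the per-document errors
    # instead of raising BulkIndexError.
    created, errors = bulk(es, create_actions(products), raise_on_error=False)
    skipped_ids, failures = split_errors(errors)
    return created, skipped_ids, failures
```

With an `Elasticsearch` client `es`, calling `index_new_products(es, all_products())` would return the number of newly created documents, the ids that were skipped because they already existed, and any other failures. Note that this still sends every document to the cluster on each run. It avoids overwriting existing documents, but it does not reduce the amount of data transferred.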

Answered on 2021-06-14
