Reset the index name in elasticsearch dsl

li9yvcax posted on 2023-02-03 in ElasticSearch

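I'm trying to build an ETL that extracts data from Mongo, processes it and loads it into Elastic. The load will run daily, so I decided to name each index after the current date. That will help me deal with the first index later on. I followed the elasticsearch dsl guide: https://elasticsearch-dsl.readthedocs.io/en/latest/persistence.html My problem comes from my limited experience with classes: I don't know how to reset the index name from the class. Here is my class code (custom_indices.py):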

from elasticsearch_dsl import Document, Date, Integer, Keyword, Text
from elasticsearch_dsl.connections import connections
from elasticsearch_dsl import Search
import datetime

class News(Document):
    title = Text(analyzer='standard', fields={'raw': Keyword()})
    manual_tagging = Keyword()
    processed = Date()  # read by is_published() below

    class Index:
        # Evaluated only once, when the class body runs at import time
        name = 'processed_news_' + datetime.datetime.now().strftime("%Y%m%d")

    def save(self, **kwargs):
        return super(News, self).save(**kwargs)

    def is_published(self):
        # `datetime` is the module here, so the class is datetime.datetime
        return datetime.datetime.now() >= self.processed

This is the part of my code that creates instances of the class:

from custom_indices import News
import elasticsearch
import elasticsearch_dsl
from elasticsearch_dsl.connections import connections
import pandas as pd
import datetime

connections.create_connection(hosts=['localhost'])
News.init()
# df is a pandas DataFrame holding the data extracted from MongoDB (built elsewhere)
for index, doc in df.iterrows():
    new_insert = News(meta={'id': doc.url_hashed},
                      title=doc.title,
                      manual_tagging=doc.customTags)
    new_insert.save()

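Every time I use the "News" class I want it to get a new name. However, the name does not change even when I load the class again (from custom_indices import News). I know this only matters while I'm testing, but I'd like to know how to force a "reset". In fact, my original plan was to change the name outside the class, just before the loop, with the following line: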

News.Index.name = "NEW_NAME"

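However, that doesn't work: I still see the name defined in the class. Can anyone help? Thanks a lot! PS: this must just be an object-oriented programming issue. Apologies for not knowing more about it.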
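You can reproduce this behaviour without Elasticsearch. A class body, including the `name = 'processed_news_' + ...` line, runs once, when the module is first imported. A repeated `from custom_indices import News` reuses the module cached in `sys.modules` and does not run it again (only `importlib.reload` re-executes it). On top of that, as far as I can tell, elasticsearch-dsl's `Document` metaclass reads the inner `Index` class when `News` is created and keeps the result in `News._index`. Editing `News.Index.name` afterwards therefore changes an attribute that is no longer read. The sketch below imitates that last step with a hypothetical `SnapshotMeta` metaclass. It is a simplified model, not the library's actual code.

```python
import datetime


class SnapshotMeta(type):
    """Hypothetical stand-in for elasticsearch-dsl's Document metaclass:
    it copies Index.name once, when the class object is created."""

    def __new__(mcs, name, bases, attrs):
        cls = super().__new__(mcs, name, bases, attrs)
        index_opts = attrs.get("Index")
        # Snapshot taken here; later edits to cls.Index are never re-read.
        cls._index_name = getattr(index_opts, "name", None)
        return cls


calls = []  # illustrative: records how often the name expression is evaluated


def make_name():
    calls.append(1)
    return "processed_news_" + datetime.date.today().strftime("%Y%m%d")


class News(metaclass=SnapshotMeta):
    class Index:
        # Evaluated exactly once, while the class body executes.
        name = make_name()


snapshot = News._index_name
News.Index.name = "NEW_NAME"  # changes the inner class only

print(len(calls))                    # 1: the class body ran once
print(News._index_name == snapshot)  # True: the snapshot ignores the edit
```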


pdsfdshx1#

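Perhaps you can take advantage of the fact that Document.init() accepts an index keyword argument. If you want the index name to be set automatically, you can implement init() in your News class and call super().init(...) from your implementation.
A simple example (Python 3.x):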

from elasticsearch_dsl import Document
from elasticsearch_dsl.connections import connections
import datetime

class News(Document):
    @classmethod
    def init(cls, index=None, using=None):
        # Compute the default name on every call, not once at import time
        index_name = index or 'processed_news_' + datetime.datetime.now().strftime("%Y%m%d")
        return super().init(index=index_name, using=using)
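One caveat, if I read the library correctly: `init()` only creates the index. `save()` does not reuse the name passed to `init()`. It looks at `meta.index` and then at the class-level index, so writes also need the computed name. The sketch below shows both overrides computing the name at call time and delegating to `super()`. To keep it runnable without a cluster, the real `Document` is replaced by a hypothetical `FakeDocument` that just returns the index it would use. The `daily_index_name` helper is also illustrative.

```python
import datetime


def daily_index_name(prefix="processed_news_", day=None):
    """Hypothetical helper: build the date-suffixed name when called."""
    day = day or datetime.date.today()
    return prefix + day.strftime("%Y%m%d")


class FakeDocument:
    """Hypothetical stand-in for elasticsearch_dsl.Document: returns the
    index it would use instead of talking to Elasticsearch."""

    @classmethod
    def init(cls, index=None, using=None):
        return index

    def save(self, using=None, index=None, **kwargs):
        return index


class News(FakeDocument):
    @classmethod
    def init(cls, index=None, using=None):
        # Default computed on every call, not once at import time.
        return super().init(index=index or daily_index_name(), using=using)

    def save(self, using=None, index=None, **kwargs):
        # Same default for writes, so documents land in the index init() created.
        return super().save(using=using, index=index or daily_index_name(), **kwargs)


print(News.init())                   # today's processed_news_YYYYMMDD name
print(News.init(index="news_test"))  # an explicit name still wins
print(News().save())                 # same default as init()
```

With the real library, the two overrides would sit on `class News(Document)`. The stand-in keeps the parameter order of elasticsearch-dsl's `Document.init(index=None, using=None)` and `Document.save(using=None, index=None, ...)`, so callers passing keywords behave the same way.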

vybvopom2#

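You can override the index when calling **save()**. Pass it as the index keyword argument, because the first positional parameter of save() is using (the connection alias):

new_insert.save(index='processed_news_' + datetime.datetime.now().strftime("%Y%m%d"))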
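Calling `datetime.datetime.now()` inside every `save()` means a run that crosses midnight splits its documents across two indices. A small variation is to compute the name once per ETL run and reuse it for `init()` and every `save()`. The sketch below is illustrative: `daily_index_name` and `plan_targets` are hypothetical helpers, and the saves are only simulated by returning the target index for each document id.

```python
import datetime


def daily_index_name(prefix, run_started):
    """Hypothetical helper: derive the index name from the run's start time."""
    return prefix + run_started.strftime("%Y%m%d")


def plan_targets(doc_ids, run_started, prefix="processed_news_"):
    """Return (doc_id, index) pairs; a real loader would call
    doc.save(index=index) for each document instead."""
    index = daily_index_name(prefix, run_started)  # computed once per run
    return [(doc_id, index) for doc_id in doc_ids]


started = datetime.datetime(2023, 2, 3, 23, 59, 59)
targets = plan_targets(["a1", "b2", "c3"], started)
print(targets)
# [('a1', 'processed_news_20230203'), ('b2', 'processed_news_20230203'), ('c3', 'processed_news_20230203')]
```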

vdgimpew3#

Here is an example that creates the date-suffixed indices explicitly with the Index class:

# coding: utf-8

import datetime

from elasticsearch_dsl import Keyword, Text, \
    Index, Document, Date
from elasticsearch_dsl.connections import connections

HOST = "localhost:9200"

index_names = [
    "foo-log-",
    "bar-log-",
]

default_settings = {"number_of_shards": 4, "number_of_replicas": 1}

index_settings = {
    "foo-log-": {
        "number_of_shards": 40,
        "number_of_replicas": 1
    }
}

class LogDoc(Document):
    level = Keyword(ignore_above=256)

    date = Date(format="yyyy-MM-dd'T'HH:mm:ss.SSS")

    hostname = Text(fields={'fields': Keyword(ignore_above=256)})
 
    message = Text()

    createTime = Date(format="yyyy-MM-dd'T'HH:mm:ss.SSS")

def auto_create_index():
    '''Automatically create the ES indices'''
    connections.create_connection(hosts=[HOST])

    for day in range(3):
        dt = datetime.datetime.now() + datetime.timedelta(days=day)
        for index in index_names:
            name = index + dt.strftime("%Y-%m-%d")
            settings = index_settings.get(index, default_settings)

            idx = Index(name=name)
            idx.document(LogDoc)
            idx.settings(**settings)
            try:
                idx.create()
            except Exception as e:
                print(e)
                continue
            print("create index %s" % name)

if __name__ == '__main__':
    auto_create_index()
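The name and settings logic in `auto_create_index()` does not need a cluster. One option is to pull it into a pure function and keep only `Index(name=...)`, `settings()`, `document()` and `create()` in the part that talks to Elasticsearch. The function below is a hypothetical refactoring under that assumption. It reproduces the same naming (`prefix + YYYY-MM-DD`) and the same fallback to the default settings, and it can be checked without a running Elasticsearch.

```python
import datetime

DEFAULT_SETTINGS = {"number_of_shards": 4, "number_of_replicas": 1}
INDEX_SETTINGS = {"foo-log-": {"number_of_shards": 40, "number_of_replicas": 1}}


def plan_indices(prefixes, start, days=3,
                 overrides=INDEX_SETTINGS, default=DEFAULT_SETTINGS):
    """Hypothetical helper: return (index_name, settings) pairs for `days`
    days from `start`, using the same rules as auto_create_index()."""
    plan = []
    for offset in range(days):
        day = start + datetime.timedelta(days=offset)
        for prefix in prefixes:
            name = prefix + day.strftime("%Y-%m-%d")
            plan.append((name, overrides.get(prefix, default)))
    return plan


for name, settings in plan_indices(["foo-log-", "bar-log-"], datetime.date(2023, 2, 3), days=2):
    print(name, settings["number_of_shards"])
# foo-log-2023-02-03 40
# bar-log-2023-02-03 4
# foo-log-2023-02-04 40
# bar-log-2023-02-04 4
```

Each pair can then be fed to the existing loop body (`idx = Index(name=name)`, `idx.settings(**settings)`, `idx.create()`).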
