Optimizing Elasticsearch queries by creating a smaller index

Posted by kb5ga3dv on 2023-11-17 in ElasticSearch
query={
  "_source": ["_id", "_score", "pub_info.year", "abstract", "title", "pub_info.title", "authors.fullname"],
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": {
              "query": "search_text"
            }
          }
        },
        {
          "match": {
            "abstract": {
              "query": "search_text"
            }
          }
        }
      ],
      "filter": [
        {
          "exists": {
            "field": "pub_info.year"
          }
        },
        {
          "exists": {
            "field": "pub_info.title"
          }
        },
        {
          "exists": {
            "field": "abstract"
          }
        },
        {
          "exists": {
            "field": "title"
          }
        },
        {
          "exists": {
            "field": "authors.fullname"
          }
        }
      ]
    }
  }
}

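I'm currently working with a large Elasticsearch index that holds about 20 columns of data on research papers collected from the internet. My search queries mainly need data from 5 of those columns. Some columns hold a lot of data (for example, the full text of each paper), and my queries take about 4 seconds to execute. I would like to reduce that.
To speed up search, I'm considering creating a new, smaller index that contains only the 5 columns I need. However, I'm concerned about keeping the two indices in sync, because I'm continually adding new data to the original index.
Would creating a new, leaner index with only the necessary columns improve search performance, particularly given the large fields in the current index? And how can I manage synchronization between the two indices efficiently?
If not, is there another way to make retrieval fast?
I was thinking of using re-index to create the new index.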
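For reference, here is a minimal sketch of what such a re-index request body could look like, built as a plain Python dict. The destination index name `csx_citeseer_docs_slim` and the helper `build_reindex_body` are hypothetical names used only for illustration. The sketch assumes the `_reindex` API's `source._source` field filter, plus `op_type: create` with `conflicts: proceed`, so that a repeated run only adds documents whose `_id` is not yet in the destination. It would not pick up updates to documents that were already copied.

```python
import json

# The five fields the search actually needs (taken from the query above).
SEARCH_FIELDS = ["title", "abstract", "pub_info.year", "pub_info.title", "authors.fullname"]

def build_reindex_body(source_index, dest_index, fields):
    """Build a _reindex request body that copies only `fields` from each document."""
    return {
        # Skip documents that already exist in dest instead of aborting the run.
        "conflicts": "proceed",
        "source": {"index": source_index, "_source": fields},
        # op_type=create only writes documents whose _id is not yet in dest.
        "dest": {"index": dest_index, "op_type": "create"},
    }

# "csx_citeseer_docs_slim" is a hypothetical destination index name.
body = build_reindex_body("csx_citeseer_docs_old_pubinfo", "csx_citeseer_docs_slim", SEARCH_FIELDS)
print(json.dumps(body, indent=2))
```

The printed body is what would be sent to `POST _reindex`. Whether the slimmer index actually speeds up this query would still need to be measured.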
Profile output for a search returning 1 result:

{
    'took': 483,
    'timed_out': False,
    '_shards': {
        'total': 1,
        'successful': 1,
        'skipped': 0,
        'failed': 0
    },
    'hits': {
        'total': {
            'value': 10000,
            'relation': 'gte'
        },
        'max_score': 87.853134,
        'hits': [{
            '_index': 'csx_citeseer_docs_old_pubinfo',
            '_id': 'vYHYzYMByhvPsGt1HJaH',
            '_score': 87.853134,
            '_ignored': ['abstract.keyword', 'text.keyword'],
            '_source': {
                'pub_info': {
                    'year': 0,
                    'title': 'in IEEE International Conference on Automatic Face and Gesture Recognition (FGR), 2004'
                },
                'abstract': 'The major challenges that sign language recognition (SLR) now faces are developing methods that solve large vocabulary continuous sign problems. In this paper, large vocabulary continuous SLR based on transition movement models is proposed. The proposed method employs the temporal clustering algorithm to cluster a large amount of transition movements, and then the corresponding training algorithm is also presented for automatically segmenting and training these transition movement models. The clustered models can improve the generalization of transition movement models, and are very suitable for large vocabulary continuous SLR. At last, the estimated transition movement models, together with sign models, are viewed as candidate models of the Viterbi search algorithm for recognizing continuous sign language. Experiments show that continuous SLR based on transition movement models has good performance over a large vocabulary of 5113 signs.',
                'title': 'Transition movement models for large vocabulary continuous sign language recognition',
                'authors': [{
                    'fullname': 'Wen Gao'
                }, {
                    'fullname': 'Gaolin Fang'
                }, {
                    'fullname': 'Debin Zhao'
                }, {
                    'fullname': 'Yiqiang Chen'
                }]
            }
        }]
    },
    'profile': {
        'shards': [{
            'id': '[YP4_kPcEQ0CbSrGk34Cz3g][csx_citeseer_docs_old_pubinfo][0]',
            'node_id': 'YP4_kPcEQ0CbSrGk34Cz3g',
            'shard_id': 0,
            'index': 'csx_citeseer_docs_old_pubinfo',
            'cluster': '(local)',
            'searches': [{
                'query': [{
                    'type': 'BooleanQuery',
                    'description': '(title:large title:language title:models)^3.0 (abstract:large abstract:language abstract:models)^3.0 #FieldExistsQuery [field=pub_info.year] #FieldExistsQuery [field=pub_info.title] #FieldExistsQuery [field=abstract] #FieldExistsQuery [field=title] #FieldExistsQuery [field=authors.fullname]',
                    'time_in_nanos': 446540061,
                    'breakdown': {
                        'set_min_competitive_score_count': 143,
                        'match_count': 36826,
                        'shallow_advance_count': 0,
                        'set_min_competitive_score': 266826,
                        'next_doc': 332618765,
                        'match': 5455542,
                        'next_doc_count': 47019,
                        'score_count': 36690,
                        'compute_max_score_count': 0,
                        'compute_max_score': 0,
                        'advance': 20675753,
                        'advance_count': 139,
                        'count_weight_count': 0,
                        'score': 63955841,
                        'build_scorer_count': 278,
                        'create_weight': 8025955,
                        'shallow_advance': 0,
                        'count_weight': 0,
                        'create_weight_count': 1,
                        'build_scorer': 15541379
                    },
                    'children': [{
                        'type': 'BoostQuery',
                        'description': '(title:large title:language title:models)^3.0',
                        'time_in_nanos': 65287560,
                        'breakdown': {
                            'set_min_competitive_score_count': 143,
                            'match_count': 0,
                            'shallow_advance_count': 1051,
                            'set_min_competitive_score': 28225,
                            'next_doc': 0,
                            'match': 0,
                            'next_doc_count': 0,
                            'score_count': 27023,
                            'compute_max_score_count': 776,
                            'compute_max_score': 558010,
                            'advance': 45107685,
                            'advance_count': 93389,
                            'count_weight_count': 0,
                            'score': 10450798,
                            'build_scorer_count': 278,
                            'create_weight': 3788158,
                            'shallow_advance': 713540,
                            'count_weight': 0,
                            'create_weight_count': 1,
                            'build_scorer': 4641144
                        },
                        'children': [{
                            'type': 'TermQuery',
                            'description': 'title:large',
                            'time_in_nanos': 8093891,
                            'breakdown': {
                                'set_min_competitive_score_count': 0,
                                'match_count': 0,
                                'shallow_advance_count': 2792,
                                'set_min_competitive_score': 0,
                                'next_doc': 0,
                                'match': 0,
                                'next_doc_count': 0,
                                'score_count': 6092,
                                'compute_max_score_count': 2535,
                                'compute_max_score': 377147,
                                'advance': 3696584,
                                'advance_count': 51759,
                                'count_weight_count': 0,
                                'score': 1259870,
                                'build_scorer_count': 417,
                                'create_weight': 1223314,
                                'shallow_advance': 307622,
                                'count_weight': 0,
                                'create_weight_count': 1,
                                'build_scorer': 1229354
                            }
                        }, {
                            'type': 'TermQuery',
                            'description': 'title:language',
                            'time_in_nanos': 7118601,
                            'breakdown': {
                                'set_min_competitive_score_count': 0,
                                'match_count': 0,
                                'shallow_advance_count': 2801,
                                'set_min_competitive_score': 0,
                                'next_doc': 0,
                                'match': 0,
                                'next_doc_count': 0,
                                'score_count': 6024,
                                'compute_max_score_count': 2525,
                                'compute_max_score': 365923,
                                'advance': 3374881,
                                'advance_count': 45762,
                                'count_weight_count': 0,
                                'score': 1249506,
                                'build_scorer_count': 417,
                                'create_weight': 1143705,
                                'shallow_advance': 384169,
                                'count_weight': 0,
                                'create_weight_count': 1,
                                'build_scorer': 600417
                            }
                        }, {
                            'type': 'TermQuery',
                            'description': 'title:models',
                            'time_in_nanos': 11752398,
                            'breakdown': {
                                'set_min_competitive_score_count': 0,
                                'match_count': 0,
                                'shallow_advance_count': 2739,
                                'set_min_competitive_score': 0,
                                'next_doc': 0,
                                'match': 0,
                                'next_doc_count': 0,
                                'score_count': 16205,
                                'compute_max_score_count': 2544,
                                'compute_max_score': 389149,
                                'advance': 6010813,
                                'advance_count': 84552,
                                'count_weight_count': 0,
                                'score': 3112645,
                                'build_scorer_count': 417,
                                'create_weight': 1362275,
                                'shallow_advance': 316321,
                                'count_weight': 0,
                                'create_weight_count': 1,
                                'build_scorer': 561195
                            }
                        }]
                    }, {
                        'type': 'BoostQuery',
                        'description': '(abstract:large abstract:language abstract:models)^3.0',
                        'time_in_nanos': 52533128,
                        'breakdown': {
                            'set_min_competitive_score_count': 0,
                            'match_count': 0,
                            'shallow_advance_count': 916,
                            'set_min_competitive_score': 0,
                            'next_doc': 0,
                            'match': 0,
                            'next_doc_count': 0,
                            'score_count': 19589,
                            'compute_max_score_count': 778,
                            'compute_max_score': 547882,
                            'advance': 34529242,
                            'advance_count': 31268,
                            'count_weight_count': 0,
                            'score': 9276956,
                            'build_scorer_count': 278,
                            'create_weight': 4115714,
                            'shallow_advance': 643247,
                            'count_weight': 0,
                            'create_weight_count': 1,
                            'build_scorer': 3420087
                        },
                        'children': [{
                            'type': 'TermQuery',
                            'description': 'abstract:large',
                            'time_in_nanos': 12468049,
                            'breakdown': {
                                'set_min_competitive_score_count': 0,
                                'match_count': 0,
                                'shallow_advance_count': 7320,
                                'set_min_competitive_score': 0,
                                'next_doc': 0,
                                'match': 0,
                                'next_doc_count': 0,
                                'score_count': 6584,
                                'compute_max_score_count': 7261,
                                'compute_max_score': 1513554,
                                'advance': 6084517,
                                'advance_count': 27872,
                                'count_weight_count': 0,
                                'score': 2001292,
                                'build_scorer_count': 417,
                                'create_weight': 1362105,
                                'shallow_advance': 521294,
                                'count_weight': 0,
                                'create_weight_count': 1,
                                'build_scorer': 985287
                            }
                        }, {
                            'type': 'TermQuery',
                            'description': 'abstract:language',
                            'time_in_nanos': 8997831,
                            'breakdown': {
                                'set_min_competitive_score_count': 0,
                                'match_count': 0,
                                'shallow_advance_count': 7161,
                                'set_min_competitive_score': 0,
                                'next_doc': 0,
                                'match': 0,
                                'next_doc_count': 0,
                                'score_count': 5572,
                                'compute_max_score_count': 7099,
                                'compute_max_score': 1161564,
                                'advance': 3694625,
                                'advance_count': 23522,
                                'count_weight_count': 0,
                                'score': 1696084,
                                'build_scorer_count': 417,
                                'create_weight': 1301460,
                                'shallow_advance': 552771,
                                'count_weight': 0,
                                'create_weight_count': 1,
                                'build_scorer': 591327
                            }
                        }, {
                            'type': 'TermQuery',
                            'description': 'abstract:models',
                            'time_in_nanos': 13304097,
                            'breakdown': {
                                'set_min_competitive_score_count': 0,
                                'match_count': 0,
                                'shallow_advance_count': 7329,
                                'set_min_competitive_score': 0,
                                'next_doc': 0,
                                'match': 0,
                                'next_doc_count': 0,
                                'score_count': 11419,
                                'compute_max_score_count': 7267,
                                'compute_max_score': 1571109,
                                'advance': 5726658,
                                'advance_count': 27511,
                                'count_weight_count': 0,
                                'score': 3418287,
                                'build_scorer_count': 417,
                                'create_weight': 1414618,
                                'shallow_advance': 564774,
                                'count_weight': 0,
                                'create_weight_count': 1,
                                'build_scorer': 608651
                            }
                        }]
                    }, {
                        'type': 'FieldExistsQuery',
                        'description': 'FieldExistsQuery [field=pub_info.year]',
                        'time_in_nanos': 7886385,
                        'breakdown': {
                            'set_min_competitive_score_count': 0,
                            'match_count': 0,
                            'shallow_advance_count': 0,
                            'set_min_competitive_score': 0,
                            'next_doc': 0,
                            'match': 0,
                            'next_doc_count': 0,
                            'score_count': 0,
                            'compute_max_score_count': 0,
                            'compute_max_score': 0,
                            'advance': 7256218,
                            'advance_count': 136933,
                            'count_weight_count': 0,
                            'score': 0,
                            'build_scorer_count': 417,
                            'create_weight': 468,
                            'shallow_advance': 0,
                            'count_weight': 0,
                            'create_weight_count': 1,
                            'build_scorer': 629699
                        }
                    }, {
                        'type': 'FieldExistsQuery',
                        'description': 'FieldExistsQuery [field=pub_info.title]',
                        'time_in_nanos': 161076159,
                        'breakdown': {
                            'set_min_competitive_score_count': 0,
                            'match_count': 0,
                            'shallow_advance_count': 0,
                            'set_min_competitive_score': 0,
                            'next_doc': 0,
                            'match': 0,
                            'next_doc_count': 0,
                            'score_count': 0,
                            'compute_max_score_count': 0,
                            'compute_max_score': 0,
                            'advance': 160740113,
                            'advance_count': 137296,
                            'count_weight_count': 0,
                            'score': 0,
                            'build_scorer_count': 417,
                            'create_weight': 153,
                            'shallow_advance': 0,
                            'count_weight': 0,
                            'create_weight_count': 1,
                            'build_scorer': 335893
                        }
                    }, {
                        'type': 'FieldExistsQuery',
                        'description': 'FieldExistsQuery [field=abstract]',
                        'time_in_nanos': 10399817,
                        'breakdown': {
                            'set_min_competitive_score_count': 0,
                            'match_count': 0,
                            'shallow_advance_count': 0,
                            'set_min_competitive_score': 0,
                            'next_doc': 0,
                            'match': 0,
                            'next_doc_count': 0,
                            'score_count': 0,
                            'compute_max_score_count': 0,
                            'compute_max_score': 0,
                            'advance': 10186290,
                            'advance_count': 137214,
                            'count_weight_count': 0,
                            'score': 0,
                            'build_scorer_count': 417,
                            'create_weight': 86,
                            'shallow_advance': 0,
                            'count_weight': 0,
                            'create_weight_count': 1,
                            'build_scorer': 213441
                        }
                    }, {
                        'type': 'FieldExistsQuery',
                        'description': 'FieldExistsQuery [field=title]',
                        'time_in_nanos': 7992809,
                        'breakdown': {
                            'set_min_competitive_score_count': 0,
                            'match_count': 0,
                            'shallow_advance_count': 0,
                            'set_min_competitive_score': 0,
                            'next_doc': 0,
                            'match': 0,
                            'next_doc_count': 0,
                            'score_count': 0,
                            'compute_max_score_count': 0,
                            'compute_max_score': 0,
                            'advance': 7809787,
                            'advance_count': 136935,
                            'count_weight_count': 0,
                            'score': 0,
                            'build_scorer_count': 417,
                            'create_weight': 88,
                            'shallow_advance': 0,
                            'count_weight': 0,
                            'create_weight_count': 1,
                            'build_scorer': 182934
                        }
                    }, {
                        'type': 'FieldExistsQuery',
                        'description': 'FieldExistsQuery [field=authors.fullname]',
                        'time_in_nanos': 64453874,
                        'breakdown': {
                            'set_min_competitive_score_count': 0,
                            'match_count': 0,
                            'shallow_advance_count': 0,
                            'set_min_competitive_score': 0,
                            'next_doc': 0,
                            'match': 0,
                            'next_doc_count': 0,
                            'score_count': 0,
                            'compute_max_score_count': 0,
                            'compute_max_score': 0,
                            'advance': 64194107,
                            'advance_count': 136933,
                            'count_weight_count': 0,
                            'score': 0,
                            'build_scorer_count': 417,
                            'create_weight': 86,
                            'shallow_advance': 0,
                            'count_weight': 0,
                            'create_weight_count': 1,
                            'build_scorer': 259681
                        }
                    }]
                }],
                'rewrite_time': 1229212,
                'collector': [{
                    'name': 'QueryPhaseCollector',
                    'reason': 'search_query_phase',
                    'time_in_nanos': 72029292,
                    'children': [{
                        'name': 'SimpleTopScoreDocCollector',
                        'reason': 'search_top_hits',
                        'time_in_nanos': 68111667
                    }]
                }]
            }],
            'aggregations': [],
            'fetch': {
                'type': 'fetch',
                'description': '',
                'time_in_nanos': 918368,
                'breakdown': {
                    'load_stored_fields': 382084,
                    'load_source': 2604,
                    'load_stored_fields_count': 1,
                    'next_reader_count': 1,
                    'load_source_count': 1,
                    'next_reader': 20097
                },
                'debug': {
                    'stored_fields': ['_id', '_routing', '_source']
                },
                'children': [{
                    'type': 'FetchSourcePhase',
                    'description': '',
                    'time_in_nanos': 404442,
                    'breakdown': {
                        'process_count': 1,
                        'process': 403977,
                        'next_reader': 465,
                        'next_reader_count': 1
                    },
                    'debug': {
                        'fast_path': 0
                    }
                }, {
                    'type': 'StoredFieldsPhase',
                    'description': '',
                    'time_in_nanos': 10685,
                    'breakdown': {
                        'process_count': 1,
                        'process': 9947,
                        'next_reader': 738,
                        'next_reader_count': 1
                    }
                }]
            }
        }]
    }
}


My mapping:

{
  "csx_citeseer_docs_old_pubinfo" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "abstract" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "authors" : {
          "properties" : {
            "affiliation" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "email" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "forename" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "fullname" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "surname" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        },
        "has_pdf" : {
          "type" : "boolean"
        },
        "is_citation" : {
          "type" : "boolean"
        },
        "is_public" : {
          "type" : "boolean"
        },
        "paper_id" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "pub_info" : {
          "properties" : {
            "date" : {
              "type" : "long"
            },
            "publisher" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "title" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "year" : {
              "type" : "long"
            }
          }
        },
        "source_url" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "text" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "title" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "title_suggest" : {
          "properties" : {
            "input" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "number_of_shards" : "1",
        "provided_name" : "csx_citeseer_docs_old_pubinfo",
        "creation_date" : "1665206124049",
        "number_of_replicas" : "1",
        "uuid" : "IjQStC3lS_SINa5WzjeVjQ",
        "version" : {
          "created" : "8040399"
        }
      }
    }
  }
}

Answer 1 (xmakbtuz):

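So, here is what I would try if I were facing this problem. First, I would optimize your mapping by removing all unnecessary keyword fields. You seem to have them on every text field, even where there is no chance you will ever use them.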

"fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
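As a rough sketch of that clean-up, the helper below walks a mapping's `properties` dict and drops the `keyword` multi-field everywhere except on paths you choose to keep. The function name `strip_keyword_subfields` and the `keep` parameter are made up for this illustration. The sketch also assumes the cleaned mapping is applied to a new index, since a multi-field generally cannot be removed from an existing mapping in place.

```python
import copy

def strip_keyword_subfields(properties, keep=()):
    """Return a copy of a mapping `properties` dict without `keyword` multi-fields.

    `keep` lists dotted field paths whose keyword sub-field should be preserved.
    """
    def walk(props, prefix):
        out = {}
        for name, spec in props.items():
            path = prefix + name
            spec = copy.deepcopy(spec)  # never mutate the caller's mapping
            if "properties" in spec:  # object field: recurse into its children
                spec["properties"] = walk(spec["properties"], path + ".")
            fields = spec.get("fields")
            if fields and "keyword" in fields and path not in keep:
                del fields["keyword"]
                if not fields:  # drop the now-empty "fields" block
                    del spec["fields"]
            out[name] = spec
        return out

    return walk(properties, "")

# A small excerpt of the mapping from the question, used as sample input.
keyword = {"keyword": {"type": "keyword", "ignore_above": 256}}
sample = {
    "title": {"type": "text", "fields": copy.deepcopy(keyword)},
    "pub_info": {"properties": {
        "year": {"type": "long"},
        "title": {"type": "text", "fields": copy.deepcopy(keyword)},
    }},
}
slim = strip_keyword_subfields(sample, keep={"pub_info.title"})
print(slim)
```

Here `pub_info.title` keeps its keyword sub-field only to show how `keep` works. Which sub-fields are really needed depends on whether you sort, aggregate, or do exact matching on them.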

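I would also add some preprocessing: a boolean field is_complete that is set to true when all the required information you currently check with exists queries is present, and false otherwise. You can do this with an ingest processor or in your application.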
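A minimal application-side sketch of that idea follows. The helper names (`field_values`, `is_complete`, `build_query`) are hypothetical, and "present" is approximated here as "at least one non-null value at the dotted path", which may not match the `exists` query's semantics in every edge case.

```python
REQUIRED_FIELDS = ["pub_info.year", "pub_info.title", "abstract", "title", "authors.fullname"]

def field_values(doc, path):
    """Collect all values at a dotted path, descending into lists of objects."""
    nodes = [doc]
    for key in path.split("."):
        next_nodes = []
        for node in nodes:
            candidates = node if isinstance(node, list) else [node]
            for item in candidates:
                if isinstance(item, dict) and key in item:
                    next_nodes.append(item[key])
        nodes = next_nodes
    flat = []
    for node in nodes:
        flat.extend(node if isinstance(node, list) else [node])
    return flat

def is_complete(doc, required=REQUIRED_FIELDS):
    """True if every required path has at least one non-null value."""
    return all(any(v is not None for v in field_values(doc, p)) for p in required)

def build_query(search_text):
    """Same scoring clauses as the original query, with one term filter replacing five exists filters."""
    return {"query": {"bool": {
        "should": [
            {"match": {"title": {"query": search_text}}},
            {"match": {"abstract": {"query": search_text}}},
        ],
        "filter": [{"term": {"is_complete": True}}],
    }}}

doc = {
    "title": "Transition movement models for large vocabulary continuous sign language recognition",
    "abstract": "The major challenges that sign language recognition (SLR) now faces ...",
    "pub_info": {"year": 0, "title": "in IEEE International Conference on Automatic Face and Gesture Recognition (FGR), 2004"},
    "authors": [{"fullname": "Wen Gao"}, {"fullname": "Gaolin Fang"}],
}
doc["is_complete"] = is_complete(doc)  # set before indexing the document
print(doc["is_complete"], build_query("large language models"))
```

Existing documents would need the flag backfilled (for example during a re-index) before the new filter returns the same results as the five exists clauses.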
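After these steps I would test performance again. If it is still unsatisfactory and you have a powerful machine with many cores and fast disks, I would try running multiple queries in parallel, which will increase throughput. Alternatively, try increasing the number of shards in the index, so that each query runs on several shards in parallel. This may cost some throughput, because more total work has to be done, but it can reduce latency, because more of the work happens in parallel.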
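Both options can be sketched in a few lines. This is only an illustration under stated assumptions: `build_index_settings`, `run_queries_in_parallel`, and `fake_search` are hypothetical names, and `fake_search` stands in for whatever client call you actually use. The number of primary shards is set when an index is created, so the settings body would apply to a new index that you then populate by re-indexing.

```python
from concurrent.futures import ThreadPoolExecutor

def build_index_settings(number_of_shards, number_of_replicas=1):
    """Settings body for creating a new index with more primary shards."""
    return {"settings": {"index": {
        "number_of_shards": number_of_shards,
        "number_of_replicas": number_of_replicas,
    }}}

def run_queries_in_parallel(search_fn, queries, max_workers=4):
    """Run search_fn(query) for each query concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(search_fn, queries))

def fake_search(search_text):
    # Stand-in for a real Elasticsearch client call; returns an empty result.
    return {"search_text": search_text, "hits": []}

settings = build_index_settings(number_of_shards=4)
results = run_queries_in_parallel(fake_search, ["large language models", "sign language recognition"])
print(settings)
print([r["search_text"] for r in results])
```

The shard count of 4 is an arbitrary example value. A sensible number depends on the core count and data size, and is best found by measuring.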
