query = {
    "_source": ["_id", "_score", "pub_info.year", "abstract", "title", "pub_info.title", "authors.fullname"],
    "query": {
        "bool": {
            "should": [
                {
                    "match": {
                        "title": {
                            "query": "search_text"
                        }
                    }
                },
                {
                    "match": {
                        "abstract": {
                            "query": "search_text"
                        }
                    }
                }
            ],
            "filter": [
                {
                    "exists": {
                        "field": "pub_info.year"
                    }
                },
                {
                    "exists": {
                        "field": "pub_info.title"
                    }
                },
                {
                    "exists": {
                        "field": "abstract"
                    }
                },
                {
                    "exists": {
                        "field": "title"
                    }
                },
                {
                    "exists": {
                        "field": "authors.fullname"
                    }
                }
            ]
        }
    }
}
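I'm currently working with a large Elasticsearch index that holds about 20 columns (fields) of data on research papers collected from the internet. My search queries mostly need data from only 5 of those columns. Because some columns hold a lot of data (for example, the full text of each paper), my queries take about 4 seconds to execute, and I'd like to reduce that.
To speed up search, I'm considering creating a new, smaller index that contains only the 5 columns I need. However, I'm worried about keeping the two indices in sync, since I'm constantly adding new data to the original index.
Would creating a new, slimmed-down index with only the necessary columns improve search performance, especially given the large fields in the current index? And how can I efficiently manage synchronization between the two indices?
If not, is there another way to retrieve the results quickly?
I was thinking of using re-index to create the new index.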
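For reference, a minimal sketch of what such a re-index request body might look like. The source index name is taken from the profile output further down; the destination name `csx_citeseer_docs_slim` and the helper `build_reindex_body` are hypothetical, chosen only for illustration. The `_source` list inside `source` restricts which fields are copied.

```python
# Source index name as it appears in the profile output; the destination
# name is an assumption made up for this sketch.
SOURCE_INDEX = "csx_citeseer_docs_old_pubinfo"
SLIM_INDEX = "csx_citeseer_docs_slim"

# The five fields the search actually needs.
NEEDED_FIELDS = [
    "title",
    "abstract",
    "pub_info.year",
    "pub_info.title",
    "authors.fullname",
]


def build_reindex_body(source_index, dest_index, fields):
    """Build a body for POST _reindex that copies only `fields` (hypothetical helper)."""
    return {
        "source": {"index": source_index, "_source": fields},
        "dest": {"index": dest_index},
    }


reindex_body = build_reindex_body(SOURCE_INDEX, SLIM_INDEX, NEEDED_FIELDS)
```

A re-index is a one-time copy, so documents added to the original index afterwards would still have to be written to both indices or copied over periodically. That is the synchronization concern raised above.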
Profile output for a search returning 1 result:
{
'took': 483,
'timed_out': False,
'_shards': {
'total': 1,
'successful': 1,
'skipped': 0,
'failed': 0
},
'hits': {
'total': {
'value': 10000,
'relation': 'gte'
},
'max_score': 87.853134,
'hits': [{
'_index': 'csx_citeseer_docs_old_pubinfo',
'_id': 'vYHYzYMByhvPsGt1HJaH',
'_score': 87.853134,
'_ignored': ['abstract.keyword', 'text.keyword'],
'_source': {
'pub_info': {
'year': 0,
'title': 'in IEEE International Conference on Automatic Face and Gesture Recognition (FGR), 2004'
},
'abstract': 'The major challenges that sign language recognition (SLR) now faces are developing methods that solve large vocabulary continuous sign problems. In this paper, large vocabulary continuous SLR based on transition movement models is proposed. The proposed method employs the temporal clustering algorithm to cluster a large amount of transition movements, and then the corresponding training algorithm is also presented for automatically segmenting and training these transition movement models. The clustered models can improve the generalization of transition movement models, and are very suitable for large vocabulary continuous SLR. At last, the estimated transition movement models, together with sign models, are viewed as candidate models of the Viterbi search algorithm for recognizing continuous sign language. Experiments show that continuous SLR based on transition movement models has good performance over a large vocabulary of 5113 signs.',
'title': 'Transition movement models for large vocabulary continuous sign language recognition',
'authors': [{
'fullname': 'Wen Gao'
}, {
'fullname': 'Gaolin Fang'
}, {
'fullname': 'Debin Zhao'
}, {
'fullname': 'Yiqiang Chen'
}]
}
}]
},
'profile': {
'shards': [{
'id': '[YP4_kPcEQ0CbSrGk34Cz3g][csx_citeseer_docs_old_pubinfo][0]',
'node_id': 'YP4_kPcEQ0CbSrGk34Cz3g',
'shard_id': 0,
'index': 'csx_citeseer_docs_old_pubinfo',
'cluster': '(local)',
'searches': [{
'query': [{
'type': 'BooleanQuery',
'description': '(title:large title:language title:models)^3.0 (abstract:large abstract:language abstract:models)^3.0 #FieldExistsQuery [field=pub_info.year] #FieldExistsQuery [field=pub_info.title] #FieldExistsQuery [field=abstract] #FieldExistsQuery [field=title] #FieldExistsQuery [field=authors.fullname]',
'time_in_nanos': 446540061,
'breakdown': {
'set_min_competitive_score_count': 143,
'match_count': 36826,
'shallow_advance_count': 0,
'set_min_competitive_score': 266826,
'next_doc': 332618765,
'match': 5455542,
'next_doc_count': 47019,
'score_count': 36690,
'compute_max_score_count': 0,
'compute_max_score': 0,
'advance': 20675753,
'advance_count': 139,
'count_weight_count': 0,
'score': 63955841,
'build_scorer_count': 278,
'create_weight': 8025955,
'shallow_advance': 0,
'count_weight': 0,
'create_weight_count': 1,
'build_scorer': 15541379
},
'children': [{
'type': 'BoostQuery',
'description': '(title:large title:language title:models)^3.0',
'time_in_nanos': 65287560,
'breakdown': {
'set_min_competitive_score_count': 143,
'match_count': 0,
'shallow_advance_count': 1051,
'set_min_competitive_score': 28225,
'next_doc': 0,
'match': 0,
'next_doc_count': 0,
'score_count': 27023,
'compute_max_score_count': 776,
'compute_max_score': 558010,
'advance': 45107685,
'advance_count': 93389,
'count_weight_count': 0,
'score': 10450798,
'build_scorer_count': 278,
'create_weight': 3788158,
'shallow_advance': 713540,
'count_weight': 0,
'create_weight_count': 1,
'build_scorer': 4641144
},
'children': [{
'type': 'TermQuery',
'description': 'title:large',
'time_in_nanos': 8093891,
'breakdown': {
'set_min_competitive_score_count': 0,
'match_count': 0,
'shallow_advance_count': 2792,
'set_min_competitive_score': 0,
'next_doc': 0,
'match': 0,
'next_doc_count': 0,
'score_count': 6092,
'compute_max_score_count': 2535,
'compute_max_score': 377147,
'advance': 3696584,
'advance_count': 51759,
'count_weight_count': 0,
'score': 1259870,
'build_scorer_count': 417,
'create_weight': 1223314,
'shallow_advance': 307622,
'count_weight': 0,
'create_weight_count': 1,
'build_scorer': 1229354
}
}, {
'type': 'TermQuery',
'description': 'title:language',
'time_in_nanos': 7118601,
'breakdown': {
'set_min_competitive_score_count': 0,
'match_count': 0,
'shallow_advance_count': 2801,
'set_min_competitive_score': 0,
'next_doc': 0,
'match': 0,
'next_doc_count': 0,
'score_count': 6024,
'compute_max_score_count': 2525,
'compute_max_score': 365923,
'advance': 3374881,
'advance_count': 45762,
'count_weight_count': 0,
'score': 1249506,
'build_scorer_count': 417,
'create_weight': 1143705,
'shallow_advance': 384169,
'count_weight': 0,
'create_weight_count': 1,
'build_scorer': 600417
}
}, {
'type': 'TermQuery',
'description': 'title:models',
'time_in_nanos': 11752398,
'breakdown': {
'set_min_competitive_score_count': 0,
'match_count': 0,
'shallow_advance_count': 2739,
'set_min_competitive_score': 0,
'next_doc': 0,
'match': 0,
'next_doc_count': 0,
'score_count': 16205,
'compute_max_score_count': 2544,
'compute_max_score': 389149,
'advance': 6010813,
'advance_count': 84552,
'count_weight_count': 0,
'score': 3112645,
'build_scorer_count': 417,
'create_weight': 1362275,
'shallow_advance': 316321,
'count_weight': 0,
'create_weight_count': 1,
'build_scorer': 561195
}
}]
}, {
'type': 'BoostQuery',
'description': '(abstract:large abstract:language abstract:models)^3.0',
'time_in_nanos': 52533128,
'breakdown': {
'set_min_competitive_score_count': 0,
'match_count': 0,
'shallow_advance_count': 916,
'set_min_competitive_score': 0,
'next_doc': 0,
'match': 0,
'next_doc_count': 0,
'score_count': 19589,
'compute_max_score_count': 778,
'compute_max_score': 547882,
'advance': 34529242,
'advance_count': 31268,
'count_weight_count': 0,
'score': 9276956,
'build_scorer_count': 278,
'create_weight': 4115714,
'shallow_advance': 643247,
'count_weight': 0,
'create_weight_count': 1,
'build_scorer': 3420087
},
'children': [{
'type': 'TermQuery',
'description': 'abstract:large',
'time_in_nanos': 12468049,
'breakdown': {
'set_min_competitive_score_count': 0,
'match_count': 0,
'shallow_advance_count': 7320,
'set_min_competitive_score': 0,
'next_doc': 0,
'match': 0,
'next_doc_count': 0,
'score_count': 6584,
'compute_max_score_count': 7261,
'compute_max_score': 1513554,
'advance': 6084517,
'advance_count': 27872,
'count_weight_count': 0,
'score': 2001292,
'build_scorer_count': 417,
'create_weight': 1362105,
'shallow_advance': 521294,
'count_weight': 0,
'create_weight_count': 1,
'build_scorer': 985287
}
}, {
'type': 'TermQuery',
'description': 'abstract:language',
'time_in_nanos': 8997831,
'breakdown': {
'set_min_competitive_score_count': 0,
'match_count': 0,
'shallow_advance_count': 7161,
'set_min_competitive_score': 0,
'next_doc': 0,
'match': 0,
'next_doc_count': 0,
'score_count': 5572,
'compute_max_score_count': 7099,
'compute_max_score': 1161564,
'advance': 3694625,
'advance_count': 23522,
'count_weight_count': 0,
'score': 1696084,
'build_scorer_count': 417,
'create_weight': 1301460,
'shallow_advance': 552771,
'count_weight': 0,
'create_weight_count': 1,
'build_scorer': 591327
}
}, {
'type': 'TermQuery',
'description': 'abstract:models',
'time_in_nanos': 13304097,
'breakdown': {
'set_min_competitive_score_count': 0,
'match_count': 0,
'shallow_advance_count': 7329,
'set_min_competitive_score': 0,
'next_doc': 0,
'match': 0,
'next_doc_count': 0,
'score_count': 11419,
'compute_max_score_count': 7267,
'compute_max_score': 1571109,
'advance': 5726658,
'advance_count': 27511,
'count_weight_count': 0,
'score': 3418287,
'build_scorer_count': 417,
'create_weight': 1414618,
'shallow_advance': 564774,
'count_weight': 0,
'create_weight_count': 1,
'build_scorer': 608651
}
}]
}, {
'type': 'FieldExistsQuery',
'description': 'FieldExistsQuery [field=pub_info.year]',
'time_in_nanos': 7886385,
'breakdown': {
'set_min_competitive_score_count': 0,
'match_count': 0,
'shallow_advance_count': 0,
'set_min_competitive_score': 0,
'next_doc': 0,
'match': 0,
'next_doc_count': 0,
'score_count': 0,
'compute_max_score_count': 0,
'compute_max_score': 0,
'advance': 7256218,
'advance_count': 136933,
'count_weight_count': 0,
'score': 0,
'build_scorer_count': 417,
'create_weight': 468,
'shallow_advance': 0,
'count_weight': 0,
'create_weight_count': 1,
'build_scorer': 629699
}
}, {
'type': 'FieldExistsQuery',
'description': 'FieldExistsQuery [field=pub_info.title]',
'time_in_nanos': 161076159,
'breakdown': {
'set_min_competitive_score_count': 0,
'match_count': 0,
'shallow_advance_count': 0,
'set_min_competitive_score': 0,
'next_doc': 0,
'match': 0,
'next_doc_count': 0,
'score_count': 0,
'compute_max_score_count': 0,
'compute_max_score': 0,
'advance': 160740113,
'advance_count': 137296,
'count_weight_count': 0,
'score': 0,
'build_scorer_count': 417,
'create_weight': 153,
'shallow_advance': 0,
'count_weight': 0,
'create_weight_count': 1,
'build_scorer': 335893
}
}, {
'type': 'FieldExistsQuery',
'description': 'FieldExistsQuery [field=abstract]',
'time_in_nanos': 10399817,
'breakdown': {
'set_min_competitive_score_count': 0,
'match_count': 0,
'shallow_advance_count': 0,
'set_min_competitive_score': 0,
'next_doc': 0,
'match': 0,
'next_doc_count': 0,
'score_count': 0,
'compute_max_score_count': 0,
'compute_max_score': 0,
'advance': 10186290,
'advance_count': 137214,
'count_weight_count': 0,
'score': 0,
'build_scorer_count': 417,
'create_weight': 86,
'shallow_advance': 0,
'count_weight': 0,
'create_weight_count': 1,
'build_scorer': 213441
}
}, {
'type': 'FieldExistsQuery',
'description': 'FieldExistsQuery [field=title]',
'time_in_nanos': 7992809,
'breakdown': {
'set_min_competitive_score_count': 0,
'match_count': 0,
'shallow_advance_count': 0,
'set_min_competitive_score': 0,
'next_doc': 0,
'match': 0,
'next_doc_count': 0,
'score_count': 0,
'compute_max_score_count': 0,
'compute_max_score': 0,
'advance': 7809787,
'advance_count': 136935,
'count_weight_count': 0,
'score': 0,
'build_scorer_count': 417,
'create_weight': 88,
'shallow_advance': 0,
'count_weight': 0,
'create_weight_count': 1,
'build_scorer': 182934
}
}, {
'type': 'FieldExistsQuery',
'description': 'FieldExistsQuery [field=authors.fullname]',
'time_in_nanos': 64453874,
'breakdown': {
'set_min_competitive_score_count': 0,
'match_count': 0,
'shallow_advance_count': 0,
'set_min_competitive_score': 0,
'next_doc': 0,
'match': 0,
'next_doc_count': 0,
'score_count': 0,
'compute_max_score_count': 0,
'compute_max_score': 0,
'advance': 64194107,
'advance_count': 136933,
'count_weight_count': 0,
'score': 0,
'build_scorer_count': 417,
'create_weight': 86,
'shallow_advance': 0,
'count_weight': 0,
'create_weight_count': 1,
'build_scorer': 259681
}
}]
}],
'rewrite_time': 1229212,
'collector': [{
'name': 'QueryPhaseCollector',
'reason': 'search_query_phase',
'time_in_nanos': 72029292,
'children': [{
'name': 'SimpleTopScoreDocCollector',
'reason': 'search_top_hits',
'time_in_nanos': 68111667
}]
}]
}],
'aggregations': [],
'fetch': {
'type': 'fetch',
'description': '',
'time_in_nanos': 918368,
'breakdown': {
'load_stored_fields': 382084,
'load_source': 2604,
'load_stored_fields_count': 1,
'next_reader_count': 1,
'load_source_count': 1,
'next_reader': 20097
},
'debug': {
'stored_fields': ['_id', '_routing', '_source']
},
'children': [{
'type': 'FetchSourcePhase',
'description': '',
'time_in_nanos': 404442,
'breakdown': {
'process_count': 1,
'process': 403977,
'next_reader': 465,
'next_reader_count': 1
},
'debug': {
'fast_path': 0
}
}, {
'type': 'StoredFieldsPhase',
'description': '',
'time_in_nanos': 10685,
'breakdown': {
'process_count': 1,
'process': 9947,
'next_reader': 738,
'next_reader_count': 1
}
}]
}
}]
}
}
My mapping:
{
"csx_citeseer_docs_old_pubinfo" : {
"aliases" : { },
"mappings" : {
"properties" : {
"abstract" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"authors" : {
"properties" : {
"affiliation" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"email" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"forename" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"fullname" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"surname" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"has_pdf" : {
"type" : "boolean"
},
"is_citation" : {
"type" : "boolean"
},
"is_public" : {
"type" : "boolean"
},
"paper_id" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"pub_info" : {
"properties" : {
"date" : {
"type" : "long"
},
"publisher" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"title" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"year" : {
"type" : "long"
}
}
},
"source_url" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"text" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"title" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"title_suggest" : {
"properties" : {
"input" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
},
"settings" : {
"index" : {
"routing" : {
"allocation" : {
"include" : {
"_tier_preference" : "data_content"
}
}
},
"number_of_shards" : "1",
"provided_name" : "csx_citeseer_docs_old_pubinfo",
"creation_date" : "1665206124049",
"number_of_replicas" : "1",
"uuid" : "IjQStC3lS_SINa5WzjeVjQ",
"version" : {
"created" : "8040399"
}
}
}
}
}
1 Answer
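Here is what I would try if I were facing this problem. First, I would optimize your mapping by removing all the unnecessary keyword sub-fields. You seem to have them on every text field, even where there is no chance you will ever use them.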
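A minimal sketch of that cleanup, assuming the mapping is handled as a Python dict. The helper `strip_keyword_subfields` and the choice to keep `paper_id.keyword` are hypothetical; the excerpt reproduces only a few fields from the mapping in the question. Note that the hit in the profile output already lists `abstract.keyword` and `text.keyword` under `_ignored`, because those values exceed `ignore_above: 256`. As far as I know, multi-fields cannot be removed from an existing index in place, so the resulting mapping would go on a new index.

```python
import copy


def strip_keyword_subfields(properties, keep=frozenset(), prefix=""):
    """Return a copy of a mapping "properties" dict without `keyword` multi-fields.

    Dotted field paths listed in `keep` retain their keyword sub-field.
    """
    result = {}
    for name, spec in properties.items():
        path = prefix + name
        spec = copy.deepcopy(spec)
        if "properties" in spec:
            # Object field: recurse into its sub-properties.
            spec["properties"] = strip_keyword_subfields(
                spec["properties"], keep, path + "."
            )
        fields = spec.get("fields", {})
        if "keyword" in fields and path not in keep:
            del fields["keyword"]
            if not fields:
                del spec["fields"]
        result[name] = spec
    return result


# Excerpt of the mapping from the question (not the full mapping).
KEYWORD = {"keyword": {"type": "keyword", "ignore_above": 256}}
original_properties = {
    "abstract": {"type": "text", "fields": KEYWORD},
    "text": {"type": "text", "fields": KEYWORD},
    "paper_id": {"type": "text", "fields": KEYWORD},
    "pub_info": {
        "properties": {
            "title": {"type": "text", "fields": KEYWORD},
            "year": {"type": "long"},
        }
    },
}

# Assumption: exact-match lookups on paper_id are still wanted.
slim_properties = strip_keyword_subfields(original_properties, keep={"paper_id"})
```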
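I would also add some preprocessing that populates a boolean field, is_complete, which is set to true if all the necessary information you check in the exists queries is present, and to false otherwise. You can do this with an ingest processor or in your application. After these steps I would test the performance again. If it is still unsatisfactory and you have a powerful machine with many cores and fast disks, I would try running multiple queries in parallel, which increases throughput. Alternatively, try increasing the number of shards in the index, so that each query runs on several shards in parallel. That may reduce throughput, because the cluster has to do more work per query, but it may also reduce latency, because more of that work happens in parallel.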
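In the profile output from the question, the five FieldExistsQuery children report roughly 252 ms of time_in_nanos combined (about 161 ms for pub_info.title alone), against about 447 ms for the whole BooleanQuery, which is why replacing them with a single flag looks worthwhile. The sketch below shows the application-side variant under stated assumptions: the helpers `values_at`, `is_complete`, and `build_query` are hypothetical, and the completeness check only approximates exists semantics by treating null values and empty arrays as missing.

```python
REQUIRED_FIELDS = [
    "pub_info.year",
    "pub_info.title",
    "abstract",
    "title",
    "authors.fullname",
]


def values_at(doc, dotted_path):
    """Collect every value found at `dotted_path`, flattening arrays of objects."""
    current = [doc]
    for key in dotted_path.split("."):
        found = []
        for item in current:
            if isinstance(item, dict) and key in item:
                value = item[key]
                found.extend(value if isinstance(value, list) else [value])
        current = found
    return current


def is_complete(doc):
    """True when every required field has at least one non-null value."""
    return all(
        any(value is not None for value in values_at(doc, path))
        for path in REQUIRED_FIELDS
    )


def build_query(search_text):
    """Same query as in the question, with the five exists filters replaced by one term filter."""
    return {
        "_source": REQUIRED_FIELDS,
        "query": {
            "bool": {
                "should": [
                    {"match": {"title": {"query": search_text}}},
                    {"match": {"abstract": {"query": search_text}}},
                ],
                "filter": [{"term": {"is_complete": True}}],
            }
        },
    }


# Set the flag before indexing; this document is abridged from the hit in the question.
doc = {
    "pub_info": {"year": 0, "title": "in IEEE International Conference on Automatic Face and Gesture Recognition (FGR), 2004"},
    "abstract": "The major challenges that sign language recognition (SLR) now faces ...",
    "title": "Transition movement models for large vocabulary continuous sign language recognition",
    "authors": [{"fullname": "Wen Gao"}, {"fullname": "Gaolin Fang"}],
}
doc["is_complete"] = is_complete(doc)
```

Documents that are already indexed would need the flag backfilled, for example during the re-index into the new index.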