I have an Elasticsearch instance running with data from travel.stackexchange.
# Example data
first = ["This was one of our definition questions, but also one that interests me personally: "
         "How can I find a guide that will take me safely through the Amazon jungle? I'd love "
         "to explore the Amazon but would not attempt it without a guide, at least not the first "
         "time. I'd prefer a guide that wasn't going to ambush me or anything. I don't want to go "
         "anywhere touristy. Start and end points are open, but the trip should take me places "
         "where I am not likely to see other travelers/tourists and where I will definitely "
         "require a good guide in order to be safe.",  # content
         '2011-06-21T20:22:33.760',  # date of creation
         '39',  # votes
         '2799',  # views
         '8',  # answers
         '4',  # comments
         'How can I find a guide that will take me safely through the Amazon jungle?',  # title
         '"guides", "extreme-tourism", "amazon-river", "amazon-jungle"']  # tags
I connect using
from elasticsearch_dsl import Search, connections
connections.create_connection(alias='es', hosts=['localhost'], timeout=60)
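As you can see, this post has several tags ("guides", "amazon-river", ...). When I load the data into ES, I format the tags as a single string.
Now, when I query the index (with a much larger dataset, of course)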
s = Search(using="es", index=current_index)
and count how many times each tag is mentioned
s.aggs.bucket("per_tag", "terms", field="tags", size=5)
r = s.execute()
However, when I look at the results, they look like
r.aggregations.per_tag.buckets
>>> [{'key': 'no tags', 'doc_count': 70672},
>>> {'key': '"visas", "uk"', 'doc_count': 330},
>>> {'key': '"visas", "schengen"', 'doc_count': 264},
>>> {'key': '"visas"', 'doc_count': 253},
>>> {'key': '"air-travel"', 'doc_count': 182}]
This is fine, but not what I want. As you can see, the "visas" tag shows up in three separate buckets instead of one. What I want is
>>> [{'key': 'no tags', 'doc_count': 70672},
>>> {'key': 'visas', 'doc_count': XXX},
>>> {'key': 'uk', 'doc_count': YYY},
>>> {'key': 'schengen', 'doc_count': ZZZ},
>>> {'key': 'air-travel', 'doc_count': AAA}]
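What I have tried so far is entering the tags in different ways: once with the "" quotes, once without them (leaving only the , separators), and once separated only by spaces. However, I feel that I need to define the aggregation more precisely, rather than change the input. Any help would be appreciated.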
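To make the desired result concrete, here is a minimal sketch in plain Python (no Elasticsearch calls) of the per-tag counting I am after. The helper `split_tags` and the `stored` sample are hypothetical names made up for illustration. The sketch assumes the `tags` field is mapped as `keyword`, which the whole-string bucket keys above suggest; a `keyword` field that receives a list of strings gets one bucket per element in a terms aggregation, so sending `split_tags(raw)` instead of the raw string is one possible way to get there.

```python
from collections import Counter


def split_tags(raw):
    """Hypothetical helper: turn '"visas", "uk"' into ['visas', 'uk'].

    A value without commas or quotes, such as 'no tags', comes back
    as a single-element list.
    """
    parts = [part.strip().strip('"') for part in raw.split(",")]
    return [part for part in parts if part]


# Made-up miniature of the tag strings currently stored in the index.
stored = ['"visas", "uk"', '"visas", "schengen"', '"visas"',
          '"air-travel"', 'no tags']

# Document shape that would be indexed instead of the single string
# (assumption: "tags" is a keyword field, so each element is one term).
docs = [{"tags": split_tags(raw)} for raw in stored]

# Emulate what a per-tag terms aggregation would count.
per_tag = Counter(tag for doc in docs for tag in doc["tags"])
print(per_tag)
```

With this shape, "visas" is counted once per document that carries it, instead of once per distinct tag combination.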