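I want to build a search API for my blog. I store all my data in Elasticsearch as HTML so that I can use it for full-text search as quickly as possible, but the HTML tags get in the way when I search my content. After a lot of searching I found an answer explaining how to ignore the tags during the search, but I can't filter them out so that they don't appear in the results. Is there any way to do this?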
Right now I run this search and get the following result:
POST /test/_search HTTP/1.1
Content-Type: application/json
Content-Length: 68

{
  "query": {
    "match": {
      "html": "more"
    }
  }
}
Response:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "test",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "html": "<html><body><h1 style=\"font-family: Arial\">Test</h1> <span>More test</span></body></html>"
        }
      }
    ]
  }
}
But I want something like this:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "test",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "html": "Test More test"
        }
      }
    ]
  }
}
1 Answer
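You need to use the HTML strip character filter (html_strip) in your mapping. It removes the HTML elements from the text before the text is indexed. I used this post to try to get as close as possible to your result.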
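A minimal sketch of such a mapping, assuming Elasticsearch 7.x (the _type field in the response suggests a 7.x cluster) and reusing the index test and the field html from the question; the analyzer name html_text is made up for illustration. Because the analyzer of an existing field cannot be changed, the index has to be deleted, recreated with these settings, and the documents reindexed:

```http
PUT /test HTTP/1.1
Content-Type: application/json

{
  "settings": {
    "analysis": {
      "analyzer": {
        "html_text": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "html": {
        "type": "text",
        "analyzer": "html_text"
      }
    }
  }
}
```

With this analyzer the match query for "more" still finds the document, while words that only occur inside tags, such as arial from the style attribute, are no longer indexed. Keep in mind that analyzers only affect the indexed terms: _source is always returned exactly as it was sent, so the hits will still contain the HTML. To get a tag-free _source as in the desired output, the text has to be stripped before it is stored, for example in your application or with the html_strip ingest processor.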
Result: