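We have a set of structured documents whose structure is heavily inspired by the OpenXML data model. In short, a document is an ordered list of paragraphs; each paragraph has an id and an ordered list of runs, and each run has some text content and some metadata.
For example, the sample document below contains two paragraphs, ["lorem ipsum", "dolor sit amet"].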
{
  title: "De finibus",
  paragraphs: [
    {
      id: 1,
      runs: [
        {text: "Lorem i", metadata: {} },
        {text: "psu", metadata: {bold: true} },
        {text: "m", metadata: {} },
      ]
    },
    {
      id: 2,
      runs: [
        {text: "dolor sit amet", metadata: {} },
      ]
    },
  ]
}
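To make the data model concrete, here is a minimal sketch of the same example as Python dataclasses. The names `Run`, `Paragraph`, `Document` and `text()` are hypothetical and only illustrate the structure described above; the key point is that a paragraph's visible text is the in-order concatenation of its runs, so a single word ("ipsum") can be split across several runs.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    text: str
    metadata: dict = field(default_factory=dict)  # e.g. {"bold": True}


@dataclass
class Paragraph:
    id: int
    runs: list[Run]

    def text(self) -> str:
        # The paragraph's visible text is the concatenation of its runs, in order.
        return "".join(run.text for run in self.runs)


@dataclass
class Document:
    title: str
    paragraphs: list[Paragraph]


# The example document from above (hypothetical in-memory representation).
example = Document(
    title="De finibus",
    paragraphs=[
        Paragraph(id=1, runs=[Run("Lorem i"), Run("psu", {"bold": True}), Run("m")]),
        Paragraph(id=2, runs=[Run("dolor sit amet")]),
    ],
)

print([p.text() for p in example.paragraphs])  # ['Lorem ipsum', 'dolor sit amet']
```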
Naturally, we want to index these documents with Elasticsearch so that it can answer queries such as the following:
Query: dolor sit
Expected answer: in the document with title="De finibus", in the paragraph with id=2, from the 1st character of the 1st run to the 9th character of the 1st run
Query: ipsum
Expected answer: in the document with title="De finibus", in the paragraph with id=1, from the 7th character of the 1st run to the 1st character of the 3rd run
Query: ipsum dolor
Expected answer: in the document with title="De finibus", from the 7th character of the 1st run of the paragraph with id=1 to the 5th character of the 1st run of the paragraph with id=2
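All three expected answers follow the same pattern: find the phrase in the concatenated text of the document, then map the start and end of the match back to (paragraph id, run number, character number), using 1-based numbering as in the answers above. The following is a minimal client-side sketch of that mapping, assuming Elasticsearch is only used to find candidate documents and the exact positions are recomputed in the application. The names `DOC`, `SEPARATOR`, `flatten`, `locate` and `find` are hypothetical, and the regular expression (case-insensitive, any whitespace between query words) is a stand-in for real analysis, not how Elasticsearch tokenizes text.

```python
import re

# Hypothetical in-memory form of the example document (plain dicts, metadata omitted).
DOC = {
    "title": "De finibus",
    "paragraphs": [
        {"id": 1, "runs": [{"text": "Lorem i"}, {"text": "psu"}, {"text": "m"}]},
        {"id": 2, "runs": [{"text": "dolor sit amet"}]},
    ],
}

SEPARATOR = "\n"  # assumed paragraph separator; it belongs to no run


def flatten(doc):
    """Concatenate all runs of all paragraphs into one string and record, for every
    run, the half-open range [start, end) it occupies, plus its paragraph id and run number."""
    parts, spans, pos = [], [], 0
    for p_index, paragraph in enumerate(doc["paragraphs"]):
        if p_index > 0:
            parts.append(SEPARATOR)
            pos += len(SEPARATOR)
        for r_index, run in enumerate(paragraph["runs"]):
            text = run["text"]
            parts.append(text)
            spans.append((pos, pos + len(text), paragraph["id"], r_index + 1))
            pos += len(text)
    return "".join(parts), spans


def locate(spans, offset):
    """Map a 0-based offset in the flat string to (paragraph id, run no., char no.),
    all 1-based run/char numbers. Returns None if the offset falls on a separator."""
    for start, end, paragraph_id, run_no in spans:
        if start <= offset < end:
            return paragraph_id, run_no, offset - start + 1
    return None


def find(doc, query):
    """Locate the first case-insensitive match of the query; whitespace between query
    words may match any whitespace in the text, including a paragraph break."""
    flat, spans = flatten(doc)
    pattern = r"\s+".join(re.escape(word) for word in query.split())
    match = re.search(pattern, flat, re.IGNORECASE)
    if match is None:
        return None
    # match.end() is exclusive, so the last matched character is at end() - 1.
    return locate(spans, match.start()), locate(spans, match.end() - 1)


for q in ["dolor sit", "ipsum", "ipsum dolor"]:
    print(q, find(DOC, q))
```

Running it prints `((2, 1, 1), (2, 1, 9))`, `((1, 1, 7), (1, 3, 1))` and `((1, 1, 7), (2, 1, 5))` for the three queries, which matches the three expected answers. Because every query starts and ends with a word, the match endpoints never land on the separator.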
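I am familiar with nested fields in Elasticsearch, and they might be enough for the first query. But how should we map the documents so that consecutive runs and paragraphs are joined together, with enough flexibility to answer the last two queries?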
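One possible mapping, sketched here as an assumption rather than a tested design, is to index one analyzed string per document with all runs and paragraphs concatenated in order, and to store the run offsets alongside it without indexing them. A `match_phrase` query on the concatenated field can then match "ipsum" even though it is split into three runs, and "ipsum dolor" across the paragraph break, because the standard analyzer treats the newline as ordinary whitespace. The field names `full_text` and `runs`, and the helper `to_index_body`, are hypothetical; the bodies below are built as Python dicts and are not sent to a cluster.

```python
import json

# Hypothetical example document (same shape as the sample above, metadata omitted).
DOC = {
    "title": "De finibus",
    "paragraphs": [
        {"id": 1, "runs": [{"text": "Lorem i"}, {"text": "psu"}, {"text": "m"}]},
        {"id": 2, "runs": [{"text": "dolor sit amet"}]},
    ],
}

# Hypothetical index mapping (request body for creating the index).
MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "keyword"},
            # One analyzed string per document: runs and paragraphs joined in order,
            # so phrase queries can match across run and paragraph boundaries.
            "full_text": {"type": "text"},
            # Run offset table: kept in _source for the client, not indexed.
            "runs": {"type": "object", "enabled": False},
        }
    }
}


def to_index_body(doc, separator="\n"):
    """Build the document to index: the flattened text plus, per run,
    its half-open [start, end) character range in full_text."""
    parts, runs, pos = [], [], 0
    for p_index, paragraph in enumerate(doc["paragraphs"]):
        if p_index > 0:
            parts.append(separator)
            pos += len(separator)
        for r_index, run in enumerate(paragraph["runs"]):
            runs.append({"paragraph_id": paragraph["id"], "run": r_index + 1,
                         "start": pos, "end": pos + len(run["text"])})
            parts.append(run["text"])
            pos += len(run["text"])
    return {"title": doc["title"], "full_text": "".join(parts), "runs": runs}


# Phrase query for the third example; it is the full_text field that makes it work.
QUERY = {"query": {"match_phrase": {"full_text": "ipsum dolor"}}}

print(json.dumps(to_index_body(DOC), indent=2))
```

Joining the paragraphs into a single string matters: if they were indexed as an array of values in one `text` field, Elasticsearch would insert a position gap between values (controlled by `position_increment_gap`, 100 by default), and a phrase such as "ipsum dolor" would not match across paragraphs. Elasticsearch then identifies the matching documents, while the exact run and character positions still have to be recovered from `runs`, for example with a client-side re-scan of `full_text`.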