Indexing OpenXML-style structured documents with Elasticsearch

332nm8kg  posted on 2021-06-10  in  ElasticSearch

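We have a set of structured documents whose structure is heavily inspired by the OpenXML data model. In short, a document is an ordered list of paragraphs, each paragraph has an id and an ordered list of runs, and each run has a text content and some metadata.
For example, the sample document below contains two paragraphs, ["Lorem ipsum", "dolor sit amet"].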

{
    title: "De finibus",
    paragraphs: [
        {
            id: 1,
            runs: [
                {text: "Lorem i", metadata: {}},
                {text: "psu", metadata: {bold: true}},
                {text: "m", metadata: {}},
            ]
        },
        {
            id: 2,
            runs: [
                {text: "dolor sit amet", metadata: {}},
            ]
        },
    ]
}
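
To make the model concrete, here is a minimal Python sketch of the same structure. The class names (Run, Paragraph, Document) and the text property are hypothetical, chosen only for illustration, and the sketch assumes the title belongs to the document and every paragraph carries an id, as the queries in this question imply.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    # A run is a piece of text plus formatting metadata (e.g. {"bold": True}).
    text: str
    metadata: dict = field(default_factory=dict)


@dataclass
class Paragraph:
    id: int
    runs: list  # ordered list of Run

    @property
    def text(self) -> str:
        # The visible paragraph text is the concatenation of its runs, in order.
        return "".join(run.text for run in self.runs)


@dataclass
class Document:
    title: str
    paragraphs: list  # ordered list of Paragraph


doc = Document(
    title="De finibus",
    paragraphs=[
        Paragraph(id=1, runs=[Run("Lorem i"), Run("psu", {"bold": True}), Run("m")]),
        Paragraph(id=2, runs=[Run("dolor sit amet")]),
    ],
)

print([p.text for p in doc.paragraphs])  # ['Lorem ipsum', 'dolor sit amet']
```

The point of the sketch is that a word such as "ipsum" does not exist in any single run; it only appears once the runs are joined.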

Naturally, we want to index these documents with Elasticsearch so that it can answer queries such as the following:

Query: dolor sit
Expected answer: in the document with title="De finibus", in the paragraph with id=2, from the 1st character of the 1st run to the 9th character of the 1st run

Query: ipsum
Expected answer: in the document with title="De finibus", in the paragraph with id=1, from the 7th character of the 1st run to the 1st character of the 3rd run

Query: ipsum dolor
Expected answer: in the document with title="De finibus", from the 7th character of the 1st run of the paragraph with id=1 to the 5th character of the 1st run of the paragraph with id=2

I am familiar with nested fields in Elasticsearch, and they would probably handle the first query. But how should we map the documents so that consecutive runs and paragraphs are joined together and the last two queries can be answered flexibly?
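
One possible direction, sketched here in Python rather than as an Elasticsearch mapping: index the concatenated text of the whole document in a single field, keep a side table of where each run starts in that text, and translate the character offsets of a match back into (paragraph id, run number, character number). Everything in this sketch is an assumption made for illustration: the function names flatten, to_run_position and locate are hypothetical, paragraphs are joined with a single space, and str.find stands in for whatever mechanism reports match offsets from the search engine.

```python
from bisect import bisect_right

DOC = {
    "title": "De finibus",
    "paragraphs": [
        {"id": 1, "runs": [
            {"text": "Lorem i", "metadata": {}},
            {"text": "psu", "metadata": {"bold": True}},
            {"text": "m", "metadata": {}},
        ]},
        {"id": 2, "runs": [
            {"text": "dolor sit amet", "metadata": {}},
        ]},
    ],
}


def flatten(doc, sep=" "):
    """Return (full_text, segments); a segment is (start_offset, paragraph_id, run_no)."""
    parts, segments, pos = [], [], 0
    for p_idx, para in enumerate(doc["paragraphs"]):
        if p_idx > 0:
            # Assumed separator between paragraphs so that words do not fuse.
            parts.append(sep)
            pos += len(sep)
        for run_no, run in enumerate(para["runs"], start=1):
            text = run["text"]
            if text:  # empty runs own no characters, so they get no segment
                segments.append((pos, para["id"], run_no))
            parts.append(text)
            pos += len(text)
    return "".join(parts), segments


def to_run_position(segments, offset):
    """Map a 0-based offset in the flattened text to 1-based (paragraph_id, run_no, char_no).

    Offsets that fall on a paragraph separator are not handled by this sketch.
    """
    starts = [start for start, _, _ in segments]
    start, para_id, run_no = segments[bisect_right(starts, offset) - 1]
    return para_id, run_no, offset - start + 1


def locate(doc, query):
    """Return ((paragraph, run, char) of first char, same for last char), or None."""
    text, segments = flatten(doc)
    begin = text.find(query)  # stand-in for offsets reported by the search engine
    if begin < 0:
        return None
    end = begin + len(query) - 1  # offset of the last matched character
    return to_run_position(segments, begin), to_run_position(segments, end)


for q in ("dolor sit", "ipsum", "ipsum dolor"):
    print(q, "->", locate(DOC, q))
# dolor sit -> ((2, 1, 1), (2, 1, 9))
# ipsum -> ((1, 1, 7), (1, 3, 1))
# ipsum dolor -> ((1, 1, 7), (2, 1, 5))
```

The three results match the expected answers listed above. The segment table could be stored alongside the indexed text (for example as a non-indexed field) so the translation step runs in the application after the search returns; whether that fits depends on how match offsets are retrieved in practice, which this sketch leaves open.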
