How to count the words in a comma-separated field of a JSON document in Elasticsearch

Posted by axzmvihb on 2023-01-25 in ElasticSearch

I have JSON like this:

{"index":{"_index":"companydatabase"}}
{"FirstName":"ELVA","LastName":"RECHKEMMER","Designation":"CEO","Salary":"154000","DateOfJoining":"1993-01-11","Address":"8417 Blue Spring St. Port Orange, FL 32127","Gender":"Female","Age":62,"MaritalStatus":"Unmarried","Interests":["Body Building","Illusion","Protesting","Taxidermy","TV watching","Cartooning","Skateboarding"]}
{"index":{"_index":"companydatabase"}}
{"FirstName":"JENNEFER","LastName":"WENIG","Designation":"President","Salary":"110000","DateOfJoining":"2013-02-07","Address":"16 Manor Station Court Huntsville, AL 35803","Gender":"Female","Age":45,"MaritalStatus":"Unmarried","Interests":["String Figures","Working on cars","Button Collecting","Surf Fishing"]}
{"index":{"_index":"companydatabase"}}
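These lines use the _bulk API's newline-delimited format: an action line, then the document source on the next line. A minimal sketch of building such a payload from plain Python dicts is shown below; the helper name to_bulk_ndjson is hypothetical (not part of any library) and the documents are shortened for brevity. The official Python client also provides elasticsearch.helpers.bulk if you would rather not build the body by hand.

```python
import json


def to_bulk_ndjson(docs, index):
    """Build a _bulk request body: for each document, an action line
    followed by its source line, newline-delimited, with a final newline."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                           # source line
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline


# Shortened versions of the two documents above (illustrative only)
docs = [
    {"FirstName": "ELVA", "Interests": ["Body Building", "Illusion"]},
    {"FirstName": "JENNEFER", "Interests": ["String Figures", "Surf Fishing"]},
]
print(to_bulk_ndjson(docs, "companydatabase"), end="")
```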

I want to find the most common interests among these people.
I tried this:

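from elasticsearch import Elasticsearch
from IPython.display import JSON  # assumed: the code runs in a Jupyter notebook

es = Elasticsearch("http://localhost:9200")  # assumed cluster address

request_body = {
  "size": 0,
  "aggs": {
    "interests": {
      "terms": {
        "field": "Interests.keyword",
        "size": 10,
        "order": {
          "count": "desc"
        }
      }
    }
  }
}
JSON(es.search(index="companydatabase", body=request_body))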

But it doesn't work.
Thanks for your help.

Answer #1, by guicsvcw:

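I tried it on your dataset. Your two example documents have no interests in common, so every bucket has a doc_count of 1. Use the JSON below instead, where some of the interests are shared: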

{
    "FirstName": "JENNEFER",
    "LastName": "WENIG",
    "Designation": "President",
    "Salary": "110000",
    "DateOfJoining": "2013-02-07",
    "Address": "16 Manor Station Court Huntsville, AL 35803",
    "Gender": "Female",
    "Age": 45,
    "MaritalStatus": "Unmarried",
    "Interests": [
        "String Figures",
        "Working on cars",
        "Button Collecting",
        "Surf Fishing",
        "Body Building",
        "Button Collecting",
        "Cartooning"
    ]
}

and

{
    "FirstName": "ELVA",
    "LastName": "RECHKEMMER",
    "Designation": "CEO",
    "Salary": "154000",
    "DateOfJoining": "1993-01-11",
    "Address": "8417 Blue Spring St. Port Orange, FL 32127",
    "Gender": "Female",
    "Age": 62,
    "MaritalStatus": "Unmarried",
    "Interests": [
        "Body Building",
        "Illusion",
        "Protesting",
        "Taxidermy",
        "TV watching",
        "Cartooning",
        "Skateboarding"
    ]
}
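Before querying, a quick plain-Python check (set intersection, not an Elasticsearch call) shows why the question's data produced only 1s and which interests the modified data shares. The variable names are illustrative; the values are copied from the question and from the two documents above.

```python
# ELVA's interests (unchanged between the question and this answer)
elva = {"Body Building", "Illusion", "Protesting", "Taxidermy",
        "TV watching", "Cartooning", "Skateboarding"}
# JENNEFER's interests as given in the question
jennefer_original = {"String Figures", "Working on cars",
                     "Button Collecting", "Surf Fishing"}
# JENNEFER's modified list; the duplicated "Button Collecting" collapses in a set
jennefer_modified = jennefer_original | {"Body Building", "Button Collecting",
                                         "Cartooning"}

print(sorted(elva & jennefer_original))  # []
print(sorted(elva & jennefer_modified))  # ['Body Building', 'Cartooning']
```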

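By default, a terms aggregation sorts its buckets by doc_count in descending order, so the query does not need an order clause at all. (If you do set one, the built-in key is "_count"; "count" without the underscore is treated as the name of a sub-aggregation, and since no such sub-aggregation exists, the request in the question fails.)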
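If you want the order spelled out anyway, here is a sketch of the question's request with the order key written as "_count". The helper top_interests is hypothetical, and es is assumed to be an elasticsearch.Elasticsearch client; newer versions of the client deprecate body= in favour of keyword arguments such as size= and aggs=, so adjust to your client version.

```python
# The question's request with the built-in "_count" order key
request_body = {
    "size": 0,
    "aggs": {
        "interests": {
            "terms": {
                "field": "Interests.keyword",
                "size": 10,
                "order": {"_count": "desc"},  # equivalent to the default ordering
            }
        }
    },
}


def top_interests(es, index="companydatabase"):
    """Run the aggregation and return (interest, doc_count) pairs.

    `es` is assumed to be an Elasticsearch client; any object with a
    compatible search(index=..., body=...) method works.
    """
    resp = es.search(index=index, body=request_body)
    buckets = resp["aggregations"]["interests"]["buckets"]
    return [(b["key"], b["doc_count"]) for b in buckets]
```

For example, top_interests(Elasticsearch("http://localhost:9200")) against a local cluster (assumed address) would return the buckets as a list of pairs.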

{
    "size": 0,
    "aggs": {
        "interests": {
            "terms": {
                "field": "Interests.keyword",
                "size": 10
            }
        }
    }
}

This returns (excerpt of the response):

"buckets": [
    {
        "key": "Body Building",
        "doc_count": 2  --> note: present in both documents
    },
    {
        "key": "Cartooning",
        "doc_count": 2  --> note: present in both documents
    },
    {
        "key": "Button Collecting",
        "doc_count": 1
    },
    {
        "key": "Illusion",
        "doc_count": 1
    },
    {
        "key": "Protesting",
        "doc_count": 1
    },
    {
        "key": "Skateboarding",
        "doc_count": 1
    },
    {
        "key": "String Figures",
        "doc_count": 1
    },
    {
        "key": "Surf Fishing",
        "doc_count": 1
    },
    {
        "key": "TV watching",
        "doc_count": 1
    },
    {
        "key": "Taxidermy",
        "doc_count": 1
    }
]
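The bucket list above can be reproduced offline. The sketch below is a plain-Python model, not Elasticsearch's implementation: it counts each distinct value once per document (which is why the duplicated "Button Collecting" still has a doc_count of 1), sorts by doc_count descending with the key ascending as a tie-breaker (matching the order shown above), keeps the top size buckets, and totals the remaining counts, analogous to sum_other_doc_count. The function name terms_agg is hypothetical.

```python
from collections import Counter

# Interests from the two documents in this answer
# (JENNEFER's list contains "Button Collecting" twice).
jennefer = ["String Figures", "Working on cars", "Button Collecting",
            "Surf Fishing", "Body Building", "Button Collecting", "Cartooning"]
elva = ["Body Building", "Illusion", "Protesting", "Taxidermy",
        "TV watching", "Cartooning", "Skateboarding"]


def terms_agg(docs, size=10):
    """Plain-Python model of a terms aggregation on a multi-valued field."""
    counts = Counter()
    for values in docs:
        counts.update(set(values))  # each distinct value counts once per document
    # doc_count descending, ties broken by key ascending
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    buckets = [{"key": k, "doc_count": c} for k, c in ordered[:size]]
    other = sum(c for _, c in ordered[size:])  # analogous to sum_other_doc_count
    return buckets, other


buckets, other = terms_agg([jennefer, elva])
for b in buckets:
    print(b["key"], b["doc_count"])
print("other:", other)
```

With size=10, "Working on cars" is the one interest that falls outside the returned buckets.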
