How do I filter data with a regular expression in Elasticsearch?

fzwojiic  posted on 2023-06-21  in  ElasticSearch

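I am new to Elasticsearch and have been exploring it recently.
I want to fetch all the data from one table (index) in Elasticsearch, but I only need those documents where the "phoneNumber" field is not a valid phone number. How can I do that?
Data structure: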

{
            "_index": "test_session3",
            "_type": "_doc",
            "_id": "f76adaaf-23e0-455a-9d74-6e2335b60bd9",
            "_score": null,
            "_source": {
                "phoneNumber": "12424242424",
                "utilizationFeeType": "flat",
                "chargingEndTime": "2023-06-07T04:56:37.421Z",
                "invFee": 22.22,
                "sessionStartTime": "2023-06-07T04:56:38.340Z",
                "chargerId": "584eb6b8-4f69-4628-b94a-3ea5c49e14bc",
                "plugStatus": "connected",
                "idleEndTime": "2023-06-07T04:56:37.421Z",
                "sessionEndTime": "2023-06-07T04:56:39.108Z",
                "activeFeeType": "MinimumFee",
                "chargingStartTime": "2023-06-07T04:56:38.340Z",
                "chargingRate": 7.7,
                "createdAt": "2023-06-07T04:54:56.604Z",
                "zone": "Asia/Dhaka",
                "transactionId": 3507,
                "energyConsumed": 0,
                "updatedAt": "2023-06-07T04:56:38.340Z",
                "userId": "53753b5f-1e5f-491f-b8bb-dcd6658f6e5d",
                "sessionStatus": "ended",
                "id": "f76adaaf-23e0-455a-9d74-6e2335b60bd9",
                "chargingDuration": -919,
                "idleDuration": 0,
                "sessionDuration": -919,
                "totals": 12.07,
                "idleFee": 0,
                "idleTime": "",
                "userName": "fex fo",
                "userType": "admin",
                "doubleinvFee": 22.22,
                "doubleTotals": 12.07,
                "locationId": "e3ba8172-0a8c-4c09-9f71-f074e094de9d",
                "propertyId": "a0099bbd-13aa-4c2c-a2ec-68b8d0967947",
                "companyId": "cf58acca-d7df-42b4-9555-7693d4fcc73c",
                "location": {
                    "currentPropertyId": "a0099bbd-13aa-4c2c-a2ec-68b8d0967947",
                    "currentCompanyId": "cf58acca-d7df-42b4-9555-7693d4fcc73c",
                    "title": "Lubowitz Avenue 53370812",
                    "landmark": "Southwest of the front entrance",
                    "zip": "10001",
                    "city": "New York",
                    "state": "New York",
                    "country": "US",
                    "address": "East Avenue, Rochester, NY, USA",
                    "latitude": "34.88923",
                    "longitude": "-118.13612",
                    "online": true,
                    "availableForGuest": true,
                    "status": "Active",
                    "id": "e3ba8172-0a8c-4c09-9f71-f074e094de9d",
                    "currentCompanyName": "Rich Information Technology",
                    "currentPropertyName": "East Avenue",
                    "chargersCount": 2,
                    "longitudeNum": -118.13612,
                    "latitudeNum": 34.88923,
                    "location": {
                        "lat": 34.88923,
                        "lon": -118.13612
                    }
                },
                "company": {
                    "zip": "10001",
                    "status": "Active",
                    "zohoNewComAdded": true,
                    "isDeleted": false,
                    "email": "contact@richinfotech.org",
                    "country": "US",
                    "name": "Rich Information Technology",
                    "state": "New York",
                    "city": "New York",
                    "byCreatedAt": "byCreatedAt",
                    "fileId": "15f98c05-5615-4d95-8cd6-23323668cc7f",
                    "byEmail": "byEmail",
                    "id": "cf58acca-d7df-42b4-9555-7693d4fcc73c",
                    "zohoVendorId": "4064488000000194001",
                    "phone": "1010101011",
                    "byPhone": "byPhone",
                    "website": "richinfotech.org",
                    "ein": "",
                    "createdAt": "2022-10-12T10:54:27.977Z",
                    "zohoCompanyId": "4064488000000318001",
                    "address": "Silicon Valley",
                    "byName": "byName",
                    "zohoCompanyErrorMessage": "",
                    "updatedAt": "2023-06-05T09:27:25.158Z",
                    "zohoCompanyError": false
                }
            },
            "sort": [
                1686113799108
            ]
        }

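I want to validate the data with this regular expression: /^+?[0-9]{11}$/ -> the phone number must be 11 digits long and may have a leading "+".
Here is my approach: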

const searchFilters = {
            query: {
                bool: {
                    must: [],
                    must_not: [
                        {
                            regexp: {
                                'phoneNumber.keyword': '^\\+?[0-9]{11}$',
                            },
                        },
                    ],
                    filter: [],
                },
            },
            size: 10000,
        };

     
        const resp = await ElasticSearchHelper.search(IndexNames.SESSION, searchFilters);
        const data = resp.body?.hits?.hits;

But it returns an empty array. I expect to get all the data where the phone number is invalid.

iqih9akk  #1

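TL;DR

You get no results because of which regex operators Elasticsearch supports. The documentation states that Elasticsearch regular expressions do not support ^ and $:
Lucene's regular expression engine does not support anchor operators, such as ^ (beginning of line) or $ (end of line). To match a term, the regular expression must match the entire string.
You will need to build another regular expression that fits the constraints of Lucene (the search library Elasticsearch is built on).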
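
As a minimal sketch of what such a pattern could look like: since a Lucene regexp must match the whole term anyway, the ^ and $ can simply be dropped. The helper name buildInvalidPhoneQuery is hypothetical, and the field name 'phoneNumber.keyword' is taken from the question and assumed to be a keyword sub-field. Note that a bare must_not would also return documents that have no phoneNumber at all.

```javascript
// Lucene regexp patterns are implicitly anchored to the whole term,
// so the pattern is written without ^ and $.
// In a JS string literal, '\\+' yields the two characters \+ (a literal plus).
const PHONE_PATTERN = '\\+?[0-9]{11}';

// Hypothetical helper: builds a search body that excludes every document
// whose phone number matches the "valid" pattern.
// Assumption: 'phoneNumber.keyword' is a keyword sub-field, as in the question.
function buildInvalidPhoneQuery(field = 'phoneNumber.keyword') {
  return {
    query: {
      bool: {
        must_not: [{ regexp: { [field]: PHONE_PATTERN } }],
      },
    },
    size: 10000,
  };
}

// Print the request body so it can be inspected before sending it.
console.log(JSON.stringify(buildInvalidPhoneQuery(), null, 2));
```

The resulting object could then be passed to the question's own ElasticSearchHelper.search(IndexNames.SESSION, ...) call in place of searchFilters.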

nkcskrwz  #2

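Your approach looks basically correct, but there are a few issues with the regular expression and the Elasticsearch query. Here are adjustments you can make to validate the data with the given regular expression and retrieve the desired results:
1. Regular expression: the pattern you provided, ^+?[0-9]{11}$, has a small error. The + character must be escaped with a backslash (\) to match literally. The corrected pattern is ^\+?[0-9]{11}$
2. Elasticsearch query: instead of a regexp query, you can use a must_not clause with a term query to filter on the phone number field. Here is an updated version of the Elasticsearch query: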
const searchFilters = {
  query: {
    bool: {
      must_not: [
        {
          term: {
            'phoneNumber.keyword': {
              value: '', // Specify the invalid phone number you want to exclude
            },
          },
        },
      ],
    },
  },
  size: 10000,
};

const resp = await ElasticSearchHelper.search(IndexNames.SESSION, searchFilters);
const data = resp.body?.hits?.hits;

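In the must_not clause, set the value field of the term query to the specific invalid phone number you want to exclude. A term query matches exact values, so this returns all documents whose phone number field is not equal to that specific value.
Make sure to replace IndexNames.SESSION with the correct name of your Elasticsearch index.
With these adjustments, you should be able to retrieve the data whose phone number is invalid according to the defined regular expression pattern.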
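
The escaped pattern from point 1 can be sanity-checked in plain JavaScript before it goes anywhere near a query. This is only a sketch for cross-checking results on the client side: the names VALID_PHONE and isInvalidPhone are hypothetical, and the sample values are illustrative (the first and third are the phoneNumber and company.phone values from the question's document). JavaScript's RegExp does need the ^ and $ anchors, unlike the Lucene syntax used by the regexp query.

```javascript
// JavaScript regex for a valid number: optional leading +, then exactly 11 digits.
const VALID_PHONE = /^\+?[0-9]{11}$/;

// Hypothetical helper: true when the value is not a string or fails the pattern.
function isInvalidPhone(value) {
  return typeof value !== 'string' || !VALID_PHONE.test(value);
}

// Illustrative sample values only.
const samples = ['12424242424', '+12424242424', '1010101011', '', '1242-424-2424'];
const invalid = samples.filter(isInvalidPhone);

console.log(invalid); // [ '1010101011', '', '1242-424-2424' ]
```

The same predicate could be applied to hit._source.phoneNumber for each returned hit to confirm that a server-side query returned only invalid numbers.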
