BadRequestError when updating an Elasticsearch field with a Painless script

Posted by zc0qhyus on 2022-12-17 in ElasticSearch

I am trying to update a string field in Elasticsearch with a Painless script that extracts a regex match from another field. The script is called from Python, for example:

es.update_by_query(index='testrss', query=qry, script=scr)

In my example, the qry filter returns only one record, with the following value:
{'body_text': "Purpose prong invitations Homely wine pocketses\nSOURCE: THE NY TIMES, NEW YORK\nReaches stealing jambags Azog pull ask" }
I want to extract THE NY TIMES, NEW YORK into a new field, testxy.
To test with a scr input that is known to be valid, the following works fine:

scr = {
    "lang": "painless",
    "source": "ctx._source.testxy = /[aeiou]/.matcher(ctx._source.body_text).replaceAll('')"
}

...which updates testxy to:

{
...
 '_source': {'testxy': 'Prps prng nvttns Hmly wn pcktss\nSOURCE: THE NY TIMES, NEW YORK\nRchs stlng jmbgs Azg pll sk',
...
}

However, extracting a substring with a regex fails:

scr = {
    "lang": "painless",
    "source": "ctx._source.testxy = /SOURCE.*?\n/.matcher(ctx._source.body_text).group(1)"
}
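
One plausible contributor to the compile error (an assumption, not confirmed in this thread) is Python string escaping: in a normal, non-raw Python string, "\n" is turned into a real newline before the script is sent, so the Painless regex literal /.../ ends up split across two lines, which as far as I know the Painless lexer does not accept. The sketch below only compares the three spellings on the Python side. Note also that the pattern has no capture group, so group(1) could not succeed even once the script compiles.

```python
# Three ways to spell the same Painless source in Python.
# Only the string handling is shown; nothing is sent to Elasticsearch.

# Non-raw string: Python converts \n into a real newline character,
# so the regex literal sent to Painless is broken across two lines.
non_raw = "ctx._source.testxy = /SOURCE.*?\n/.matcher(ctx._source.body_text).group(1)"

# Escaped backslash: Painless receives the two characters backslash + n.
escaped = "ctx._source.testxy = /SOURCE.*?\\n/.matcher(ctx._source.body_text).group(1)"

# Raw string: same result as the escaped form, easier to read.
raw = r"ctx._source.testxy = /SOURCE.*?\n/.matcher(ctx._source.body_text).group(1)"

print(repr(non_raw))  # repr makes the embedded newline visible as \n
print(repr(raw))      # repr shows the backslash doubled as \\n
print("real newline in non_raw:", "\n" in non_raw)
print("real newline in raw:", "\n" in raw)
```

Using a raw string (or doubling the backslash) keeps the regex literal on one line in the script that Elasticsearch receives.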

Error:

---------------------------------------------------------------------------
BadRequestError                           Traceback (most recent call last)
/var/folders/8l/d9m87qtx2yn1bc86txmr30wh0000gn/T/ipykernel_57473/2559631365.py in <module>
----> 1 es.update_by_query(index='testrss', query=qry, script=scr)

/opt/anaconda3/lib/python3.8/site-packages/elasticsearch/_sync/client/utils.py in wrapped(*args, **kwargs)
    412                         pass
    413 
--> 414             return api(*args, **kwargs)
    415 
    416         return wrapped  # type: ignore[return-value]

/opt/anaconda3/lib/python3.8/site-packages/elasticsearch/_sync/client/__init__.py in update_by_query(self, index, allow_no_indices, analyze_wildcard, analyzer, conflicts, default_operator, df, error_trace, expand_wildcards, filter_path, from_, human, ignore_unavailable, lenient, max_docs, pipeline, preference, pretty, query, refresh, request_cache, requests_per_second, routing, script, scroll, scroll_size, search_timeout, search_type, slice, slices, sort, stats, terminate_after, timeout, version, version_type, wait_for_active_shards, wait_for_completion)
   4715         if __body is not None:
   4716             __headers["content-type"] = "application/json"
-> 4717         return self.perform_request(  # type: ignore[return-value]
   4718             "POST", __path, params=__query, headers=__headers, body=__body
   4719         )

/opt/anaconda3/lib/python3.8/site-packages/elasticsearch/_sync/client/_base.py in perform_request(self, method, path, params, headers, body)
    319                     pass
    320 
--> 321             raise HTTP_EXCEPTIONS.get(meta.status, ApiError)(
    322                 message=message, meta=meta, body=resp_body
    323             )

BadRequestError: BadRequestError(400, 'script_exception', 'compile error')

I also tried:

scr = {
    "lang": "painless",
    "source": "Pattern p = Pattern.compile(\"SOURCE\"); Matcher m = p.matcher(ctx._source.body_text); ctx._source.testxy = m.group(1)"
}

...which also fails. Any idea what I am doing wrong?

**Edit.** Running this in the Dev Tools console gives the following error:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "script_exception",
        "reason" : "runtime error",
        "script_stack" : [
          "java.base/java.util.regex.Matcher.group(Matcher.java:644)",
          "ctx._source.testxy = /SOURCE/.matcher(ctx._source.body_text).group(1)",
          "                                                            ^---- HERE"
        ],
        "script" : "ctx._source.testxy = /SOURCE/.matcher(ctx._source.body_text).group(1)",
        "lang" : "painless",
        "position" : {
          "offset" : 60,
          "start" : 0,
          "end" : 69
        }
      }
    ],
    "type" : "script_exception",
    "reason" : "runtime error",
    "script_stack" : [
      "java.base/java.util.regex.Matcher.group(Matcher.java:644)",
      "ctx._source.testxy = /SOURCE/.matcher(ctx._source.body_text).group(1)",
      "                                                            ^---- HERE"
    ],
    "script" : "ctx._source.testxy = /SOURCE/.matcher(ctx._source.body_text).group(1)",
    "lang" : "painless",
    "position" : {
      "offset" : 60,
      "start" : 0,
      "end" : 69
    },
    "caused_by" : {
      "type" : "illegal_state_exception",
      "reason" : "No match found"
    }
  },
  "status" : 400
}
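
The Python traceback above only surfaces the short message, while the useful part (caused_by) sits in the response body. The traceback shows the body being passed to the exception as body=resp_body, so it should be readable from the caught exception. Below is a minimal sketch of pulling the cause chain out of such a body. explain_script_error is a hypothetical helper, not part of elasticsearch-py, and the commented usage assumes an 8.x client where the exception exposes .body.

```python
def explain_script_error(error_body):
    """Flatten an Elasticsearch error body into a chain of 'type: reason' steps.

    Hypothetical helper for illustration; not part of elasticsearch-py.
    """
    steps = []
    node = error_body.get("error", {})
    # Walk from the top-level error down through nested "caused_by" entries.
    while isinstance(node, dict) and node:
        steps.append(f"{node.get('type')}: {node.get('reason')}")
        node = node.get("caused_by")
    return " <- ".join(steps)


# Trimmed copy of the Dev Tools response shown above.
sample_body = {
    "error": {
        "type": "script_exception",
        "reason": "runtime error",
        "caused_by": {
            "type": "illegal_state_exception",
            "reason": "No match found",
        },
    },
    "status": 400,
}

summary = explain_script_error(sample_body)
print(summary)

# Assumed usage with the client (not executed here):
# try:
#     es.update_by_query(index='testrss', query=qry, script=scr)
# except BadRequestError as e:
#     print(explain_script_error(e.body))
```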

Confusing. It reports No match found, yet I can remove the target text with /SOURCE.*?\\n/.matcher(ctx._source.body_text).replaceAll('')

zfycwa2u  #1

Found the solution here. You have to call matcher.find() or matcher.matches() before calling .group(). Who knows why?

scr = {
    "lang": "painless",
    "source": "Matcher m = /(?<=SOURCE:).*?(?=\\n)/.matcher(ctx._source.body_text); boolean b = m.find(); ctx._source.testxy = m.group(0)"
}
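
On the "why": a java.util.regex.Matcher does not attempt a match when it is created. group() only reads the result of the last successful match operation and throws IllegalStateException ("No match found") if there has been none, whereas replaceAll() resets the matcher and scans the input itself. The sketch below is one possible variant, not the only way to do it: it guards on m.find() so documents without a SOURCE line are skipped via ctx.op = 'noop', and it uses a capture group with \s* so the leading space is dropped (the lookbehind pattern above keeps it). The local check uses Python's re module, on the assumption that it treats these particular constructs the same way Java's regex engine does.

```python
import re

# Raw strings keep \s and \n as backslash sequences for Painless to interpret.
PAINLESS_SOURCE = (
    r"Matcher m = /SOURCE:\s*(.*?)\n/.matcher(ctx._source.body_text); "
    r"if (m.find()) { ctx._source.testxy = m.group(1); } "
    r"else { ctx.op = 'noop'; }"  # leave documents without a match untouched
)
scr = {"lang": "painless", "source": PAINLESS_SOURCE}

# Local sanity check of both patterns against the sample document.
body_text = (
    "Purpose prong invitations Homely wine pocketses\n"
    "SOURCE: THE NY TIMES, NEW YORK\n"
    "Reaches stealing jambags Azog pull ask"
)

m = re.search(r"SOURCE:\s*(.*?)\n", body_text)
extracted = m.group(1) if m else None
print(repr(extracted))

# The lookbehind pattern from the answer keeps the space after "SOURCE:".
m2 = re.search(r"(?<=SOURCE:).*?(?=\n)", body_text)
lookbehind_result = m2.group(0) if m2 else None
print(repr(lookbehind_result))

# Assumed usage (not executed here):
# es.update_by_query(index='testrss', query=qry, script=scr)
```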
