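I am trying to use a Painless script to update a string field in Elasticsearch with a regular-expression match extracted from another field. The script is called from Python, for example: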
es.update_by_query(index='testrss', query=qry, script=scr)
In my example, the qry filter returns only one record, with the following value: {'body_text': "Purpose prong invitations Homely wine pocketses\nSOURCE: THE NY TIMES, NEW YORK\nReaches stealing jambags Azog pull ask" }
I want to extract THE NY TIMES, NEW YORK into a new field, testxy.
As a test with a valid scr input, the following works fine:
scr = {
"lang": "painless",
"source": "ctx._source.testxy = /[aeiou]/.matcher(ctx._source.body_text).replaceAll('')"
}
...which updates testxy to:
{
...
'_source': {'testxy': 'Prps prng nvttns Hmly wn pcktss\nSOURCE: THE NY TIMES, NEW YORK\nRchs stlng jmbgs Azg pll sk',
...
}
However, extracting the string with a regular expression fails:
scr = {
"lang": "painless",
"source": "ctx._source.testxy = /SOURCE.*?\n/.matcher(ctx._source.body_text).group(1)"
}
Error:
---------------------------------------------------------------------------
BadRequestError Traceback (most recent call last)
/var/folders/8l/d9m87qtx2yn1bc86txmr30wh0000gn/T/ipykernel_57473/2559631365.py in <module>
----> 1 es.update_by_query(index='testrss', query=qry, script=scr)
/opt/anaconda3/lib/python3.8/site-packages/elasticsearch/_sync/client/utils.py in wrapped(*args, **kwargs)
412 pass
413
--> 414 return api(*args, **kwargs)
415
416 return wrapped # type: ignore[return-value]
/opt/anaconda3/lib/python3.8/site-packages/elasticsearch/_sync/client/__init__.py in update_by_query(self, index, allow_no_indices, analyze_wildcard, analyzer, conflicts, default_operator, df, error_trace, expand_wildcards, filter_path, from_, human, ignore_unavailable, lenient, max_docs, pipeline, preference, pretty, query, refresh, request_cache, requests_per_second, routing, script, scroll, scroll_size, search_timeout, search_type, slice, slices, sort, stats, terminate_after, timeout, version, version_type, wait_for_active_shards, wait_for_completion)
4715 if __body is not None:
4716 __headers["content-type"] = "application/json"
-> 4717 return self.perform_request( # type: ignore[return-value]
4718 "POST", __path, params=__query, headers=__headers, body=__body
4719 )
/opt/anaconda3/lib/python3.8/site-packages/elasticsearch/_sync/client/_base.py in perform_request(self, method, path, params, headers, body)
319 pass
320
--> 321 raise HTTP_EXCEPTIONS.get(meta.status, ApiError)(
322 message=message, meta=meta, body=resp_body
323 )
BadRequestError: BadRequestError(400, 'script_exception', 'compile error')
I also tried:
scr = {
"lang": "painless",
"source": "Pattern p = Pattern.compile(\"SOURCE\"); Matcher m = p.matcher(ctx._source.body_text); ctx._source.testxy = m.group(1)"
}
...which also failed. Any idea what I am doing wrong?
**Edit.** Running this command in the Dev Tools console gives the following error:
{
"error" : {
"root_cause" : [
{
"type" : "script_exception",
"reason" : "runtime error",
"script_stack" : [
"java.base/java.util.regex.Matcher.group(Matcher.java:644)",
"ctx._source.testxy = /SOURCE/.matcher(ctx._source.body_text).group(1)",
" ^---- HERE"
],
"script" : "ctx._source.testxy = /SOURCE/.matcher(ctx._source.body_text).group(1)",
"lang" : "painless",
"position" : {
"offset" : 60,
"start" : 0,
"end" : 69
}
}
],
"type" : "script_exception",
"reason" : "runtime error",
"script_stack" : [
"java.base/java.util.regex.Matcher.group(Matcher.java:644)",
"ctx._source.testxy = /SOURCE/.matcher(ctx._source.body_text).group(1)",
" ^---- HERE"
],
"script" : "ctx._source.testxy = /SOURCE/.matcher(ctx._source.body_text).group(1)",
"lang" : "painless",
"position" : {
"offset" : 60,
"start" : 0,
"end" : 69
},
"caused_by" : {
"type" : "illegal_state_exception",
"reason" : "No match found"
}
},
"status" : 400
}
Confusing: it reports No match found, yet I can remove the target text with /SOURCE.*?\\n/.matcher(ctx._source.body_text).replaceAll('').
1 Answer
Found the solution here. You must call matcher.find() or matcher.matches() before calling .group(). Who knows why?
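As to why: Painless exposes java.util.regex, where creating a Matcher does not attempt a match. group() only reports the result of the most recent find(), matches(), or lookingAt() call, and throws IllegalStateException ("No match found") if none has succeeded yet. replaceAll() works because it runs the search itself. Note also that group(1) needs a capturing group in the pattern, which /SOURCE.*?\n/ does not have. Below is a minimal sketch of a corrected script. It assumes the pattern SOURCE: (.*) fits the data (this pattern is my own choice, not from the original post) and that regexes are enabled for Painless on the cluster. The re check at the end only sanity-tests the pattern locally in Python; it does not exercise Painless.

```python
import re

# Painless source: find() must succeed before group() is read, and
# group(1) refers to the parenthesised capturing group in the pattern.
# '.' does not match a line terminator by default, so (.*) stops at the
# end of the SOURCE line and no \n is needed in the pattern.
PAINLESS_SOURCE = (
    "Matcher m = /SOURCE: (.*)/.matcher(ctx._source.body_text); "
    "if (m.find()) { ctx._source.testxy = m.group(1); }"
)

scr = {"lang": "painless", "source": PAINLESS_SOURCE}

# Same call as in the question (needs a live cluster, so left commented out):
# es.update_by_query(index='testrss', query=qry, script=scr)

# Local sanity check of the pattern using Python's re module.
body_text = (
    "Purpose prong invitations Homely wine pocketses\n"
    "SOURCE: THE NY TIMES, NEW YORK\n"
    "Reaches stealing jambags Azog pull ask"
)
match = re.search(r"SOURCE: (.*)", body_text)
extracted = match.group(1) if match else None
print(extracted)
```

Guarding with if (m.find()) also means documents without a SOURCE line are left unchanged instead of failing the whole update_by_query request.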