Elasticsearch: getting backslashes when replacing a string in Python

bsxbgnwa  posted on 2023-01-20  in  ElasticSearch

I am using Python's .replace function to replace one string with another.

type = ['B','A','C']
q = ''
for i in type:
    s = str({'filter' : {'match_phrase': {'type':i}}})
    s = s[1:-1]  
    q = q+','+s
Now q looks like this
Output - ",'filter': {'match_phrase': {'type': 'B'}},'filter': {'match_phrase': {'type': 'A'}},'filter': {'match_phrase': {'type': 'C'}}"

Now I want to replace the "type_filter" string in the query (shown below) with the string q.

query = r"""{"size": 10 ,"query": {"bool": {"must": [{"multi_match": {"query": "centro","fields": ["name","alias_terms"],"fuzziness": "AUTO"}}],"filter": {"match_phrase": {"category": "Specialty"}} type_filter }}}"""

When I use the replace function below, I get backslashes (\):

c = query.replace("type_filter",q)
c looks like this
'{"size": 10 ,"query": {"bool": {"must": [{"multi_match": {"query": "centro","fields": ["name","alias_terms"],"fuzziness": "AUTO"}}],"filter": {"match_phrase": {"category": "Specialty"}},\'filter\': {\'match_phrase\': {\'prov_type\': \'B\'}},\'filter\': {\'match_phrase\': {\'prov_type\': \'A\'}},\'filter\': {\'match_phrase\': {\'prov_type\': \'C\'}}}}}'

I tried json.dumps and regular expressions to remove the backslashes. This is how I run the query:

response = opensearc_client.search(body=c, index='proc_spanish')

The resulting JSON should look like the following, so that I can run it as an OpenSearch query, and it should be valid JSON.

c = {"size": 10 ,"query": {"bool": {"must": [{"multi_match": {"query": "centro","fields": ["name","alias_terms"],"fuzziness": "AUTO"}}],"filter": {"match_phrase": {"category": "Specialty"}},"filter": {"match_phrase": {"type": "B"}},"filter": {"match_phrase": {"type": "A"}},"filter": {"match_phrase": {"type": "C"}}}}}

When I run this query after using the json.loads and json.dumps functions, the error I get is:

RequestError: RequestError(400, 'json_parse_exception', "Unexpected character (''' (code 39)): was expecting double-quote to start field name\n at [Source: (org.opensearch.common.io.stream.InputStreamStreamInput); line: 1, column: 188]")

Can anyone please help me remove these extra backslashes? Thanks in advance.

p4rjhz4m  #1

The simplest reproduction of this problem is:
""""a""".replace("a","'b'")
The output of this statement is:
'"\'b\''
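Python adds the backslashes so that the quotes inside the string cannot be confused with the quotes that delimit it. If Python replaces the a in "a with 'b', the result is "'b'. Written as a literal without escapes, that could only take one of two forms:

  1. '"'b'', but Python would read this as three separate tokens: '"', b and ''
  2. ""'b'", which Python would also read as three separate tokens: "", 'b' and "

Neither form is an unambiguous representation of the string, so Python escapes the inner single quotes with a backslash (\) when it displays the value. The backslashes belong to that displayed representation (the repr); they are not characters stored in the string.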
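To check where the backslashes actually live, the reproduction above can be compared in its two forms. This is a minimal sketch; the variable names s and shown are only illustrative and do not come from the question.

```python
# The smallest reproduction from above: the string "a with a replaced by 'b'.
s = """"a""".replace("a", "'b'")

shown = repr(s)  # the quoted, escaped form that the interactive prompt echoes
print(shown)     # displayed representation, with escaped single quotes
print(s)         # the characters actually stored in the string

# The string itself is four characters long and contains no backslash;
# only its displayed representation does.
assert len(s) == 4
assert "\\" not in s
assert "\\" in shown
```

Because the backslashes are never sent to the server, removing them with a regular expression has nothing to act on. What the server receives are the single quotes, which matches the character reported in the json_parse_exception (code 39).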
Now, to answer the original question:
q = ",'filter': {'match_phrase': {'type': 'B'}},'filter': {'match_phrase': {'type': 'A'}},'filter': {'match_phrase': {'type': 'C'}}"
query = r"""{"size": 10 ,"query": {"bool": {"must": [{"multi_match": {"query": "centro","fields": ["name","alias_terms"],"fuzziness": "AUTO"}}],"filter": {"match_phrase": {"category": "Specialty"}} type_filter }}}"""

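Here, q uses single quotes while query uses double quotes. To keep the two from clashing, use one quote style throughout. Since the query has to be valid JSON, which requires double quotes, simply replace the single quotes in q with double quotes.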

query.replace("type_filter",q.replace("'",'"'))

The output looks like this:

'{"size": 10 ,"query": {"bool": {"must": [{"multi_match": {"query": "centro","fields": ["name","alias_terms"],"fuzziness": "AUTO"}}],"filter": {"match_phrase": {"category": "Specialty"}} ,"filter": {"match_phrase": {"type": "B"}},"filter": {"match_phrase": {"type": "A"}},"filter": {"match_phrase": {"type": "C"}} }}}'
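An alternative that avoids quote handling altogether is to build the query as a Python dict and let json.dumps produce the text. The sketch below is an assumption rather than part of the answer above: it puts all clauses into a single "filter" list instead of repeating the "filter" key, because repeated keys in one JSON object are ambiguous (Python's json.loads, for example, keeps only the last one). The names types, filters and body are illustrative.

```python
import json

types = ["B", "A", "C"]  # the values from the question

# Start with the category filter, then add one match_phrase clause per type.
filters = [{"match_phrase": {"category": "Specialty"}}]
filters += [{"match_phrase": {"type": t}} for t in types]

body = {
    "size": 10,
    "query": {
        "bool": {
            "must": [
                {
                    "multi_match": {
                        "query": "centro",
                        "fields": ["name", "alias_terms"],
                        "fuzziness": "AUTO",
                    }
                }
            ],
            # Assumption: one list of clauses instead of repeated "filter" keys.
            "filter": filters,
        }
    },
}

c = json.dumps(body)  # JSON text with double quotes only
print(c)
```

Note that every clause in a bool filter list has to match, so listing three different type values this way may be stricter than intended. If the goal is "any of these types", a different clause would be needed, which the original question does not settle.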
