regex: Extract values from a string based on certain key-value pairs

Posted by q3aa0525 on 2023-06-25 in Other


I pull some data from JIRA, and it comes in the format below.

comment text is: [{'type': 'paragraph', 'content': [{'type': 'text', 'text': 'In conversation with the customer '}, {'type': 'mention', 'attrs': {'id': '04445152', 'text': '@Kev', 'accessLevel': ''}}, {'type': 'text', 'text': ' Text 123'}]}]

comment text is: [{'type': 'paragraph', 'content': [{'type': 'text', 'text': '@xyz  Text abc'}]}]

comment text is: [{'type': 'paragraph', 'content': [{'type': 'mention', 'attrs': {'id': '3445343', 'text': '@Hey', 'accessLevel': ''}}, {'type': 'text', 'text': ' FYI'}]}]

comment text is: [{'content': [{'text': 'Output: ', 'type': 'text'}, {'type': 'hardBreak'}, {'type': 'hardBreak'}, {'text': "New Text goes here", 'type': 'text'}], 'type': 'paragraph'}]

I want to extract every value stored under the 'text' key and, when a line contains more than one such value, concatenate them. The expected output is:

In conversation with the customer @Kev Text 123

@xyz  Text abc

@Hey FYI

Output: New Text goes here
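Structurally, the requirement is a tree walk: visit every dict and list, and collect each string stored under a 'text' key in document order. A minimal sketch of that alternative, assuming each comment string is a valid Python literal as the samples appear to be (they use single-quoted strings, so json.loads would reject them). The helper names collect_text and comment_to_text are hypothetical, and the sample strings are copied from the answer's pandas example.

```python
import ast
from typing import Any, List


def collect_text(node: Any) -> List[str]:
    """Return every string stored under a 'text' key, in document order."""
    found: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "text" and isinstance(value, str):
                found.append(value)
            else:
                # Descend into nested structures such as 'content' or 'attrs'.
                found.extend(collect_text(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_text(item))
    return found


def comment_to_text(raw: str) -> str:
    """Parse one comment string and concatenate all of its 'text' values."""
    # ast.literal_eval parses Python literals (single or double quotes)
    # without executing arbitrary code.
    return "".join(collect_text(ast.literal_eval(raw)))


samples = [
    '''[{'type': 'paragraph', 'content': [{'type': 'text', 'text': 'In conversation with the customer '}, {'type': 'mention', 'attrs': {'id': '04445152', 'text': '@Kev', 'accessLevel': ''}}, {'type': 'text', 'text': ' Text 123'}]}]''',
    '''[{'type': 'paragraph', 'content': [{'type': 'text', 'text': '@xyz  Text abc'}]}]''',
    '''[{'type': 'paragraph', 'content': [{'type': 'mention', 'attrs': {'id': '3445343', 'text': '@Hey', 'accessLevel': ''}}, {'type': 'text', 'text': ' FYI'}]}]''',
    '''[{'content': [{'text': 'Output: ', 'type': 'text'}, {'type': 'hardBreak'}, {'type': 'hardBreak'}, {'text': "New Text goes here", 'type': 'text'}], 'type': 'paragraph'}]''',
]

for raw in samples:
    print(comment_to_text(raw))
```

Unlike a regex, this does not depend on how quotes or spacing are serialized, but ast.literal_eval raises an exception if a string is not a well-formed literal (for example, one with a missing closing quote).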


Source: https://stackoverflow.com/questions/76376097/extract-value-from-a-string-based-on-certain-key-value-pairs

1 answer


Answer by myzjeezk

Assuming your JSON-like values can be quite complex and differ considerably from one another, you can use a positive lookbehind to find the strings you need:

(?<='text': ['\"]).*?(?=['\"][},])

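Regex explanation:

(?<='text': ['\"]): positive lookbehind requiring 'text': followed by a single or double quote immediately before the match
.*?: any number of characters, matched lazily (as few as possible)
(?=['\"][},]): positive lookahead requiring a single or double quote followed by a comma or a closing brace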

Your Python code could look like this:

import pandas as pd
import re

comments_df = pd.DataFrame([
    '''[{'type': 'paragraph', 'content': [{'type': 'text', 'text': 'In conversation with the customer '}, {'type': 'mention', 'attrs': {'id': '04445152', 'text': '@Kev', 'accessLevel': ''}}, {'type': 'text', 'text': ' Text 123'}]}]''',
    '''[{'type': 'paragraph', 'content': [{'type': 'text', 'text': '@xyz  Text abc'}]}]''',
    '''[{'type': 'paragraph', 'content': [{'type': 'mention', 'attrs': {'id': '3445343', 'text': '@Hey', 'accessLevel': ''}}, {'type': 'text', 'text': ' FYI'}]}]''',
    '''[{'content': [{'text': 'Output: ', 'type': 'text'}, {'type': 'hardBreak'}, {'type': 'hardBreak'}, {'text': "New Text goes here", 'type': 'text'}], 'type': 'paragraph'}]'''
], columns=['string'])

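# Lookbehind for 'text': plus an opening quote, lazy match, lookahead for a closing quote before } or ,
pattern = r'''(?<='text': ['\"]).*?(?=['\"][},])'''
# str.findall collects every match per row; str.join('') concatenates them into one string
print(comments_df['string'].str.findall(pattern).str.join(''))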

Output:

0    In conversation with the customer @Kev Text 123
1                                     @xyz  Text abc
2                                           @Hey FYI
3                         Output: New Text goes here
Name: string, dtype: object
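One trade-off of this regex worth checking against real data: the lazy .*? stops at the first quote that is followed by a comma or closing brace, even when that quote is inside the value. A minimal sketch with a made-up comment (the text "the customers', then more" is hypothetical) shows the truncation; if such text can occur, parsing the string (for example with ast.literal_eval) is more robust.

```python
import re

# The answer's pattern, with a bare " inside the character classes.
PATTERN = r"""(?<='text': ['"]).*?(?=['"][},])"""

# Hypothetical comment whose text has an apostrophe directly before a comma.
# str() on the list uses repr, which double-quotes a string containing '.
row = str([{"type": "text", "text": "the customers', then more"}])
print(row)
# [{'type': 'text', 'text': "the customers', then more"}]

# The lookahead fires at the ', inside the value, so the match is cut short.
print(re.findall(PATTERN, row))
# ['the customers']
```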

See the Regex demo and the Python demo.

Answered 2023-06-25
