pandas: How to convert a string built from a list of Mongo ObjectIds into a Python list containing only the ids

z8dt9xmd  posted 11 months ago in  Go

I have a DataFrame with a column that contains the string representation of a list of ObjectIds, i.e.:

"[ObjectId('5d28938629fe749c7c12b6e3'), ObjectId('5caf4522a30528e3458b4579')]"

I want to convert it from a string literal into a Python list, like this:

['5d28938629fe749c7c12b6e3', '5caf4522a30528e3458b4579']


Both json.loads and ast.literal_eval fail because the string contains ObjectId.
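To make the failure concrete, here is a small sketch (the names `raw` and `errors` are only for illustration). Neither parser understands `ObjectId(...)`: the string is not valid JSON, and `ast.literal_eval` accepts literals only, not function calls.

```python
import ast
import json

raw = "[ObjectId('5d28938629fe749c7c12b6e3'), ObjectId('5caf4522a30528e3458b4579')]"

errors = {}
for name, parser in (("json.loads", json.loads), ("ast.literal_eval", ast.literal_eval)):
    try:
        parser(raw)
    except ValueError as exc:
        # json.JSONDecodeError is a subclass of ValueError;
        # ast.literal_eval raises ValueError on the ObjectId(...) call
        errors[name] = type(exc).__name__

print(errors)
```

Both entries end up in `errors`, which is why the answers below work on the text itself instead of parsing it directly.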

owfi6suc  #1

I'm sharing this regular expression: https://regex101.com/r/m5rW2q/1
For example, you can click on the code generator there:

import re

regex = r"ObjectId\('(\w+)'\)"

test_str = "[ObjectId('5d28938629fe749c7c12b6e3'), ObjectId('5caf4522a30528e3458b4579')]"

# Iterate over every ObjectId('...') occurrence in the string
for match_num, match in enumerate(re.finditer(regex, test_str), start=1):
    print(f"Match {match_num} was found at {match.start()}-{match.end()}: {match.group()}")

    # Group 0 is the whole match; capture groups are numbered from 1
    for group_num in range(1, len(match.groups()) + 1):
        print(f"Group {group_num} found at {match.start(group_num)}-{match.end(group_num)}: {match.group(group_num)}")

Output:

Match 1 was found at 1-37: ObjectId('5d28938629fe749c7c12b6e3')
Group 1 found at 11-35: 5d28938629fe749c7c12b6e3
Match 2 was found at 39-75: ObjectId('5caf4522a30528e3458b4579')
Group 1 found at 49-73: 5caf4522a30528e3458b4579


For example:

import re

regex = r"ObjectId\('(\w+)'\)"

test_str = "[ObjectId('5d28938629fe749c7c12b6e3'), ObjectId('5caf4522a30528e3458b4579')]"

# Collect capture group 1 (the id itself) from every match
matches = re.finditer(regex, test_str)
ids = [m.group(1) for m in matches]
print(ids)

Output:

['5d28938629fe749c7c12b6e3', '5caf4522a30528e3458b4579']


Everything about regular expressions in Python can be found here: https://docs.python.org/3/library/re.html
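Since the question is about a DataFrame column, the same pattern can probably be applied to every row at once with `Series.str.findall`. This is only a sketch: the DataFrame `df`, the column name `my_column`, and the new `ids` column are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical DataFrame; "my_column" holds the stringified ObjectId lists
df = pd.DataFrame({
    "my_column": [
        "[ObjectId('5d28938629fe749c7c12b6e3'), ObjectId('5caf4522a30528e3458b4579')]",
        "[ObjectId('5caf4522a30528e3458b4579')]",
        "[]",
    ]
})

# With a single capture group, findall returns only the captured ids per row
df["ids"] = df["my_column"].str.findall(r"ObjectId\('(\w+)'\)")

print(df["ids"].tolist())
```

Each cell of `ids` is then a real Python list, and rows without any ObjectId yield an empty list.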

wooyq4lh  #2

You can use replace to strip the ObjectId( ... ) wrappers. Note that the result is still a string, not a list:

a = "[ObjectId('5d28938629fe749c7c12b6e3'), ObjectId('5caf4522a30528e3458b4579')]"
a.replace("ObjectId(", "").replace(")", "")
# Output:
"['5d28938629fe749c7c12b6e3', '5caf4522a30528e3458b4579']"
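Because the replace call returns a string, one way to finish the job is to hand the cleaned string to `ast.literal_eval`, which can parse it once the `ObjectId(...)` wrappers are gone. A minimal sketch; the names `cleaned` and `ids` are illustrative:

```python
import ast

a = "[ObjectId('5d28938629fe749c7c12b6e3'), ObjectId('5caf4522a30528e3458b4579')]"

# Strip the ObjectId( ... ) wrappers, leaving a plain list literal as a string
cleaned = a.replace("ObjectId(", "").replace(")", "")

# The string now contains only literals, so literal_eval can parse it
ids = ast.literal_eval(cleaned)
print(ids)
```

This relies on the ids themselves never containing a closing parenthesis, which holds for hex ObjectId strings.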

jgovgodb  #3

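Find the rows that contain ObjectId; split on '; take items 1 and 3 from the resulting list. Note that [0] selects the row labelled 0, and the 1:4:2 slice assumes exactly two ids:

my_df.loc[my_df["my_column"].str.contains("ObjectId"), "my_column"].str.split("'")[0][1:4:2]

For that row, this gives a list of exactly two elements: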

['5d28938629fe749c7c12b6e3', '5caf4522a30528e3458b4579']
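The split approach can be loosened so it does not depend on there being exactly two ids: after splitting on `'`, the ids sit at every odd position, so a `[1::2]` slice picks up all of them. A small sketch in plain Python; the helper name `ids_by_split` is hypothetical:

```python
def ids_by_split(text):
    """Return every quoted id in a stringified ObjectId list (illustrative helper)."""
    # Splitting on ' puts the ids at the odd positions: 1, 3, 5, ...
    return text.split("'")[1::2]


two = "[ObjectId('5d28938629fe749c7c12b6e3'), ObjectId('5caf4522a30528e3458b4579')]"
# Build a three-element example by appending one more ObjectId
three = two[:-1] + ", ObjectId('5d28938629fe749c7c12b6e3')]"

print(ids_by_split(two))
print(ids_by_split(three))
print(ids_by_split("[]"))
```

On a column, the helper could be applied row by row, for example with `my_df["my_column"].apply(ids_by_split)`.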

omqzjyyz  #4

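You can use a list comprehension, assuming list_of_ids is already a list of ObjectId objects rather than the string from the question:

list_of_str = [str(oid) for oid in list_of_ids]

This gives the expected result: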

['5d28938629fe749c7c12b6e3', '5caf4522a30528e3458b4579']
