pandas: How can I replace substrings in a DataFrame column using a list?

Posted by juud5qan on 2023-02-07 in Other


I need to remove substrings from the values of a DataFrame column.
Example: my DataFrame has a column "code" (the real DataFrame is very large):

3816R(motor) #I need '3816R'
97224(Eletro)
502812(Defletor)
97252(Defletor)
97525(Eletro)
5725 ( 56)

I am using this list of substrings to remove from the values:

list = ['(motor)', '(Eletro)', '(Defletor)', '(Eletro)', '( 56)']

I have tried many approaches, such as:

df['code'] = df['code'].str.replace(list, '')

I also tried regex=True, but none of these approaches removed the substrings.
How can I do this?

pandas

Source: https://stackoverflow.com/questions/75326692/how-can-i-replace-substring-from-string-by-a-list-in-a-column-dataframe

4 answers


9jyewag01#

You can use a regex replace with a regex OR (alternation) condition: https://pandas.pydata.org/docs/reference/api/pandas.Series.str.replace.html https://www.ocpsoft.org/tutorials/regular-expressions/or-in-regex/

l = ['(motor)', '(Eletro)', '(Defletor)', '( 56)']
# Escape the parentheses so they match literally (raw strings avoid invalid-escape warnings)
l = [s.replace('(', r'\(').replace(')', r'\)') for s in l]
regex_str = f"({'|'.join(l)})"
df['code'] = df['code'].str.replace(regex_str, '', regex=True)

regex_str ends up looking like this:

"(\(motor\)|\(Eletro\)|\(Defletor\)|\( 56\))"
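A variant of the same idea, as a minimal sketch: `re.escape` can do the escaping instead of the chained `replace` calls. The sample DataFrame and the name `to_remove` are made up for illustration. Note that `re.escape` also escapes the space in `( 56)`, so the resulting pattern will not look exactly like the string shown above.

```python
import re
import pandas as pd

# Hypothetical sample data mirroring the question
df = pd.DataFrame(
    {'code': ['3816R(motor)', '97224(Eletro)', '502812(Defletor)', '5725 ( 56)']}
)

to_remove = ['(motor)', '(Eletro)', '(Defletor)', '( 56)']

# Escape every entry so it is matched literally, then join with "|" (regex OR)
pattern = '|'.join(re.escape(s) for s in to_remove)

# Remove the matches, then strip any whitespace left behind (e.g. "5725 ")
df['code'] = df['code'].str.replace(pattern, '', regex=True).str.strip()
print(df['code'].tolist())
```

The trailing `str.strip()` is optional; it only matters for rows like `5725 ( 56)` where a space precedes the parenthesis.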

Answered 2023-02-07

3zwjbxry2#

If you are sure that every row follows the format shown, you can try a lambda function like this:

df['code_clean'] = df['code'].apply(lambda x: x.split('(')[0])

Answered 2023-02-07

wa7juj8i3#

You can try the regex match method: https://docs.python.org/3/library/re.html#re.Pattern.match

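import re

# \s* and .* also cover rows such as "5725 ( 56)", which have spaces around or inside the parentheses
df['code'] = df['code'].apply(lambda x: re.match(r'^(\w+)\s*\(.*\)', x).group(1))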

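The first part of the regex, ^(\w+), creates a capture group containing the letters or digits that appear before the parenthesis; group(1) then extracts that text.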
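As a hedged alternative to calling `re.match` inside `apply`, pandas' `Series.str.extract` applies a capture group to the whole column and yields NaN instead of raising when a row does not match. The sample data and the `code_clean` column below are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; the last row deliberately has no leading code
df = pd.DataFrame(
    {'code': ['3816R(motor)', '97224(Eletro)', '5725 ( 56)', '(no prefix)']}
)

# expand=False returns a Series when the pattern has a single capture group
df['code_clean'] = df['code'].str.extract(r'^(\w+)', expand=False)
print(df['code_clean'].tolist())
```

Rows without a leading run of word characters come back as NaN, which can be easier to audit in a large DataFrame than an exception raised partway through `apply`.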

Answered 2023-02-07

gstyhher4#

str.replace works on a single string, not on a list of strings. You could loop over the list:

rmlist = ['(motor)', '(Eletro)', '(Defletor)', '(Eletro)', '( 56)']
for repl in rmlist:
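    # regex=False treats the parentheses literally (older pandas versions default to regex=True)
    df['code'] = df['code'].str.replace(repl, '', regex=False)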

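Alternatively, if the parenthesized substring is always at the end, split at "(" and discard the other resulting columns. That will certainly be faster: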

df["code"]=df["code"].str.split(pat="(",n=1,expand=True)[0]

String splitting is quite fast.
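One detail worth checking with the split approach: a row such as `5725 ( 56)` keeps the space that precedes the parenthesis. A minimal sketch, using made-up sample data, that strips it afterwards with `str.strip`:

```python
import pandas as pd

# Hypothetical sample data mirroring the question
df = pd.DataFrame({'code': ['3816R(motor)', '502812(Defletor)', '5725 ( 56)']})

# Split once at the first "(" and keep only the part before it
head = df['code'].str.split('(', n=1).str[0]

# Without strip(), the last row would be "5725 " with a trailing space
df['code'] = head.str.strip()
print(df['code'].tolist())
```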

Answered 2023-02-07
