Using a pandas DataFrame, extract the special-character (:) text from one column and fill the same value down the rows until the next occurrence

0md85ypi posted on 2023-04-19 in Other


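Using a pandas DataFrame, I want to extract the text of the rows that contain a special character (:) from a column and forward fill (ffill) it, so that the same value repeats until the next special-character row. After extraction, the special-character rows should be deleted. I tried the following, but it does not give the result I want.
Input DataFrame: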

import pandas as pd
df = pd.DataFrame({
    'col1': ['White color :', 'I am not really sure how to do this', 'I am not really sure how to do this', 
             'Black color :', 'I am not ready to solve your issue',
           'I am not ready to solve your issue','I am not ready to solve your issue'],
    
     })

df['new_col'] = df['col1'].str.extract('^([^:]+)', expand=False)
mask = df.apply(lambda x: x.str.contains(':')).any(axis=1)
df.loc[mask, :] = df.loc[mask, :].ffill(axis=1)
df

Desired output DataFrame:

                                  col1      new_col
0  I am not really sure how to do this  White color
1  I am not really sure how to do this  White color
2   I am not ready to solve your issue  Black color
3   I am not ready to solve your issue  Black color
4   I am not ready to solve your issue  Black color

pandas

Source: https://stackoverflow.com/questions/75975282/using-pandas-dataframe-extract-special-character-from-one-column-and-fill-sam

3 Answers


s2j5cfk01#

Following your approach:

m = df["col1"].str.contains(":", na=False)

out = (df.assign(new_col=df["col1"].str.extract(r"^([^:]+)", expand=False)
                 .where(m).ffill()).loc[~m]).reset_index(drop=True)

Output:

print(out)

                                  col1       new_col
0  I am not really sure how to do this  White color 
1  I am not really sure how to do this  White color 
2   I am not ready to solve your issue  Black color 
3   I am not ready to solve your issue  Black color 
4   I am not ready to solve your issue  Black color
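
The chain above can be easier to follow when split into steps. The following is only a sketch of the same idea, not a replacement for it; the intermediate names `labels` and `filled` and the extra `.str.strip()` (which drops the trailing space visible in the output above) are additions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [
        "White color :",
        "I am not really sure how to do this",
        "I am not really sure how to do this",
        "Black color :",
        "I am not ready to solve your issue",
        "I am not ready to solve your issue",
        "I am not ready to solve your issue",
    ]
})

# True on the header rows, i.e. the rows that contain a colon.
m = df["col1"].str.contains(":", na=False)

# Text before the first colon, with surrounding whitespace stripped
# (the strip is an addition; the original answer keeps the trailing space).
labels = df["col1"].str.extract(r"^([^:]+)", expand=False).str.strip()

# Keep labels only on header rows (NaN elsewhere), then fill downward.
filled = labels.where(m).ffill()

# Attach the filled labels and drop the header rows themselves.
out = df.assign(new_col=filled).loc[~m].reset_index(drop=True)
print(out)
```

The `.where(m)` step matters because the pattern `^([^:]+)` also matches rows without a colon (it captures the whole string), so without the mask there would be nothing for `ffill()` to fill.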

2023-04-19

vjhs03f72#

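You can extract the color from col1 and copy it, without the colon, into new_col, forward filling as you go (this works because values are only copied from rows where col1 contains a :, so every other row is NaN and gets filled). Then you can simply drop the rows that have a color in col1: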

df['new_col'] = df['col1'].str.extract(r'^(.*)\s+:', expand=False).ffill()
df = df[~df['col1'].str.contains(':')].reset_index(drop=True)

Output:

                                  col1      new_col
0  I am not really sure how to do this  White color
1  I am not really sure how to do this  White color
2   I am not ready to solve your issue  Black color
3   I am not ready to solve your issue  Black color
4   I am not ready to solve your issue  Black color
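
To see why the forward fill works here, it may help to look at what `str.extract` returns before filling. The sketch below uses a small hypothetical Series (not the data from the question) that also includes a row placed before the first header, to show one edge case: that row has nothing above it to copy, so it stays NaN.

```python
import pandas as pd

# Hypothetical data: the first row comes before any "label :" header row.
s = pd.Series([
    "orphan text",
    "White color :",
    "some text",
    "Black color :",
    "more text",
])

# Rows that do not match the pattern come back as NaN.
extracted = s.str.extract(r"^(.*)\s+:", expand=False)
print(extracted.isna().tolist())  # [True, False, True, False, True]

# ffill copies the last non-missing label downward;
# the leading row has no label above it and remains NaN.
filled = extracted.ffill()
print(filled)
```

Note that the pattern `\s+:` requires at least one whitespace character before the colon, so a header written as `White color:` would not match. If such rows can occur, a pattern like `^(.*?)\s*:` is one possible alternative.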

2023-04-19

bxfogqkk3#

This is a brute-force approach, but it works like a charm.

import pandas as pd
df = pd.DataFrame({
    'col1': ['White color :', 'I am not really sure how to do this', 'I am not really sure how to do this', 
             'Black color :', 'I am not ready to solve your issue',
           'I am not ready to solve your issue','I am not ready to solve your issue'],
    
     })

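# Header rows to look for; these must be listed by hand.
special_character_set = ["White color :", "Black color :"]
special_char = ""   # label of the most recent header row
special = []        # label for each kept row
temp = []           # text of each kept row
for idx, row in df.iterrows():
    if row["col1"] in special_character_set:
        # Drop the trailing " :" and remember the label.
        special_char = row["col1"][:-2]
    else:
        temp.append(row["col1"])
        special.append(special_char)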

final_df = pd.DataFrame({
    "col1" : temp,
    "new_col": special
})
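
For comparison, the same hard-coded lookup can be written without an explicit loop. This is a sketch under the same assumption as the loop above (the header strings are known in advance and end in " :"); the name `is_header` is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [
        "White color :",
        "I am not really sure how to do this",
        "I am not really sure how to do this",
        "Black color :",
        "I am not ready to solve your issue",
        "I am not ready to solve your issue",
        "I am not ready to solve your issue",
    ]
})

special_character_set = ["White color :", "Black color :"]

# True on rows whose text is exactly one of the known headers.
is_header = df["col1"].isin(special_character_set)

# Drop the trailing " :" on header rows, blank out all other rows, fill down.
labels = df["col1"].str[:-2].where(is_header).ffill()

# Keep only the non-header rows together with their filled labels.
final_df = pd.DataFrame({
    "col1": df.loc[~is_header, "col1"],
    "new_col": labels[~is_header],
}).reset_index(drop=True)
print(final_df)
```

One difference from the loop: rows that appear before the first header get NaN here, whereas the loop gives them an empty string.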

2023-04-19
