numpy: Conditionally filling a pandas DF column with np.where() - works without where, fails with where

bwntbbo3 posted on 2023-10-19 in Other
I have a list of email recipients. Half of them look like

"Firstname Lastname" <[email protected]>

and half look like

firstnam[email protected] [some text goes here asdfsdafasgawgaw etc]

I have the data in a separate DataFrame, produced by split() on a different dataset.
When I do

df_task_email['contents'] = df_email_contents.iloc[:,0]

it just works (it copies the email contents into the column).
When I do

df_task_email['to_name'] = np.where(
    df_split3.iloc[:,0].str.startswith('\"'),

    #True - copy contents of cell across
    df_split3.iloc[:,0].str.replace("\"","").replace(np.nan,'',regex=True),

    #False - extract the first email address - "pattern" is the regex for an email 
    df_split3.iloc[:,0].str.extract(pattern)[0].replace(np.nan,'',regex=True)
)

I get

> ValueError: Length of values (37099) does not match length of index (2010634)

(If the index is too long for np.where(), why is it not too long for "="?)
As a test, if I remove np.where(),

df_task_email['to_name'] = df_split3.iloc[:,0].str.replace("\"","").replace(np.nan,'',regex=True)

then it works again (although it does not process the data the way I expect).
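
The contrast between the two assignments can be reproduced on toy data. This is a minimal sketch, assuming the cause is that np.where() returns a plain NumPy array (which has no index), while a Series assigned to a column is aligned on its index labels. The names `target` and `source` and the example.com addresses are hypothetical stand-ins, not the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the target frame has 5 rows, the source only 3.
target = pd.DataFrame(index=range(5))
source = pd.Series(
    ['"Ann A" <a@example.com>', "b@example.com [text]", '"Cy C" <c@example.com>']
)

# Assigning a Series aligns on index labels, so rows 3 and 4 become NaN.
target["from_series"] = source.str.replace('"', "", regex=False)

# np.where() returns an ndarray with no index, so nothing can be aligned.
values = np.where(
    source.str.startswith('"'),
    source.str.replace('"', "", regex=False),
    "",
)
print(type(values).__name__, len(values))

error_message = None
try:
    # Length 3 against an index of length 5 raises ValueError.
    target["from_array"] = values
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

If this is what is happening, the plain "=" assignments only appear to work: rows with no matching index label are silently filled with NaN.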

2guxujil #1

Maybe you can use .str.extract() to get the email from the contents column (regex demo).
Consider this dataframe:

contents
0                   "Firstname1 Lastname2" <[email protected]>
1  [email protected] [some text goes here asdfsdafasgawgaw etc]
2                   "Firstname3 Lastname4" <[email protected]>

Then:

df["email"] = df["contents"].str.extract(r"\" <([^>]+)>|(^\S+) \[").bfill(axis=1)[0]
print(df)

Prints:

contents                          email
0                   "Firstname1 Lastname2" <[email protected]>  [email protected]
1  [email protected] [some text goes here asdfsdafasgawgaw etc]   [email protected]
2                   "Firstname3 Lastname4" <f[email protected]>  [email protected]
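
If you would rather keep the np.where() approach from the question, one option is to align the source Series to the target index before calling it, so the resulting array has the same length as the frame. This is a minimal sketch that assumes the index labels of df_split3 are a subset of those in df_task_email. The toy data, the `recipients` and `aligned` names, and the `pattern` regex are all illustrative assumptions, since the question does not show its own pattern.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for df_task_email (5 rows) and df_split3.iloc[:, 0] (3 rows).
df_task_email = pd.DataFrame(index=range(5))
recipients = pd.Series(
    ['"Ann A" <a@example.com>', "b@example.com [some text]", '"Cy C" <c@example.com>'],
    index=[0, 2, 4],
)

# Illustrative email pattern only; not the pattern used in the question.
pattern = r"([^\s<>\[\]]+@[^\s<>\[\]]+)"

# Align to the target index first; rows with no source value become "".
aligned = recipients.reindex(df_task_email.index).fillna("")

df_task_email["to_name"] = np.where(
    aligned.str.startswith('"'),
    # True: copy the cell across with the quotes stripped
    aligned.str.replace('"', "", regex=False),
    # False: extract the first email address, or "" when there is none
    aligned.str.extract(pattern)[0].fillna(""),
)
print(df_task_email)
```

Because all three arguments to np.where() are derived from `aligned`, they share the length of df_task_email and the assignment no longer raises.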
