pandas: add a prefix during forward fill to identify carried-over values

Posted by oprakyz7 on 2022-12-09 in Other

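When filling NA values with pandas' `ffill`, is there a way to add a prefix to the filled cells? I have a DataFrame containing taxonomic information, like this: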

| Kingdom  | Phylum        | Class       | Order           | Family           | Genus         |
|----------|---------------|-------------|-----------------|------------------|---------------|
| Bacteria | Firmicutes    | Bacilli     | Lactobacillales | Lactobacillaceae | Lactobacillus |
| Bacteria | Bacteroidetes | Bacteroidia | Bacteroidales   |                  |               |
| Bacteria | Bacteroidetes |             |                 |                  |               |

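Since not every taxon can be fully classified, some cells are empty. By replacing the blanks with NA and then using `ffill`, I can fill those cells with the last valid string in each row, but I would like to add a prefix to the filled cells (for example "Unknown_Bacteroidales") so that I can tell which cells were carried over.
So far I have tried `taxa_formatted = "unknown_" + taxonomy.fillna(method='ffill', axis=1)`, but that of course adds the "unknown_" prefix to everything in the DataFrame.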
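
For reference, here is a minimal sketch that reproduces the problem described above. The frame is rebuilt by hand from the table in the question, `filled` is a name introduced only for this illustration, and `ffill(axis=1)` is used in place of `fillna(method='ffill', axis=1)`, which newer pandas releases deprecate.

```python
import numpy as np
import pandas as pd

# Example frame from the question; np.nan marks the unclassified ranks.
taxonomy = pd.DataFrame(
    [
        ["Bacteria", "Firmicutes", "Bacilli", "Lactobacillales", "Lactobacillaceae", "Lactobacillus"],
        ["Bacteria", "Bacteroidetes", "Bacteroidia", "Bacteroidales", np.nan, np.nan],
        ["Bacteria", "Bacteroidetes", np.nan, np.nan, np.nan, np.nan],
    ],
    columns=["Kingdom", "Phylum", "Class", "Order", "Family", "Genus"],
    dtype=object,
)

# Row-wise forward fill: each gap takes the last valid value to its left.
filled = taxonomy.ffill(axis=1)

# The attempt from the question: the prefix lands on every cell,
# including the ones that were never missing.
taxa_formatted = "unknown_" + filled
print(taxa_formatted.loc[0, "Kingdom"])  # unknown_Bacteria
```

The missing piece is a record of which cells were NaN before the fill, so that the prefix can be applied to those cells only.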

Answer #1 (8tntrjer)

You can combine boolean masking with `df.isna`:

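import numpy as np

df = df.replace("", np.nan)  # skip this step if the blanks are already NaN
d = df.ffill()  # default axis=0: fills down each column, from the row above

d[df.isna()] += "(Copy)"  # tag only the cells that were originally missing
d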
    Kingdom         Phylum              Class                Order                  Family                Genus
0  Bacteria     Firmicutes            Bacilli      Lactobacillales        Lactobacillaceae        Lactobacillus
1  Bacteria  Bacteroidetes        Bacteroidia        Bacteroidales  Lactobacillaceae(Copy)  Lactobacillus(Copy)
2  Bacteria  Bacteroidetes  Bacteroidia(Copy)  Bacteroidales(Copy)  Lactobacillaceae(Copy)  Lactobacillus(Copy)

To fill along each row and add a prefix instead, you can use `df.add`:

d = df.ffill(axis=1)  # fill each row from the last valid value to its left
df.add("unknown_" + d[df.isna()], fill_value='')  # prefix only the originally missing cells

    Kingdom         Phylum                  Class                  Order                 Family                  Genus
0  Bacteria     Firmicutes                Bacilli        Lactobacillales       Lactobacillaceae          Lactobacillus
1  Bacteria  Bacteroidetes            Bacteroidia          Bacteroidales  unknown_Bacteroidales  unknown_Bacteroidales
2  Bacteria  Bacteroidetes  unknown_Bacteroidetes  unknown_Bacteroidetes  unknown_Bacteroidetes  unknown_Bacteroidetes
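
The same masking idea can also be written with `DataFrame.where`, which may be easier to read than the `add` call. This is only a sketch, not part of the original answer: the frame is rebuilt by hand from the question, and `was_missing` and `result` are names made up for the example.

```python
import numpy as np
import pandas as pd

# Example frame from the question; np.nan marks the unclassified ranks.
df = pd.DataFrame(
    [
        ["Bacteria", "Firmicutes", "Bacilli", "Lactobacillales", "Lactobacillaceae", "Lactobacillus"],
        ["Bacteria", "Bacteroidetes", "Bacteroidia", "Bacteroidales", np.nan, np.nan],
        ["Bacteria", "Bacteroidetes", np.nan, np.nan, np.nan, np.nan],
    ],
    columns=["Kingdom", "Phylum", "Class", "Order", "Family", "Genus"],
    dtype=object,
)

was_missing = df.isna()   # record the gaps before filling them
d = df.ffill(axis=1)      # row-wise forward fill

# Keep the filled value where the cell was originally present,
# otherwise take the prefixed version of the filled value.
result = d.where(~was_missing, "unknown_" + d)
print(result.loc[1, "Family"])  # unknown_Bacteroidales
```

Capturing the mask before the fill is the important step: once `ffill` has run, the original and carried-over cells can no longer be told apart.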

Answer #2 (fdx2calv)

You need to use `mask` and `update`:

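# Turn blanks into real NaN first, if they are not NaN already.
# df = df.replace('', np.nan)

s = df.isnull()          # remember which cells were originally missing
df = df.ffill(axis=1)    # fill each row from the last valid value to its left

# Keep only the filled-in cells, prefix them, and write them back into df.
df.update('unknown_' + df.mask(~s))

print(df)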

   Bacteria     Firmicutes                Bacilli        Lactobacillales  \
0  Bacteria  Bacteroidetes            Bacteroidia          Bacteroidales   
1  Bacteria  Bacteroidetes  unknown_Bacteroidetes  unknown_Bacteroidetes   

        Lactobacillaceae          Lactobacillus  
0  unknown_Bacteroidales  unknown_Bacteroidales  
1  unknown_Bacteroidetes  unknown_Bacteroidetes
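
In the output above, the first taxon row appears where the column names would normally be. As a hedged, self-contained sketch of the same `mask` plus `update` approach with the rank names kept as the header (the frame is rebuilt by hand from the question, which is an assumption about how the data is loaded):

```python
import numpy as np
import pandas as pd

# Example frame from the question, with the rank names as column labels.
df = pd.DataFrame(
    [
        ["Bacteria", "Firmicutes", "Bacilli", "Lactobacillales", "Lactobacillaceae", "Lactobacillus"],
        ["Bacteria", "Bacteroidetes", "Bacteroidia", "Bacteroidales", np.nan, np.nan],
        ["Bacteria", "Bacteroidetes", np.nan, np.nan, np.nan, np.nan],
    ],
    columns=["Kingdom", "Phylum", "Class", "Order", "Family", "Genus"],
    dtype=object,
)

s = df.isnull()          # True where a cell was originally missing
df = df.ffill(axis=1)    # row-wise forward fill

# df.mask(~s) blanks out every cell that was NOT missing, so only the
# carried-over values get the prefix; update() ignores the NaN cells
# and overwrites the rest in place.
df.update("unknown_" + df.mask(~s))

print(df.loc[1, "Family"])  # unknown_Bacteroidales
```

Because `update` modifies `df` in place and returns `None`, its result should not be assigned back to a variable.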

Answer #3 (kwvwclae)

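import numpy as np

df = df.replace("", np.nan)  # skip this step if the blanks are already NaN
d = df.ffill()  # default axis=0 fills down each column; pass axis=1 to fill along rows

# You can use this to tag the cells that were originally missing.
d[df.isna()] += "(Copy)"
d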
    Kingdom         Phylum              Class                Order                  Family                Genus
0  Bacteria     Firmicutes            Bacilli      Lactobacillales        Lactobacillaceae        Lactobacillus
1  Bacteria  Bacteroidetes        Bacteroidia        Bacteroidales  Lactobacillaceae(Copy)  Lactobacillus(Copy)
2  Bacteria  Bacteroidetes  Bacteroidia(Copy)  Bacteroidales(Copy)  Lactobacillaceae(Copy)  Lactobacillus(Copy)
