pandas: filtering my DataFrame more efficiently

vxqlmq5t posted on 2023-03-16 in Other

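I have the DataFrame below, and I basically need to find every group that has no A or A,U entry and save that list to Excel.

| Name | Group1 | Group2 | Group3 | Group4 | Group5 | Group6 |
|------|--------|--------|--------|--------|--------|--------|
| AppUser | | | | A | | |
| ShareUser | A | | A | | A,U | |
| MediaUser | | | | A | | |
| WebUser | | | | | | |
| PrintUser | A,U | | A | | A,U | |

This is what I have so far. It works fine, but I'm wondering whether there is a cleaner way a pro would do this.
My current working code: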
目前这是我所拥有的,它的工作很好,但我想知道是否有一个更清洁的方式亲的会这样做?
我当前的工作代码:

import pandas as pd
from pathlib import Path

# Source file
File = Path.cwd() / "UserGrid.xlsx"

# Read the Excel file
df = pd.read_excel(File)

# Replace "A,U" with "A" so a single comparison catches both
df2 = df.replace('A,U', 'A')

# Use the user names as the index
df3 = df2.set_index('Name')

# Keep only the groups (columns) in which no cell equals "A"
df4 = df3.columns[df3.ne('A').all()].tolist()
df5 = pd.DataFrame(df4, columns=['No Admins'])

# Save to Excel and widen the "No Admins" column
with pd.ExcelWriter('./No_Admins.xlsx', engine='xlsxwriter') as writer:
    df5.to_excel(writer, sheet_name='NoAdmins', index=False)
    col_idx = df5.columns.get_loc('No Admins')
    writer.sheets['NoAdmins'].set_column(col_idx, col_idx, 50)

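I had trouble filtering on both A and A,U, so I ended up replacing every A,U with A in the DataFrame. I'm just checking whether there is a more efficient approach or whether I should leave it like this.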
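One way to avoid the replace step, sketched here under the assumption that every non-empty cell is a comma-separated list of role codes, is to split each cell and check whether "A" is one of the roles. That also copes with variants such as "A, U" or "U,A". `has_admin` is a hypothetical helper name, and one PrintUser cell is written as "A, U" purely to show the whitespace handling.

```python
import pandas as pd


def has_admin(cell) -> bool:
    """Return True if a cell's comma-separated role list contains "A"."""
    if not isinstance(cell, str):  # empty cells arrive as NaN (or None here)
        return False
    return "A" in {role.strip() for role in cell.split(",")}


# Sample data from the question; None stands in for empty cells
df = pd.DataFrame(
    {
        "Group1": [None, "A", None, None, "A,U"],
        "Group2": [None, None, None, None, None],
        "Group3": [None, "A", None, None, "A"],
        "Group4": ["A", None, "A", None, None],
        "Group5": [None, "A,U", None, None, "A, U"],
        "Group6": [None, None, None, None, None],
    },
    index=pd.Index(
        ["AppUser", "ShareUser", "MediaUser", "WebUser", "PrintUser"], name="Name"
    ),
)

# Boolean frame: True wherever a cell grants admin rights
admin = df.apply(lambda col: col.map(has_admin))

# One flag per group: does any user hold admin rights in it?
has_any_admin = admin.any()

# Groups without any admin, as a one-column frame ready for to_excel(index=False)
no_admins = has_any_admin[~has_any_admin].index.to_frame(index=False, name="No Admins")
print(no_admins)
```

Splitting on commas is a design choice: it treats "A,U" as "A plus U" rather than as an opaque label, so new role combinations need no extra cases.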

Answer 1 (avwztpqn)

Option 1

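You can use fillna to turn missing values into empty strings, mask the "A" and "A,U" cells (which sets them to NaN), dropna every column that now contains a NaN, take the remaining columns, and convert them with to_frame.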

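from io import StringIO

import pandas as pd

# Sample data from the question. ';' separates the fields because
# the "A,U" cells themselves contain commas; empty fields become NaN.
df = pd.read_csv(
    filepath_or_buffer=StringIO(
        """Name;Group1;Group2;Group3;Group4;Group5;Group6
AppUser;;;;A;;
ShareUser;A;;A;;A,U;
MediaUser;;;;A;;
WebUser;;;;;;
PrintUser;A,U;;A;;A,U;"""
    ),
    sep=";",
    index_col="Name"
)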

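out = (
    df.fillna("")                   # empty cells -> ""
    .mask(df.isin(["A", "A,U"]))    # admin cells -> NaN
    .dropna(axis="columns")         # drop every group with at least one NaN
    .columns                        # names of the remaining groups
    .to_frame(name="No Admins")
)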

print(out)
       No Admins
Group2    Group2
Group6    Group6
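The dropna step relies on its default how="any": a column is dropped as soon as a single masked cell is NaN, which is exactly "this group has at least one admin". The toy frame below (column names are made up for illustration) shows how that differs from how="all", which only drops columns that are entirely NaN.

```python
import numpy as np
import pandas as pd

# Toy frame as it looks after the mask step: NaN marks an admin cell
masked = pd.DataFrame(
    {
        "HasOneAdmin": ["", np.nan, ""],
        "NoAdmin": ["", "", ""],
        "AllAdmins": [np.nan, np.nan, np.nan],
    }
)

any_kept = masked.dropna(axis="columns").columns.tolist()            # default how="any"
all_kept = masked.dropna(axis="columns", how="all").columns.tolist()

print(any_kept)  # ['NoAdmin']
print(all_kept)  # ['HasOneAdmin', 'NoAdmin']
```

With how="all", a group with only some admin cells would wrongly survive, so the default is the right choice here.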

Option 2

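Use isin to check which cells hold "A" or "A,U", flip the result with eq(False) so that True means "not an admin", check which columns are all True, keep only those, take their index, and convert it with to_frame.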
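The original code for Option 2 is not preserved on this page. A minimal sketch of the steps just described, assuming the same Name-indexed sample data as Option 1 (rebuilt inline so the snippet runs on its own):

```python
from io import StringIO

import pandas as pd

# Same sample data as Option 1; ';' separates fields because cells contain commas
df = pd.read_csv(
    StringIO(
        "Name;Group1;Group2;Group3;Group4;Group5;Group6\n"
        "AppUser;;;;A;;\n"
        "ShareUser;A;;A;;A,U;\n"
        "MediaUser;;;;A;;\n"
        "WebUser;;;;;;\n"
        "PrintUser;A,U;;A;;A,U;\n"
    ),
    sep=";",
    index_col="Name",
)

# isin marks admin cells; eq(False) flips it so True means "not an admin"
no_admin_cells = df.isin(["A", "A,U"]).eq(False)

# A group qualifies when every cell in its column is True
all_true = no_admin_cells.all()

# Keep only the qualifying groups, take their labels, and turn them into a frame
out = all_true[all_true].index.to_frame(name="No Admins")
print(out)
```

This yields the same frame as Option 1 without the fillna/mask round trip; `~df.isin([...])` would be an equivalent way to write the flip.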

Note: exporting out to Excel with index=False leaves a sheet containing just the "No Admins" column (Group2, Group6):

with pd.ExcelWriter("out.xlsx") as writer:
    out.to_excel(writer, index=False)

Answer 2 (agyaoht7)

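How about using np.select? The snippet below flags each row (user) that has an "A" or "A, U" entry in any group and keeps only the rows without one.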

import pandas as pd
import numpy as np

df = pd.DataFrame({'Group1': [np.nan, 'A', np.nan, np.nan, 'A, U'], 'Group2': [np.nan, np.nan, np.nan, np.nan, np.nan],
                   'Group3': [np.nan, 'A', np.nan, np.nan, 'A'], 'Group4': [np.nan, 'A', np.nan, 'A', np.nan],
                   'Group5': [np.nan, 'A, U', np.nan, np.nan, 'A, U']})


# One condition: True for each row holding "A" or "A, U" in any group
condition = [df.isin(['A', 'A, U']).any(axis=1)]

choices = [True]

# Flag admin rows as 1; all other rows get np.select's default of 0
df['is_AU'] = np.select(condition, choices)
df = df.loc[df['is_AU'] == 0]
df = df.drop(columns='is_AU')
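For comparison, the same row filter can be sketched with plain boolean indexing, which skips the helper column entirely. It reuses this answer's sample values, including its "A, U" spelling.

```python
import numpy as np
import pandas as pd

# Sample data from the answer above
df = pd.DataFrame({
    "Group1": [np.nan, "A", np.nan, np.nan, "A, U"],
    "Group2": [np.nan, np.nan, np.nan, np.nan, np.nan],
    "Group3": [np.nan, "A", np.nan, np.nan, "A"],
    "Group4": [np.nan, "A", np.nan, "A", np.nan],
    "Group5": [np.nan, "A, U", np.nan, np.nan, "A, U"],
})

# True for each row (user) holding "A" or "A, U" in at least one group
is_admin = df.isin(["A", "A, U"]).any(axis=1)

# Keep only rows without any admin entry; no helper column to drop afterwards
result = df[~is_admin]
print(result)
```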
