pandas: conditionally filter on multiple columns, otherwise return the entire DataFrame

Posted by x759pob2 on 2022-12-16 in Other

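I have a CSV file listing several people. I want to write a function that filters on whichever parameters are passed, or returns the entire DataFrame unchanged if no parameters are passed.
For example, given this CSV: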

FirstName    LastName   City
Matt          Fred      Austin
Jim           Jack      NYC
Larry         Bob       Houston
Matt          Spencer   NYC

If I call a function named find, this is what I expect to see, depending on the arguments I pass:

find(first="Matt", last="Fred")
Output: Matt   Fred Austin
find()
Output: Full Dataframe
find(last="Spencer")
Output: Matt Spencer NYC
find(city="NYC")
Output: all people in the DataFrame who live in NYC

This is what I have tried:

def find(first=None, last=None, city=None):
    file= pd.read_csv(list)
    searched = file.loc[(file["FirstName"] == first) & (file["LastName"] == last) & (file["City"] == city)]
    return searched

If I pass in only the first name and nothing else, it returns an empty result.


rbl8hiat1#

You can do it like this:

import numpy as np

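def find(**kwargs):
    # df is assumed to be the DataFrame already loaded from the CSV.
    # Every keyword must name an existing column.
    assert np.isin(list(kwargs.keys()), df.columns).all()
    # Keep the rows where each named column equals the requested value.
    return df.loc[df[list(kwargs.keys())].eq(list(kwargs.values())).all(axis=1)]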

search = find(FirstName="Matt", LastName="Fred")
print(search)

#  FirstName LastName    City
#0      Matt     Fred  Austin

find(LastName="Spencer")

#   FirstName     LastName   City
#3       Matt      Spencer    NYC

If you want to use "first", "last", and "city" as the keyword names:

def find(**kwargs):
    
    df_index = df.rename(columns={"FirstName": "first",
                                  "LastName": "last", 
                                  "City": "city"})
    assert np.isin(list(kwargs.keys()), df_index.columns).all()
    
    return df.loc[df_index[list(kwargs.keys())]
                    .eq(list(kwargs.values())).all(axis=1)]
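
For readers who want to keep the signature from the question, here is a minimal sketch that skips any parameter left as None. The original attempt returns nothing because comparing a column with None is False for every row, so the combined mask is always empty. The `people` frame and the extra `df` parameter are assumptions made for illustration; the question reads the data from a CSV instead.

```python
import pandas as pd

# Sample data from the question, built inline instead of read from a CSV
# (an assumption so the example runs on its own).
people = pd.DataFrame({
    "FirstName": ["Matt", "Jim", "Larry", "Matt"],
    "LastName": ["Fred", "Jack", "Bob", "Spencer"],
    "City": ["Austin", "NYC", "Houston", "NYC"],
})

def find(df, first=None, last=None, city=None):
    """Return the rows matching every argument that is not None."""
    criteria = {"FirstName": first, "LastName": last, "City": city}
    # Start with a mask that keeps every row, then narrow it per filter.
    mask = pd.Series(True, index=df.index)
    for column, wanted in criteria.items():
        if wanted is not None:
            mask &= df[column] == wanted
    return df[mask]

print(find(people, first="Matt", last="Fred"))  # one matching row
print(find(people, city="NYC"))                 # everyone in NYC
print(find(people))                             # no filters: full DataFrame
```

Because the mask starts out all True, calling `find(people)` with no filters returns every row, which is the fallback the question asks for.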

xxb16uws2#

Another way to filter on the columns:

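import os
import pandas as pd

csv_path = os.path.abspath('test.csv')
# The sample file is whitespace-separated; a raw string avoids an invalid-escape warning.
df = pd.read_table(csv_path, sep=r'\s+')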

def find_by_attrs(df, **attrs):
    if attrs.keys() - df.columns:
        raise KeyError('Improper column name(s)')
    return df[df[list(attrs)].eq(list(attrs.values())).all(axis=1)]

print(find_by_attrs(df, City="NYC"))

Output:

FirstName LastName City
1       Jim     Jack  NYC
3      Matt  Spencer  NYC
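
To try this answer without creating test.csv, the sketch below loads the sample table from an in-memory string and shows what happens with a misspelled column name, such as the `address` keyword used in the question. The `SAMPLE` constant, the use of `io.StringIO`, and the early return for an empty filter set are assumptions added for illustration, not part of the original answer.

```python
import io
import pandas as pd

# The whitespace-separated table from the question, held in memory
# (an assumption; the answer reads it from test.csv).
SAMPLE = """\
FirstName    LastName   City
Matt          Fred      Austin
Jim           Jack      NYC
Larry         Bob       Houston
Matt          Spencer   NYC
"""

df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")

def find_by_attrs(df, **attrs):
    # Reject keywords that do not name a column.
    unknown = set(attrs) - set(df.columns)
    if unknown:
        raise KeyError(f"Improper column name(s): {sorted(unknown)}")
    if not attrs:
        # No filters given: hand back the whole DataFrame.
        return df
    # Keep the rows where every requested column equals its value.
    return df[df[list(attrs)].eq(list(attrs.values())).all(axis=1)]

print(find_by_attrs(df, City="NYC"))

try:
    find_by_attrs(df, address="NYC")
except KeyError as err:
    print("rejected:", err)
```

Raising on unknown names makes a typo fail loudly instead of silently returning no rows.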
