Check whether a list of columns is in a pandas DataFrame

bsxbgnwa  posted on 2023-04-19  in  Other

I have a DataFrame as shown below.

Unit_ID     Type      Sector       Plot_Number       Rental
1           Home      se1          22                50
2           Shop      se1          26                80

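Given the above, I need to write a function that checks whether every column in a list like the ones below is in the DataFrame.
If the list is ['Unit_ID', 'Sector', 'Usage_Type', 'Price']
Expected output: columns 'Usage_Type' and 'Price' are not in the DataFrame.
If the list is ['Unit_ID', 'Sector', 'Type', 'Plot_Number']
Expected output: all columns in the list are in the DataFrame.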

snvhrwxg1#

You can try the following:

# To check whether a list of columns is a subset of the
# DataFrame's columns, you can use:

def myf1(x, to_check):
    # issubset is True only when every requested name is a column of x
    if not set(to_check).issubset(set(x.columns)):
        # The set difference holds the requested names that x lacks
        return f"{' and '.join(set(to_check).difference(x.columns))} are not available in the dataframe"
    return "All columns are available in the dataframe"

to_check = ['Unit_ID', 'Sector']
myf1(df, to_check)
#'All columns are available in the dataframe'

to_check = ['Unit_ID', 'Sector', 'XYZ']
myf1(df, to_check)
#'XYZ are not available in the dataframe'
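
One caveat with the set-based version: a set difference does not keep the order of `to_check`, so with several missing names the message can list them in any order. Below is a minimal sketch of an order-preserving variant. The `df` here is reconstructed from the table in the question, and `check_columns` is a hypothetical name, not something from the original answer.

```python
import pandas as pd

# Sample data reconstructed from the table in the question (an assumption)
df = pd.DataFrame({
    'Unit_ID': [1, 2],
    'Type': ['Home', 'Shop'],
    'Sector': ['se1', 'se1'],
    'Plot_Number': [22, 26],
    'Rental': [50, 80],
})

def check_columns(frame, to_check):
    # A list comprehension keeps the order of to_check, unlike a set difference
    missing = [col for col in to_check if col not in frame.columns]
    if missing:
        verb = 'is' if len(missing) == 1 else 'are'
        return f"{' and '.join(missing)} {verb} not available in the dataframe"
    return "All columns are available in the dataframe"

print(check_columns(df, ['Unit_ID', 'Sector', 'Usage_Type', 'Price']))
# Usage_Type and Price are not available in the dataframe
print(check_columns(df, ['Unit_ID', 'Sector', 'Type', 'Plot_Number']))
# All columns are available in the dataframe
```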

rdrgkggo2#

The list of column names can be obtained like this:

columns = list(my_dataframe)

Now you can iterate over the list of names to search for and check whether each element is present in the columns list.

def search_func(to_check, columns):
    # Collect every requested name that is absent from columns
    not_present = []

    for i in to_check:
        if i not in columns:
            not_present.append(i)
    return not_present

to_check = ['Unit_ID', 'Sector', 'Usage_Type', 'Price']
not_present = search_func(to_check, columns)
if len(not_present) == 0:
    print("All columns are in the dataframe")
else:
    print(not_present, "not present in dataframe")

rm5edbpk3#

Why not simply:

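from typing import List
import pandas as pd

def has_columns(cols: List[str], df: pd.DataFrame) -> bool:
    try:
        # Selecting by a list of labels raises KeyError if any label is missing
        df[cols]
    except KeyError as e:
        print(f'Missing columns: {e}')
        return False
    print(f'All columns {cols} in dataframe!')
    return True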
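
With the try/except approach, the report is whatever text the `KeyError` carries, and a successful `df[cols]` builds a new DataFrame only to discard it. If you want the missing names themselves, here is a minimal sketch that compares labels only, using `Index.difference`. The helper name `missing_columns` and the sample `df` (rebuilt from the question's table) are assumptions for illustration.

```python
from typing import List

import pandas as pd

def missing_columns(cols: List[str], df: pd.DataFrame) -> List[str]:
    """Return the names in cols that are not column labels of df (hypothetical helper)."""
    # sort=False keeps the names in the order they were requested;
    # note that difference() also drops duplicate names
    return pd.Index(cols).difference(df.columns, sort=False).tolist()

# Sample data reconstructed from the table in the question (an assumption)
df = pd.DataFrame({
    'Unit_ID': [1, 2],
    'Type': ['Home', 'Shop'],
    'Sector': ['se1', 'se1'],
    'Plot_Number': [22, 26],
    'Rental': [50, 80],
})

missing = missing_columns(['Unit_ID', 'Sector', 'Usage_Type', 'Price'], df)
if missing:
    print(f'Missing columns: {missing}')
else:
    print('All columns in dataframe!')
```

An empty list means every requested column exists, so `not missing_columns(cols, df)` gives the same boolean as `has_columns` without printing.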

fhity93d4#

import pandas as pd

main_df = pd.DataFrame(data={'A': [1], 'C': [2], 'D': [3]})
print(main_df)
check_cols_list = ['B', 'C']
# An empty DataFrame whose columns are the names to check
check_cols_df = pd.DataFrame(columns=check_cols_list)
print("Names of the check_cols_list present in the main_df columns are:")
print(check_cols_df.columns[check_cols_df.columns.isin(main_df.columns)])
print("Names of the check_cols_list not present in the main_df columns are:")
print(check_cols_df.columns[~check_cols_df.columns.isin(main_df.columns)])

Current output:

   A  C  D
0  1  2  3
Names of the check_cols_list present in the main_df columns are:
Index(['C'], dtype='object')
Names of the check_cols_list not present in the main_df columns are:
Index(['B'], dtype='object')
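
The empty helper DataFrame is not strictly needed: the same `isin` mask works on a plain `pd.Index` built from the list. A minimal sketch under that assumption, reusing the names from the snippet above (`mask`, `present` and `missing` are introduced here for illustration):

```python
import pandas as pd

main_df = pd.DataFrame(data={'A': [1], 'C': [2], 'D': [3]})
check_cols = pd.Index(['B', 'C'])

# isin returns a boolean array: True where the name is a column of main_df
mask = check_cols.isin(main_df.columns)
present = check_cols[mask].tolist()
missing = check_cols[~mask].tolist()

print(present)  # ['C']
print(missing)  # ['B']
```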
