pandas: run code for every subset, starting with filtering the data using df.loc

xt0899hw  posted on 2022-12-10  in  Other

I am trying to run some experiments with my Python code. The input of my code is based on a DataFrame. To filter my DataFrame I use df.loc . Before running my code I filter the DataFrame for the instance I want to run my code. I have the following list of instances:

instance = ['A', 'B', 'C', 'D']

(These instances are also contained in a column of my DataFrame, df['Instance'].) When I want to run my code for instance 'A' only, I first filter my DataFrame for instance 'A':

df = df.loc[(df['Instance'] == 'A')]

When I want to run my code for instance 'B':

df = df.loc[(df['Instance'] == 'B')]

When I want to run my code for instance 'A' and 'B' I do the following:

df = df.loc[(df['Instance'] == 'A') | (df['Instance'] == 'B')]

Now I want to run my code for all non-empty subsets of 'A', 'B', 'C', 'D'. I can generate the subsets with the following function:

from itertools import chain, combinations

def powerset(iterable):
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(1, len(s)+1))

subsets = list(powerset(instance))

This gives the following output:

[('A',), ('B',), ('C',), ('D',), ('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D'), ('A', 'B', 'C'), ('A', 'B', 'D'), ('A', 'C', 'D'), ('B', 'C', 'D'), ('A', 'B', 'C', 'D')]

Now I want to run my code for every subset, where each run starts by filtering the DataFrame for the items in that subset. At the moment I filter the DataFrame by hand with df.loc for each subset. Does anyone have a tip on how to do this automatically?
Expected behaviour:
Iterate through all the subsets.
Run code for A (subset 1)

df = df.loc[(df['Instance'] == 'A')]

Run code for B (subset 2)

df = df.loc[(df['Instance'] == 'B')]

Run code for C (subset 3)

df = df.loc[(df['Instance'] == 'C')]

Run code for D (subset 4)

df = df.loc[(df['Instance'] == 'D')]

Run code for A, B (subset 5)

df = df.loc[(df['Instance'] == 'A') | (df['Instance'] == 'B')]

Etc.

5jdjgkvh  #1

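I think you should use pandas.Series.apply:
"Invoke function on values of Series."
It takes each value from the Series, here df["Instance"], and passes it to a function. That function only needs to check whether the instance is in the element of subsets currently being processed: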

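for subset in subsets:
    # Boolean mask: True where the row's instance belongs to the current subset
    mask = df["Instance"].apply(lambda i: i in subset)
    selected_rows = df.loc[mask]  # new variable, so df is not overwritten
    # do things with selected_rows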
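
For reference, here is a self-contained sketch of the same loop that swaps apply for pandas.Series.isin, which performs the membership test without a Python-level lambda. The sample DataFrame, its Value column, and the filtered dictionary are assumptions made up for illustration; only the Instance column, the instance list, and powerset come from the question. Each filtered frame is stored under its own key instead of being assigned back to df, so every subset is filtered from the full data.

```python
from itertools import chain, combinations

import pandas as pd


def powerset(iterable):
    """Return an iterator over every non-empty subset of the input, as tuples."""
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(1, len(s) + 1))


# Hypothetical sample data; only the 'Instance' column name comes from the question.
df = pd.DataFrame({
    "Instance": ["A", "A", "B", "C", "D", "D"],
    "Value": [1, 2, 3, 4, 5, 6],
})

instance = ["A", "B", "C", "D"]

# Map each subset to its filtered frame; df itself is never overwritten.
filtered = {}
for subset in powerset(instance):
    # isin builds a boolean mask: True where the row's instance is in the subset
    filtered[subset] = df.loc[df["Instance"].isin(list(subset))]
    # run the experiment on filtered[subset] here
```

If the filtered frames are not needed afterwards, the dictionary can be dropped and the experiment run directly inside the loop.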
