I am trying to run some experiments with my Python code. The input of my code is based on a DataFrame, which I filter with df.loc. Before running my code I filter the DataFrame for the instance I want to run my code on. I have the following list of instances:
instance = ['A', 'B', 'C', 'D']
(These instances are also contained in a column of my DataFrame, df['Instance'].) When I want to run my code for instance 'A' only, I first filter my DataFrame for instance 'A':
df = df.loc[(df['Instance'] == 'A')]
When I want to run my code for instance 'B':
df = df.loc[(df['Instance'] == 'B')]
When I want to run my code for instances 'A' and 'B', I do the following:
df = df.loc[(df['Instance'] == 'A') | (df['Instance'] == 'B')]
Now I want to run my code for every non-empty subset of 'A', 'B', 'C', 'D'. I can generate the subsets with the following function:
from itertools import chain, combinations

def powerset(iterable):
    # Every non-empty combination of the items, from size 1 up to len(s)
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(1, len(s)+1))

subsets = list(powerset(instance))
This gives the following output:
[('A',), ('B',), ('C',), ('D',), ('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D'), ('A', 'B', 'C'), ('A', 'B', 'D'), ('A', 'C', 'D'), ('B', 'C', 'D'), ('A', 'B', 'C', 'D')]
Now I want to run my code for every subset, starting by filtering the DataFrame for the items in that subset. At the moment I filter the DataFrame by hand with df.loc for each subset. Does anyone have a tip on how to do this automatically?
Expecting:
Iterate through all the subsets.
Run code for A (subset 1)
df = df.loc[(df['Instance'] == 'A')]
Run code for B (subset 2)
df = df.loc[(df['Instance'] == 'B')]
Run code for C (subset 3)
df = df.loc[(df['Instance'] == 'C')]
Run code for D (subset 4)
df = df.loc[(df['Instance'] == 'D')]
Run code for A, B (subset 5)
df = df.loc[(df['Instance'] == 'A') | (df['Instance'] == 'B')]
Etc.
1 Answer
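I think you should use pandas.Series.apply, which invokes a function on the values of a Series. It takes each value from the Series, here df["Instance"], and passes it to a function that only needs to check whether the instance is in the element of subsets currently being processed: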
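As a minimal sketch of that check: the toy DataFrame, its Value column, and the run_experiment helper below are hypothetical placeholders, not part of the question. Each filtered result goes into a new variable, on the assumption that the original df should stay intact for the next subset.

```python
from itertools import chain, combinations

import pandas as pd


def powerset(iterable):
    """Return every non-empty subset of iterable as a tuple."""
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(1, len(s) + 1))


def run_experiment(frame):
    # Hypothetical stand-in for the real experiment code: just counts rows.
    return len(frame)


# Toy data, made up for illustration only.
df = pd.DataFrame({
    "Instance": ["A", "B", "C", "D", "A", "C"],
    "Value": [1, 2, 3, 4, 5, 6],
})
instance = ["A", "B", "C", "D"]

results = {}
for subset in powerset(instance):
    # True for each row whose Instance value is a member of the current subset.
    mask = df["Instance"].apply(lambda value: value in subset)
    # Filter into a new variable so df is not overwritten between subsets.
    filtered = df.loc[mask]
    results[subset] = run_experiment(filtered)
```

Reassigning df inside the loop (df = df.loc[...]) would shrink the DataFrame after the first subset, which is why the sketch keeps filtered separate. pandas also provides Series.isin, so df["Instance"].isin(subset) should build the same mask without a lambda.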