pandas: Run code for every subset, starting by filtering data with df.loc

Posted by xt0899hw on 2022-12-10 in Other

I am trying to run some experiments with my Python code, whose input is a DataFrame. Before each run, I filter the DataFrame with df.loc down to the instance I want to run the code for. I have the following list of instances:

instance = ['A', 'B', 'C', 'D']

(These instances are also contained in a column of my DataFrame, df['Instance'].) When I want to run my code for instance 'A' only, I first filter my DataFrame for instance 'A':

df = df.loc[(df['Instance'] == 'A')]

When I want to run my code for instance 'B':

df = df.loc[(df['Instance'] == 'B')]

When I want to run my code for instance 'A' and 'B' I do the following:

df = df.loc[(df['Instance'] == 'A') | (df['Instance'] == 'B')]

Now I want to run my code for every non-empty subset of 'A', 'B', 'C', 'D'. I can generate the subsets with the following function:

from itertools import chain, combinations

def powerset(iterable):
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(1, len(s)+1))

subsets = list(powerset(instance))

This gives the following output:

[('A',), ('B',), ('C',), ('D',), ('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D'), ('A', 'B', 'C'), ('A', 'B', 'D'), ('A', 'C', 'D'), ('B', 'C', 'D'), ('A', 'B', 'C', 'D')]

Now I want to run my code for every subset, where each run starts by filtering the DataFrame to the items in that subset. At the moment I filter each subset by hand using df.loc. Does anyone have a tip on how to do this automatically?
Expected behavior:
Iterate through all the subsets.
Run code for A (subset 1)

df = df.loc[(df['Instance'] == 'A')]

Run code for B (subset 2)

df = df.loc[(df['Instance'] == 'B')]

Run code for C (subset 3)

df = df.loc[(df['Instance'] == 'C')]

Run code for D (subset 4)

df = df.loc[(df['Instance'] == 'D')]

Run code for A, B (subset 5)

df = df.loc[(df['Instance'] == 'A') | (df['Instance'] == 'B')]

Etc.

pandas

Source: https://stackoverflow.com/questions/74661188/run-code-for-every-subset-starting-by-filtering-data-with-df-loc

1 Answer

5jdjgkvh1#

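I think you should use pandas.Series.apply, which invokes a function on the values of a Series.
It takes each value from the Series, here df["Instance"], and passes it to a function that only needs to check whether the instance is in the element of subsets currently being processed: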

for subset in subsets:
    # Boolean mask: True where the row's Instance belongs to the current subset
    mask = df["Instance"].apply(lambda i: i in subset)
    selected_rows = df.loc[mask]
    # do things with selected_rows
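
As a fuller illustration, here is a minimal self-contained sketch of the whole loop. It swaps in pandas.Series.isin for apply, which performs the same membership test in a vectorized way. The sample data, the "Value" column, and the run_experiment function are assumptions made up for this example; they stand in for the real DataFrame and experiment code. Note that each filtered result is assigned to a new variable rather than back to df, since overwriting df (as in the manual approach in the question) would leave later subsets with nothing to filter from.

```python
import pandas as pd
from itertools import chain, combinations


def powerset(iterable):
    # All non-empty subsets, as in the question.
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(1, len(s) + 1))


# Hypothetical sample data; the "Value" column is invented for illustration.
df = pd.DataFrame({
    "Instance": ["A", "B", "C", "D", "A", "B"],
    "Value": [1, 2, 3, 4, 5, 6],
})
instance = ["A", "B", "C", "D"]


def run_experiment(frame):
    # Placeholder for the real experiment code (an assumption).
    return frame["Value"].sum()


results = {}
for subset in powerset(instance):
    # Keep the original df intact; filter into a separate variable.
    df_subset = df.loc[df["Instance"].isin(list(subset))]
    results[subset] = run_experiment(df_subset)
```

After the loop, results maps each subset tuple, such as ('A', 'B'), to whatever run_experiment returned for the rows of that subset.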

2022-12-10
