How can I quickly convert the groups in a pandas DataFrame into a list of separate arrays?

jjjwad0x, posted on 2023-02-02 in Other

I wrote this function to convert the groups in a pandas DataFrame into a list of separate arrays:

def convertPandaGroupstoArrays(df):

    # convert each group to arrays in a list.
    groups = df['grouping_var'].unique()
    mySeries = []
    namesofmyseries = []

    for group in groups:
        #print(group)

        single_ts = df[df['grouping_var'] == group]

        ts_name = single_ts['grouping_var'].unique()
        ts_name = ts_name[0]
        namesofmyseries.append(ts_name)

        single_ts = single_ts[['time_series', 'value']]
        #set the time columns as index
        single_ts.set_index('time_series', inplace=True)

        single_ts.sort_index(inplace=True)
        mySeries.append(single_ts)

    return mySeries, namesofmyseries

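However, my DataFrame has 80 million rows (many groups, each containing 400 rows). I have been running this function all morning on just 5 million rows, and it seems like it will never finish. Is there a faster way to do this? Thanks!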


Answer #1 (7jmck4yq)

You can use groupby:

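def convertPandaGroupstoArrays(df):
    df1 = df.set_index('time_series')[['value']]
    # Group by a plain array (matched by position); a Series key would be
    # aligned on index labels, which no longer match after set_index.
    keys = df['grouping_var'].to_numpy()
    return list(zip(*df1.groupby(keys)))[::-1]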

Performance on 1M rows:

# Your version
244 ms ± 15 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# Groupby version
62.3 ms ± 487 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
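
The groupby idea can also be sketched as a variant that returns two plain lists and keeps each group sorted by time, as the function in the question does. This is only a sketch under assumptions: the name groups_to_frames and its keyword parameters are hypothetical, the column names are taken from the question, and it has not been benchmarked against the timings above.

```python
import pandas as pd


def groups_to_frames(df, group_col="grouping_var",
                     time_col="time_series", value_col="value"):
    """Return (frames, names): one time-indexed DataFrame per group.

    Hypothetical helper; the column names default to those in the question.
    """
    # Sort once by group and time so every group comes out in time order.
    ordered = df.sort_values([group_col, time_col])

    names, frames = [], []
    # Grouping by column name splits the frame in a single pass and avoids
    # index alignment between separate objects. sort=False keeps the order
    # of first appearance, which is already sorted after sort_values.
    for name, part in ordered.groupby(group_col, sort=False):
        names.append(name)
        frames.append(part.set_index(time_col)[[value_col]])
    return frames, names


# Tiny illustrative input (made-up values).
demo = pd.DataFrame({
    "grouping_var": ["b", "a", "b", "a"],
    "time_series": [2, 1, 1, 2],
    "value": [20.0, 1.0, 10.0, 2.0],
})
frames, names = groups_to_frames(demo)
print(names)
print(frames[0])
```

Unlike the version in the question, group names here come out in sorted order rather than in order of first appearance in the input.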
