Reduce a pandas DataFrame so that one column holds lists of the values from repeated rows

4ioopgfo  posted on 2023-05-27  in Other

I have the following DataFrame:

index   name     path
0       Dina     "gs://my_bucket/folder1/img1.png"
1       Dina     "gs://my_bucket/folder1/img2.png"
2       Lane     "gs://my_bucket/folder1/img3.png"
3       Bari     "gs://my_bucket/folder1/img4.png"
4       Andrew   "gs://my_bucket/folder1/img5.png"
5       Andrew   "gs://my_bucket/folder1/img6.png"
6       Andrew   "gs://my_bucket/folder1/img7.png"
7       Beti     "gs://my_bucket/folder1/img7.png"
8       Ladin    "gs://my_bucket/folder1/img5.png"
...

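I want a new DataFrame in which each unique name appears only once and the path column holds the list of matching paths. The output should look like this: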

index   name     path
0       Dina     ["gs://my_bucket/folder1/img1.png","gs://my_bucket/folder1/img2.png"]
1       Lane     ["gs://my_bucket/folder1/img3.png"]
2       Bari     ["gs://my_bucket/folder1/img4.png"]
3       Andrew   ["gs://my_bucket/folder1/img5.png","gs://my_bucket/folder1/img6.png","gs://my_bucket/folder1/img7.png"]
4       Beti     ["gs://my_bucket/folder1/img7.png"]
5       Ladin    ["gs://my_bucket/folder1/img5.png"]
...

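The number of rows in the result should equal the number of unique names in the DataFrame. At the moment I am using something I produced with ChatGPT, but it uses a function whose purpose I don't understand, and it repeats names across rows: I expect 842 unique names, yet I get 992 rows.
This is the ChatGPT solution: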

# Define a custom aggregation function that combines the paths into a list
def combine_links(links):
    return list(set(links))  # Deduplicate via a set, then convert to a list (order is not preserved)

# Group the DataFrame by 'name' and aggregate the 'path' column
result = df.groupby(['name'])['path'].agg(combine_links).reset_index()

My goal is a solution that ends up with the correct number of rows, i.e. the number of unique names.
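
A row count larger than the number of unique names can be checked directly. The sketch below uses made-up data to show two ways this might happen: grouping by more than one key, and names that differ only by stray whitespace. The `region` column and the trailing space in `"Andrew "` are assumptions for illustration, not taken from the question.

```python
import pandas as pd

# Hypothetical data: 'region' and the trailing space in "Andrew " are invented for this sketch.
df = pd.DataFrame({
    "name": ["Dina", "Dina", "Lane", "Andrew", "Andrew "],
    "region": ["a", "b", "a", "a", "a"],
    "path": ["img1.png", "img2.png", "img3.png", "img5.png", "img6.png"],
})

n_unique = df["name"].nunique()

# Grouping by a single key gives exactly one row per distinct name.
by_name = df.groupby("name")["path"].agg(list).reset_index()

# Grouping by two keys gives one row per (name, region) pair, which can exceed n_unique.
by_two = df.groupby(["name", "region"])["path"].agg(list).reset_index()

# Stripping whitespace first merges names that only look identical.
clean = df.assign(name=df["name"].str.strip())
by_clean = clean.groupby("name")["path"].agg(list).reset_index()

print(n_unique, len(by_name), len(by_two), len(by_clean))
```

Comparing `len(result)` with `df["name"].nunique()` on the real data is a quick way to tell whether the extra rows come from the grouping keys or from the names themselves.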

thigvfpy  #1

A possible solution:

df.groupby('name')['path'].agg(list).reset_index()

Output:

name                                               path
0  Andrew  [gs://my_bucket/folder1/img5.png, gs://my_buck...
1    Bari                  [gs://my_bucket/folder1/img4.png]
2    Beti                  [gs://my_bucket/folder1/img7.png]
3    Dina  [gs://my_bucket/folder1/img1.png, gs://my_buck...
4   Ladin                  [gs://my_bucket/folder1/img5.png]
5    Lane                  [gs://my_bucket/folder1/img3.png]
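
The output above is sorted alphabetically by name, whereas the desired output in the question keeps names in order of first appearance. A minimal sketch of one way to get that order is to pass `sort=False` to `groupby`. The DataFrame here is rebuilt from the rows shown in the question, and the final check assumes the names are already clean.

```python
import pandas as pd

# Rebuild the sample rows shown in the question.
image_numbers = [1, 2, 3, 4, 5, 6, 7, 7, 5]
df = pd.DataFrame({
    "name": ["Dina", "Dina", "Lane", "Bari", "Andrew",
             "Andrew", "Andrew", "Beti", "Ladin"],
    "path": [f"gs://my_bucket/folder1/img{i}.png" for i in image_numbers],
})

# sort=False keeps groups in order of first appearance instead of sorting the keys.
result = df.groupby("name", sort=False)["path"].agg(list).reset_index()
print(result)

# One row per unique name.
print(len(result) == df["name"].nunique())
```

If duplicate paths within one name should also be dropped while keeping their order, `agg(lambda s: list(dict.fromkeys(s)))` is one option in place of `agg(list)`.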

mzmfm0qo  #2

I think the answer is hidden here: What is the difference between pandas agg and apply function?
This code works for me:

import pandas as pd

def combine_links(links):
    return list(set(links))

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon','Parrot', 'Parrot', 'Falcon'],
                       'Max Speed': [380., 370., 24., 26., 370.]})

df_new=df.groupby(['Animal'])['Max Speed'].apply(combine_links).reset_index()

print(df_new)
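
As a quick check on the same toy data, the sketch below runs the helper through both `apply` and `agg` and compares the results. `sorted(set(...))` replaces `list(set(...))` here only to make the element order deterministic; that change is an assumption of this sketch, not part of the answer.

```python
import pandas as pd

def combine_links(links):
    # sorted() instead of list() so the element order is deterministic
    return sorted(set(links))

df = pd.DataFrame({
    "Animal": ["Falcon", "Falcon", "Parrot", "Parrot", "Falcon"],
    "Max Speed": [380.0, 370.0, 24.0, 26.0, 370.0],
})

via_apply = df.groupby("Animal")["Max Speed"].apply(combine_links).reset_index()
via_agg = df.groupby("Animal")["Max Speed"].agg(combine_links).reset_index()

print(via_apply)
print(via_agg)
```

On this data both calls return one row per animal, so with a single grouping key either one should give a row count equal to the number of unique keys.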
