I have the following dataframe:
index name path
0 Dina "gs://my_bucket/folder1/img1.png"
1 Dina "gs://my_bucket/folder1/img2.png"
2 Lane "gs://my_bucket/folder1/img3.png"
3 Bari "gs://my_bucket/folder1/img4.png"
4 Andrew "gs://my_bucket/folder1/img5.png"
5 Andrew "gs://my_bucket/folder1/img6.png"
6 Andrew "gs://my_bucket/folder1/img7.png"
7 Beti "gs://my_bucket/folder1/img7.png"
8 Ladin "gs://my_bucket/folder1/img5.png"
...
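I want to get a new dataframe in which each unique name appears only once and the path column holds a list of the matching paths. The output should look like this: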
index name path
0 Dina ["gs://my_bucket/folder1/img1.png","gs://my_bucket/folder1/img2.png"]
1 Lane ["gs://my_bucket/folder1/img3.png"]
2 Bari ["gs://my_bucket/folder1/img4.png"]
3 Andrew ["gs://my_bucket/folder1/img5.png","gs://my_bucket/folder1/img6.png","gs://my_bucket/folder1/img7.png"]
4 Beti ["gs://my_bucket/folder1/img7.png"]
5 Ladin ["gs://my_bucket/folder1/img5.png"]
...
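The number of rows in the result should equal the number of unique names in the dataframe. Currently I am using something I made with ChatGPT, but it uses a function whose purpose I don't understand, and it repeats names across rows: I expect 842 unique names, but I get 992 rows.
This is the ChatGPT solution: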
# Define a custom aggregation function to combine links as a list
def combine_links(links):
    return list(set(links))  # Convert links to a list and remove duplicates

# Group the GeoDataFrame by 'name' and 'dili' and aggregate the 'link' column
result = df.groupby(['name'])['path'].agg(combine_links).reset_index()
My goal is to find a solution that gives me the correct number of rows in the end, which is the number of unique names.
2 Answers
thigvfpy1#
A possible solution:
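A minimal sketch of one way to do this, not necessarily the answerer's original code. It assumes the sample rows from the question and groups on `name` alone, collecting each group's `path` values with the built-in `list`:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["Dina", "Dina", "Lane", "Bari", "Andrew",
             "Andrew", "Andrew", "Beti", "Ladin"],
    "path": [f"gs://my_bucket/folder1/img{i}.png"
             for i in (1, 2, 3, 4, 5, 6, 7, 7, 5)],
})

# Group by 'name' only; sort=False keeps names in order of first appearance.
result = (
    df.groupby("name", sort=False)["path"]
      .agg(list)        # collect each group's paths into a Python list
      .reset_index()
)

# Sanity check: one row per unique name.
assert len(result) == df["name"].nunique()
print(result)
```

Grouping on a single column always yields exactly one row per distinct value of that column. If the result still has more rows than `df["name"].nunique()`, one possible cause is that the real code groups on more than one column, as the comment in the question's snippet (which mentions 'dili') suggests.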
Output:
mzmfm0qo2#
I think the answer lies here: What is the difference between pandas agg and apply function?
My working code:
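A sketch of what an `apply`-based version might look like, assuming the sample rows from the question; this is an illustration, not the answerer's actual code. The `combine_links` helper here uses `dict.fromkeys` rather than `set`, a substitution made so that duplicates are dropped while the original order of the paths is kept:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["Dina", "Dina", "Lane", "Bari", "Andrew",
             "Andrew", "Andrew", "Beti", "Ladin"],
    "path": [f"gs://my_bucket/folder1/img{i}.png"
             for i in (1, 2, 3, 4, 5, 6, 7, 7, 5)],
})

def combine_links(links):
    # dict.fromkeys removes duplicate paths and preserves their order.
    return list(dict.fromkeys(links))

# apply() passes each group's 'path' Series to combine_links as a whole.
result = (
    df.groupby("name", sort=False)["path"]
      .apply(combine_links)
      .reset_index()
)

# One row per unique name.
assert len(result) == df["name"].nunique()
print(result)
```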