Aggregating multiple columns into one in Pandas quickly

vof42yt1 posted on 2023-01-11 in Other


I have a DataFrame of object IDs, each attached to individual X and Y coordinates, similar to:

| ID | X | Y |
|----|---|---|
| 1 | 0 | 0 |
| 1 | 1 | 3 |
| 1 | 2 | 5 |
| 2 | 7 | 1 |
| 2 | 8 | 5 |
| 2 | 9 | 7 |

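I can't guarantee the order of the IDs or of the X/Y values, nor can I have them joined upstream.
The end goal is to get the convex hull of the points involved. Currently I group X and Y into lists, zip them, and then convert the list of tuples into a Shapely MultiPoint before finding the convex hull.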

import shapely.geometry as shGeom

# Collect each ID's X and Y values into lists (one row per ID).
sf = df.groupby("ID").agg({"X": list, "Y": list})
# I want to keep this coordinate set for later, though keeping it as a MultiPoint would be fine.
# In tests, storing the MultiPoint as an intermediate was slower than the
# list of tuples because of memory pressure.
# Note: the tuples are built as (Y, X), matching the column order selected here.
sf["coordinates"] = sf[["Y", "X"]].apply(lambda row: list(zip(row["Y"], row["X"])), axis=1)
# This next "hull" column is the target.
sf["hull"] = sf["coordinates"].apply(lambda pts: shGeom.MultiPoint(pts).convex_hull)

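However, this approach needs several passes over the data on a frame of 1M+ rows, and the zip pass in particular is slow.
Is there a way to do this in fewer passes? It feels like there should be. (At the end of the day this code *works*, but it is a very slow step.)
I do use GeoPandas later on, but until the X and Y entries are converted to Point or MultiPolygon there is no geometry column to operate on, so that doesn't get around the slow step.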

pandas

Source: https://stackoverflow.com/questions/75034449/aggregating-multiple-columns-into-one-in-pandas-quickly

1 Answer


2izufjch1#

You can compute the convex hull with SciPy's ConvexHull:

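import pandas as pd
from scipy.spatial import ConvexHull

grouped = df.groupby('ID')

def compute_hull(group):
    # Build an (n, 2) array of this ID's points; ConvexHull needs at least
    # three non-collinear points in 2-D.
    points = group[['X', 'Y']].to_numpy()
    hull = ConvexHull(points)
    # group.name is the ID value of the current group.
    return {'ID': group.name, 'hull': hull}

# One dict per ID, each holding a scipy ConvexHull object.
convex_hulls = grouped.apply(compute_hull).tolist()

print(convex_hulls)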

I tried this on a dummy DataFrame I created with 1 million rows, and it ran immediately.
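
Note that a SciPy ConvexHull is not a Shapely geometry, so the hull's corner points usually have to be pulled out before they can be used elsewhere. The following is a minimal sketch of that step, assuming the sample data from the question's table; the helper name `hull_coordinates` and the dict-of-lists result are illustrative choices, not part of the original answer. ConvexHull raises an error for groups with fewer than three points or with all points on one line, so such groups would need separate handling.

```python
import pandas as pd
from scipy.spatial import ConvexHull

# Sample data mirroring the table in the question (assumed values).
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2],
    "X": [0, 1, 2, 7, 8, 9],
    "Y": [0, 3, 5, 1, 5, 7],
})

def hull_coordinates(group):
    """Hypothetical helper: return the hull corners as a list of (x, y) tuples."""
    points = group[["X", "Y"]].to_numpy(dtype=float)
    hull = ConvexHull(points)
    # hull.vertices holds row indices into `points`; for 2-D input they are
    # listed in counterclockwise order around the hull.
    return [tuple(p) for p in points[hull.vertices].tolist()]

# One entry per ID: a single groupby pass, no intermediate list columns.
hulls = {int(key): hull_coordinates(group) for key, group in df.groupby("ID")}

for key, corners in hulls.items():
    print(key, corners)
```

Each list of corner tuples could then be handed to a geometry library if a polygon object is needed; whether that is faster overall than the MultiPoint route would have to be measured on the real data.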

2023-01-11
