pandas: How do I create new columns based on the first occurrence in the previous group?

Posted by camsedfj on 2022-12-21 in Other


I have a DataFrame like the following:

id   reg   version
 1   54       1
 2   54       1
 3   54       1
 4   54       2
 5   54       3
 6   54       3
 7   55       1
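To make the question reproducible, the sample frame can be rebuilt as below. This is a small sketch that assumes all three columns are plain integers, which matches the table but is not stated explicitly.

```python
import pandas as pd

# Sample frame from the question, rebuilt so the snippets on this page can be run as-is
df = pd.DataFrame({
    'id':      [1, 2, 3, 4, 5, 6, 7],
    'reg':     [54, 54, 54, 54, 54, 54, 55],
    'version': [1, 1, 1, 2, 3, 3, 1],
})
print(df)
```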

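The goal is to add two new columns, previous_version and next_version, whose values are taken from id: previous_version should hold the first id of the previous version and next_version the first id of the next version, within the same reg. In the sample df above, for id = 1 (version = 1), the next version starts at id = 4, so next_version is 4; previous_version is null because there is no earlier version.
If there is no previous or next version, the value should be null.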
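What I have managed so far:

1. Get the unique versions in the DataFrame.
2. Get the first id of each version.
3. Build a dictionary from the two above.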

I am struggling to find the logic to apply that dictionary to the DataFrame so that the previous_version and next_version columns get filled in.

# unique versions in the DataFrame
versions = df['version'].unique().tolist()
# first id of each (reg, version) group
version_ids = df.groupby(['reg', 'version'])['id'].first().tolist()
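One way the dictionary idea could be finished is sketched below. It is an illustration, not the asker's code: it assumes versions should be ordered numerically within each reg, and the names first_ids and lookup are hypothetical. The key change is keeping reg and version as the dictionary key (instead of flattening to lists), so each row can look up its neighbours.

```python
import pandas as pd

df = pd.DataFrame({
    'id':      [1, 2, 3, 4, 5, 6, 7],
    'reg':     [54, 54, 54, 54, 54, 54, 55],
    'version': [1, 1, 1, 2, 3, 3, 1],
})

# (reg, version) -> first id; groupby sorts its keys, so versions are ascending within each reg
first_ids = df.groupby(['reg', 'version'])['id'].first()

# Hypothetical helper dict: (reg, version) -> (previous version's first id, next version's first id)
lookup = {}
for reg, s in first_ids.groupby(level='reg'):
    versions = s.index.get_level_values('version').tolist()
    ids = s.tolist()
    for i, v in enumerate(versions):
        prev_id = ids[i - 1] if i > 0 else None
        next_id = ids[i + 1] if i < len(ids) - 1 else None
        lookup[(reg, v)] = (prev_id, next_id)

# Look up every row; None becomes <NA> in the nullable Int64 dtype
keys = list(zip(df['reg'], df['version']))
df['previous_version'] = pd.array([lookup[k][0] for k in keys], dtype='Int64')
df['next_version'] = pd.array([lookup[k][1] for k in keys], dtype='Int64')
print(df)
```

The loop runs once per distinct version, not once per row, so it stays cheap even with many repeated rows; the two answers below show fully vectorized alternatives.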

Here is what the resulting DataFrame should look like:

id   reg   version   previous_version   next_version
 1   54       1            NULL             4
 2   54       1            NULL             4
 3   54       1            NULL             4
 4   54       2            1                5
 5   54       3            4                NULL
 6   54       3            4                NULL
 7   55       1            NULL             NULL

With n versions, what is the best way to achieve this result?

pandas

Source: https://stackoverflow.com/questions/74830375/how-do-i-create-new-columns-based-on-the-first-occurrence-in-the-previous-group

2 Answers


8cdiaqws1#

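You can do a nested groupby: group by reg, take the first id of each version inside each group, use Series.shift() to get the previous and next values, and then merge them back.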

def _prev_and_next_version(sf):
    # Use `Int64` to avoid conversion to float. Not crucial.
    first_id_by_version = sf.groupby('version')['id'].first().astype('Int64')
    prev = first_id_by_version.shift(1).rename('previous_version')
    next_ = first_id_by_version.shift(-1).rename('next_version')
    sf_out = sf.merge(prev, on='version').merge(next_, on='version')
    return sf_out

df.groupby('reg').apply(_prev_and_next_version).set_index(df.index)

Result:

   id  reg  version  previous_version  next_version
0   1   54        1              <NA>             4
1   2   54        1              <NA>             4
2   3   54        1              <NA>             4
3   4   54        2                 1             5
4   5   54        3                 4          <NA>
5   6   54        3                 4          <NA>
6   7   55        1              <NA>          <NA>

For context, this is first_id_by_version on each iteration (one per reg):

version
1    1
2    4
3    5
Name: id, dtype: Int64

version
1    7
Name: id, dtype: Int64
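A variant of the same idea that avoids groupby().apply() and the per-group merges is sketched below. It is not taken from either answer: it builds the per-version table once, shifts first_id within each reg with groupby().shift(), and left-merges the result back onto the rows. The names first and out are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'id':      [1, 2, 3, 4, 5, 6, 7],
    'reg':     [54, 54, 54, 54, 54, 54, 55],
    'version': [1, 1, 1, 2, 3, 3, 1],
})

# One row per (reg, version) with the first id of that version, sorted by reg then version
first = (df.groupby(['reg', 'version'], as_index=False)['id'].first()
           .rename(columns={'id': 'first_id'}))

# Shift within each reg only, so the last version of one reg never leaks into the next reg
g = first.groupby('reg')['first_id']
first['previous_version'] = g.shift(1).astype('Int64')
first['next_version'] = g.shift(-1).astype('Int64')

# how='left' keeps df's row order; every row picks up its version's neighbours
out = df.merge(first[['reg', 'version', 'previous_version', 'next_version']],
               on=['reg', 'version'], how='left')
print(out)
```

Note that the merge returns a fresh RangeIndex; if df has a meaningful index, it would need to be restored afterwards, as the answer does with set_index(df.index).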

2022-12-21

332nm8kg2#

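You can do this by building a mapping with (reg, version) as the key and the first id as the value, then mapping the key pairs (reg, version + 1) and (reg, version - 1) through it. This relies on the version numbers within each reg being consecutive integers.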

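import pandas as pd

# keys for the next and previous version within the same reg
k1 = list(zip(df.reg, df.version.add(1)))
k2 = list(zip(df.reg, df.version.sub(1)))
# (reg, version) -> first id of that version
d = df.drop_duplicates(['reg', 'version'], keep='first').set_index(['reg', 'version'])['id']
# index=df.index so the assignment aligns even if df does not have a default RangeIndex
df['previous_version'] = pd.Series(k2, index=df.index).map(d).astype('Int64')
df['next_version'] = pd.Series(k1, index=df.index).map(d).astype('Int64')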

print(df)

   id  reg  version  previous_version  next_version
0   1   54        1              <NA>             4
1   2   54        1              <NA>             4
2   3   54        1              <NA>             4
3   4   54        2                 1             5
4   5   54        3                 4          <NA>
5   6   54        3                 4          <NA>
6   7   55        1              <NA>          <NA>
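The version ± 1 lookup only finds a neighbour when version numbers within a reg have no gaps. If gaps are possible (for example 1, 2, 5), one option, sketched here as an assumption rather than part of the answer, is to key the mapping on a dense rank of version within each reg. The vrank column and the sample data with a gap are hypothetical, and a plain dict with .get() is used for the lookup so missing keys simply become null.

```python
import pandas as pd

# Hypothetical data: reg 54 jumps from version 2 to version 5
df = pd.DataFrame({
    'id':      [1, 2, 3, 4, 5, 6, 7],
    'reg':     [54, 54, 54, 54, 54, 54, 55],
    'version': [1, 1, 1, 2, 5, 5, 1],
})

# Dense rank of version within each reg: 1, 2, 5 -> 1, 2, 3
df['vrank'] = df.groupby('reg')['version'].rank(method='dense').astype(int)

# (reg, vrank) -> first id of that version, as a plain dict with tuple keys
d = (df.drop_duplicates(['reg', 'vrank'])
       .set_index(['reg', 'vrank'])['id']
       .to_dict())

# Look up the neighbouring ranks; dict.get returns None (-> <NA>) when there is no neighbour
pairs = list(zip(df['reg'], df['vrank']))
df['previous_version'] = pd.array([d.get((r, v - 1)) for r, v in pairs], dtype='Int64')
df['next_version'] = pd.array([d.get((r, v + 1)) for r, v in pairs], dtype='Int64')
df = df.drop(columns='vrank')
print(df)
```

Ranking turns "the previous version" into "rank minus one", so the answer's ± 1 idea keeps working regardless of how the version numbers are spaced.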

2022-12-21
