pandas: function that returns a DataFrame without the leading 0s of a specific column

Posted by d8tt03nd on 2023-03-06 in Other

I have the following DataFrame:

import pandas as pd

df = pd.DataFrame({
    'n': [0, 1, 2, 3, 0, 1, 2, 0, 1, 2],
    'col1': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C'],
    'col2': [0, 0, 0, 0, 3.3, 0, 4, 1.94, 0, 6.17]
})

It looks like this:

   n col1  col2
0  0    A  0.00
1  1    A  0.00
2  2    A  0.00
3  3    B  0.00
4  0    B  3.30
5  1    B  0.00
6  2    B  4.00
7  0    C  1.94
8  1    C  0.00
9  2    C  6.17

I want a function that takes this DataFrame as an argument and returns a new DataFrame without the leading rows in which "col2" is 0.

My code

def remove_lead_zeros(df):
    # Keeps only rows where col2 != 0, so every zero row is dropped
    new_df = df[df['col2'] != 0]
    return new_df

My function removes every row whose value is 0.0, whereas I only want to remove the leading ones.

Goal

Get the following DataFrame as the result:

   n col1  col2
0  0    B  3.30
1  1    B  0.00
2  2    B  4.00
3  0    C  1.94
4  1    C  0.00
5  2    C  6.17

Any help would be highly appreciated (I will upvote all answers). Thank you!

pandas

Source: https://stackoverflow.com/questions/75603641/function-that-retuns-a-dataframe-without-leading-0s-of-a-specific-column

3 Answers

slwdgvem1#

Use groupby.cummax on the boolean Series that marks non-zero col2 values, then apply boolean indexing:

out = df[df['col2'].ne(0).groupby(df['col1']).cummax()]

Output:

   n col1  col2
4  0    B  3.30
5  1    B  0.00
6  2    B  4.00
7  0    C  1.94
8  1    C  0.00
9  2    C  6.17

Intermediate columns, to follow the logic:

   n col1  col2  ne(0)  groupby.cummax
0  0    A  0.00  False           False
1  1    A  0.00  False           False
2  2    A  0.00  False           False
3  3    B  0.00  False           False
4  0    B  3.30   True            True
5  1    B  0.00  False            True
6  2    B  4.00   True            True
7  0    C  1.94   True            True
8  1    C  0.00  False            True
9  2    C  6.17   True            True
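
The steps in the table above can be wrapped into the function the question asks for. This is only a minimal sketch: the name `remove_lead_zeros` comes from the question, while the `value_col` and `group_col` parameters and the final `reset_index(drop=True)` (used to match the 0..5 index in the question's goal) are assumptions added for illustration.

```python
import pandas as pd


def remove_lead_zeros(df, value_col="col2", group_col="col1"):
    """Drop the leading rows of each group whose value is 0 (illustrative sketch)."""
    # True where the value is non-zero
    nonzero = df[value_col].ne(0)
    # Within each group, cummax stays True from the first non-zero row onward
    keep = nonzero.groupby(df[group_col]).cummax()
    # Renumber the index so the result starts at 0 again
    return df[keep].reset_index(drop=True)


df = pd.DataFrame({
    "n": [0, 1, 2, 3, 0, 1, 2, 0, 1, 2],
    "col1": ["A", "A", "A", "B", "B", "B", "B", "C", "C", "C"],
    "col2": [0, 0, 0, 0, 3.3, 0, 4, 1.94, 0, 6.17],
})

result = remove_lead_zeros(df)
print(result)
```

Group A contains only zeros, so it disappears entirely; the zero in the middle of groups B and C is kept because it comes after a non-zero value.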

Answered 2023-03-06

b1zrtrql2#

You can use cumsum:

>>> df[df.groupby('col1')['col2'].cumsum().ne(0)]
   n col1  col2
4  0    B  3.30
5  1    B  0.00
6  2    B  4.00
7  0    C  1.94
8  1    C  0.00
9  2    C  6.17

While the cumulative sum within a group is still 0, the row is a leading zero.

>>> pd.concat([df, df.groupby('col1')['col2'].cumsum()], axis=1)
   n col1  col2  col2
0  0    A  0.00  0.00  # remove
1  1    A  0.00  0.00  # remove
2  2    A  0.00  0.00  # remove
3  3    B  0.00  0.00  # remove
4  0    B  3.30  3.30  # keep
5  1    B  0.00  3.30  # keep
6  2    B  4.00  7.30  # keep
7  0    C  1.94  1.94  # keep
8  1    C  0.00  1.94  # keep
9  2    C  6.17  8.11  # keep
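
One caveat with the running-sum test: it relies on col2 never being negative. If values can cancel out, the running sum returns to 0 later in the group and a non-leading row gets dropped as well. The sketch below uses a small hypothetical frame (`demo`, not part of the question's data) to compare the cumsum mask with the cummax mask from the first answer.

```python
import pandas as pd

# Hypothetical data: 2 and -2 cancel out, so the running sum returns to 0
demo = pd.DataFrame({
    "col1": ["X", "X", "X", "X"],
    "col2": [0, 2, -2, 5],
})

# Running sum per group: 0, 2, 0, 5 -> the -2 row is wrongly treated as leading
by_cumsum = demo[demo.groupby("col1")["col2"].cumsum().ne(0)]

# Non-zero flag per group with cummax: False, True, True, True
by_cummax = demo[demo["col2"].ne(0).groupby(demo["col1"]).cummax()]

print(by_cumsum["col2"].tolist())
print(by_cummax["col2"].tolist())
```

For non-negative data such as the question's, both masks select the same rows.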

Answered 2023-03-06

fxnxkyjh3#

First, get a boolean array that is True where col2 is not 0, then take its cumulative max to obtain a mask that can be applied to the DataFrame.

result = df[(df["col2"] != 0).cummax()].reset_index(drop=True)

where result looks like

   n col1  col2
0  0    B  3.30
1  1    B  0.00
2  2    B  4.00
3  0    C  1.94
4  1    C  0.00
5  2    C  6.17
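
Note that this mask is computed over the whole column, not per col1 group. On the question's data that makes no difference, because all leading zeros sit at the top of the frame. The sketch below uses a small hypothetical frame (`demo`, not from the question) in which a later group also starts with a zero, to show where the whole-column and per-group masks diverge.

```python
import pandas as pd

# Hypothetical data: group B starts with a zero after group A already had a non-zero
demo = pd.DataFrame({
    "col1": ["A", "A", "B", "B"],
    "col2": [0, 1, 0, 2],
})

nonzero = demo["col2"].ne(0)

# Whole-column mask: only zeros before the very first non-zero value are dropped
global_result = demo[nonzero.cummax()].reset_index(drop=True)

# Per-group mask: leading zeros are dropped separately inside each col1 group
grouped_result = demo[nonzero.groupby(demo["col1"]).cummax()].reset_index(drop=True)

print(global_result)
print(grouped_result)
```

Which variant is appropriate depends on whether "leading" is meant for the whole column or for each col1 group; the question's sample data does not distinguish between the two.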

Answered 2023-03-06
