pandas 如何对 Dataframe 中的重复列求和，并在至少一个值为nan时返回nan

s8vozzvw 于 2022-12-02 发布在其他

关注(0)|答案(2)|浏览(168)

I have a dataframe with duplicate columns (number not known a priori) like this example:
| | a | a | a | b | b |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| 0 | 1 | 1 | 1 | 1 | 1 |
| 1 | 1 | nan | 1 | 1 | 1 |
I need to be able to aggregate the columns by summing their values (by rows) and returning NaN if at least one value, in one of the columns among the duplicates, is NaN.
I have tried this code:

import numpy as np
import pandas as pd

df = pd.DataFrame([[1,1,1,1,1], [1,np.nan,1,1,1]], columns=['a','a','a','b','b'])
df = df.groupby(axis=1, level=0).sum()

The result i get is as follows, but it does not return NaN in the second row of column 'a'.
| | a | b |
| ------------ | ------------ | ------------ |
| 0 | 3 | 2 |
| 1 | 2 | 2 |
In the documentation of pandas.DataFrame.sum , there is the skipna parameter which might suit my case. But I am using the function pandas.core.groupby.GroupBy.sum which does not have this parameter, but the min_count which does what i want but the number is not known in advance and would be different for each duplicate column.

For example, a min_count=3 solves the problem for column 'a', but obviously returns NaN on the whole of column 'b'.*

The result I want to achieve is:
| | a | b |
| ------------ | ------------ | ------------ |
| 0 | 3 | 2 |
| 1 | nan | 2 |

pandas

来源：https://stackoverflow.com/questions/74646196/how-to-sum-duplicate-columns-in-dataframe-and-return-nan-if-at-least-one-value-i

2条答案

按热度按时间

juud5qan1#

一种解决方法是使用apply获取DataFrame.sum：

df.groupby(level=0, axis=1).apply(lambda x: x.sum(axis=1, skipna=False))

输出量：

a    b
0  3.0  2.0
1  NaN  2.0

赞(0）回复(0）举报 2022-12-02

h79rfbju2#

另一种可能的解决方案：

cols, ldf = df.columns.unique(), len(df)

pd.DataFrame(
    np.reshape([sum(df.loc[i, x]) for i in range(ldf) for x in cols],
               (len(cols), ldf)), 
    columns=cols)

输出量：

a    b
0  3.0  2.0
1  NaN  2.0

赞(0）回复(0）举报 2022-12-02

我来回答

pandas 如何对 Dataframe 中的重复列求和，并在至少一个值为nan时返回nan

2条答案

相关问题

热门标签

最新问答