UpdateOrAdd() to change a Pandas DataFrame

m3eecexj posted on 2022-11-20 in Other

Hi, I'd like to know the fastest and simplest way to add or update data in a Pandas DataFrame.

import pandas as pd

# Original DataFrame
df = pd.DataFrame([
    {'A': 'a1', 'B': 'b1', 'C': 'c1'},
    {'A': 'a3', 'B': 'b2', 'C': 'c2'},
    {'A': 'a3', 'B': 'b3', 'C': 'c3'},
])

Original DataFrame:
    A   B   C
0  a1  b1  c1
1  a3  b2  c2
2  a3  b3  c3

# A List of changes
changes = [
    {'id':0, 'A':'aNEW','C':'cNEW'},
    {'id':2, 'B':'bNEW'},
    {'id':3, 'A':'aNEW','C':'cNEW'},
]


# How can I do this? (UpdateOrAdd is the kind of method I'm after; pandas has no such method)
df.UpdateOrAdd(changes)

Resulting DataFrame:
      A     B     C
0  aNEW    b1  cNEW
1    a3    b2    c2
2    a3  bNEW    c3
3  aNEW  None  cNEW

Adding or updating a Pandas DataFrame using a list of changes

Answer #1 (wecizke3)

You can build a DataFrame from the list of dictionaries (indexed by id), then align the indexes with reindex and combine_first:

df2 = pd.DataFrame(changes).set_index('id')

out = (df2.reindex(df.index.union(df2.index))
          .combine_first(df)
      )

Output:

      A     B     C
0  aNEW    b1  cNEW
1    a3    b2    c2
2    a3  bNEW    c3
3  aNEW   NaN  cNEW
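
Since combine_first already returns a frame whose row and column indexes are the union of both inputs, the explicit reindex step above may be optional. A minimal sketch under that assumption, calling combine_first on the indexed change frame directly (df2 keeps its non-null values, everything else falls back to df):

```python
import pandas as pd

# Original data from the question
df = pd.DataFrame([
    {'A': 'a1', 'B': 'b1', 'C': 'c1'},
    {'A': 'a3', 'B': 'b2', 'C': 'c2'},
    {'A': 'a3', 'B': 'b3', 'C': 'c3'},
])

changes = [
    {'id': 0, 'A': 'aNEW', 'C': 'cNEW'},
    {'id': 2, 'B': 'bNEW'},
    {'id': 3, 'A': 'aNEW', 'C': 'cNEW'},
]

# One row per change, labelled by the row it targets
df2 = pd.DataFrame(changes).set_index('id')

# Non-null cells of df2 win; gaps are filled from df.
# The result covers the union of both row and column labels.
out = df2.combine_first(df)
print(out)
```

Cells that neither frame fills (row 3, column B) stay NaN, as in the output above.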
As a method

If you really need it, you can attach it to DataFrame as a method with monkey patching:

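def AddOrUpdate(self, other):
    # Accept either a DataFrame or a list of dicts with an 'id' key
    if not isinstance(other, pd.DataFrame):
        other = pd.DataFrame(other)
    other = other.set_index('id')
    # Use self (the frame the method is called on), not a global df
    return (other.reindex(self.index.union(other.index))
                 .combine_first(self)
            )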

pd.DataFrame.AddOrUpdate = AddOrUpdate

out = df.AddOrUpdate(changes)
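
Monkey patching puts a new attribute on every DataFrame, which may clash with other code or future pandas methods. pandas also provides a documented extension point, pd.api.extensions.register_dataframe_accessor, that groups custom methods under a namespace. A minimal sketch assuming that API; the accessor name upsert and the method update_or_add are hypothetical names chosen for illustration:

```python
import pandas as pd


# "upsert" and "update_or_add" are hypothetical names for this sketch
@pd.api.extensions.register_dataframe_accessor("upsert")
class UpsertAccessor:
    def __init__(self, pandas_obj):
        # pandas passes in the DataFrame the accessor is used on
        self._obj = pandas_obj

    def update_or_add(self, changes, key='id'):
        # Accept a DataFrame or a list of dicts; `key` names the row-label column
        other = changes if isinstance(changes, pd.DataFrame) else pd.DataFrame(changes)
        other = other.set_index(key)
        # Values from `other` win; missing cells fall back to the original frame
        return other.combine_first(self._obj)


df = pd.DataFrame([
    {'A': 'a1', 'B': 'b1', 'C': 'c1'},
    {'A': 'a3', 'B': 'b2', 'C': 'c2'},
    {'A': 'a3', 'B': 'b3', 'C': 'c3'},
])
changes = [
    {'id': 0, 'A': 'aNEW', 'C': 'cNEW'},
    {'id': 2, 'B': 'bNEW'},
    {'id': 3, 'A': 'aNEW', 'C': 'cNEW'},
]

out = df.upsert.update_or_add(changes)
print(out)
```

Like the monkey-patched version, this returns a new frame and leaves df unchanged.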

Answer #2 (vltsax25)

If the DataFrame index consists of consecutive integers starting at 0, you can use .loc to add a new row, using the row count as the label of the next row:

df.loc[df.shape[0]] = ['aNEW',  None,  'cNEW']

#df
      A     B     C
0    a1    b1    c1
1    a3    b2    c2
2    a3    b3    c3
3  aNEW  None  cNEW

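You can also pass a dictionary. If you don't mind whether the missing value ends up as None or NaN, you can leave out the key/value pair for that column: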

df.loc[df.shape[0]] = {'A': 'aNew+', 'C': 'cNew+'}

#df
       A    B      C
0     a1   b1     c1
1     a3   b2     c2
2     a3   b3     c3
3  aNew+  NaN  cNew+
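
The .loc assignments above rely on pandas' setting with enlargement, which appends a row for any row label that does not exist yet, not only for the label df.shape[0]. A minimal sketch, assuming each dict in the changes list carries an 'id' key naming its target row label, that applies the whole list in place: existing rows get only the listed columns overwritten, and missing rows are appended with NaN in the unlisted columns:

```python
import pandas as pd

df = pd.DataFrame([
    {'A': 'a1', 'B': 'b1', 'C': 'c1'},
    {'A': 'a3', 'B': 'b2', 'C': 'c2'},
    {'A': 'a3', 'B': 'b3', 'C': 'c3'},
])
changes = [
    {'id': 0, 'A': 'aNEW', 'C': 'cNEW'},
    {'id': 2, 'B': 'bNEW'},
    {'id': 3, 'A': 'aNEW', 'C': 'cNEW'},
]

for change in changes:
    # Columns this change touches (everything except the row label)
    cols = [k for k in change if k != 'id']
    # Updates the row if the label exists, otherwise enlarges df with a new row
    df.loc[change['id'], cols] = [change[k] for k in cols]

print(df)
```

Because it uses the explicit id instead of the row count, this does not depend on the index being 0..n-1, but it loops in Python, so the combine_first approach is likely faster for long change lists.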
