How can I do in-place vectorized operations in pandas?

Posted by mbjcgjjk on 2023-01-24 in Other

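How can I apply a vectorized operation to a pandas.DataFrame or pandas.Series in place? I have only found methods that create and return a copy.
My main concern is reducing resource usage, but it would also be good to know whether it is possible at all, even if it is not particularly efficient.
As things stand, you can do the following: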

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1,2,3], 'b': [4,5,6]})
#  df
#    a  b
# 0  1  4
# 1  2  5
# 2  3  6

df['c'] = df['a'] + df['b']
#  df
#    a  b  c
# 0  1  4  5
# 1  2  5  7
# 2  3  6  9

# Or like this
df['c'] = np.log(df['a'])
#  df
#    a  b         c
# 0  1  4  0.000000
# 1  2  5  0.693147
# 2  3  6  1.098612

# Or these methods:
# df['c'] = df['a'].apply(np.log)
# df['c'] = np.vectorize(np.log)(df['a'])

However, I would like to do something similar to what data.table in R can do.

# Hypothetical syntax (not valid Python), modeled on data.table:
df[c = np.log(a)]
#  df
#    a  b         c
# 0  1  4  0.000000
# 1  2  5  0.693147
# 2  3  6  1.098612

# or even
df['a'].apply(np.log, inplace=True) # which doesn't exist
# so that column 'a' would be transformed in place
# df[['a','b']]
#           a  b
# 0  0.000000  4
# 1  0.693147  5
# 2  1.098612  6

Answer #1 (py49o6xq)

You can try the following. It still creates an intermediate buffer, but the final DataFrame will be the same.

df['a'] = df['a'] + df['b'] # Does not create new column df['c']

df['a'] = np.log(df['a'])
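
Two further options may be worth trying; the following is a minimal sketch, not a definitive recipe. The first uses DataFrame.eval with an assignment expression and inplace=True, which is probably the closest pandas gets to the data.table style asked about. The second uses the out= argument of NumPy ufuncs so that the logarithm is written back into the same array. The name buf is only illustrative, and the sketch assumes the column has to be converted from int64 to float64 first, which itself costs one copy.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# 1) Assignment expression evaluated against df itself.
#    With inplace=True, column 'c' is added to df and nothing is returned.
df.eval('c = log(a)', inplace=True)

# 2) Reuse a single buffer through the ufunc `out=` argument.
#    Assumption: the int64 column must become float64 first (one copy).
buf = df['a'].to_numpy(dtype='float64', copy=True)
np.log(buf, out=buf)  # overwrites buf; no extra temporary for the result
df['a'] = buf         # pandas may still copy buf when storing it
```

Neither variant is guaranteed to avoid every temporary: eval still computes the result into a new array before attaching it, and whether the column assignment copies depends on the pandas version and its Copy-on-Write settings. Measuring memory on real data is the only reliable check.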
