I'm looking to replace values in a Dataframe with the values in a second Dataframe by matching the values in the first Dataframe with the columns from the second Dataframe.
Example:
import numpy as np
import pandas as pd
dt_index = pd.to_datetime(['2003-05-01', '2003-05-02', '2003-05-03', '2003-05-04'])
df = pd.DataFrame({'A':[1,1,3,12], 'B':[12,1,3,3], 'C':[3,12,12,1]}, index = dt_index)
df2 = pd.DataFrame({1:[1.4,4.2,1.3,5.6], 12:[2.3,7.3,9.5,0.4], 3:[8.8,0.1,8.7,2.4], 4:[9.6,9.8,5.5,1.8]}, index = dt_index)
df =
A B C
2003-05-01 1 12 3
2003-05-02 1 1 12
2003-05-03 3 3 12
2003-05-04 12 3 1
df2 =
1 12 3 4
2003-05-01 1.4 2.3 8.8 9.6
2003-05-02 4.2 7.3 0.1 9.8
2003-05-03 1.3 9.5 8.7 5.5
2003-05-04 5.6 0.4 2.4 1.8
Expected output:
expect = pd.DataFrame({'A':[1.4,4.2,8.7,0.4], 'B':[2.3,4.2,8.7,2.4], 'C':[8.8,7.3,9.5,5.6]}, index = dt_index)
expect =
A B C
2003-05-01 1.4 2.3 8.8
2003-05-02 4.2 4.2 7.3
2003-05-03 8.7 8.7 9.5
2003-05-04 0.4 2.4 5.6
Attempt:
X = df.copy()
for i in np.unique(df):
X.mask(df == i, df2[i], axis=0, inplace=True)
My attempt seems to work but I'm not sure if it has any pitfalls and how it would scale as the sizes of the Dataframe increase.
Are there better or faster solutions?
EDIT:
After cottontail's helpful answer, I realised I've made an oversimplification in my example. The values in df and columns of df and df2 cannot be assumed to be sequential.
I've now modified the example to reflect that.
1条答案
按热度按时间ht4b089n1#
一种方法是使用
stack()
将df2
重新整形为Series,并且使用df
中的值将reindex()
重新整形为Series;使用unstack()
将其重新整形回原始形状。另一种方法是迭代地创建形状像
df2
的虚拟 Dataframe ,将其乘以df2
,将其缩减为Series(使用sum()
),并将其分配给形状像df
的空 Dataframe 。