python-3.x: Merge two DataFrames by randomly shuffling the indices

Posted by 6xfqseft on 2023-03-31 in Python


I have two DataFrames named df1 and df2 that look exactly alike, with indices running from 1 to n (1 to 2 in the example).

import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Fruit': ['Apple', 'Pineapple', 'Apple', 'Pineapple'],
    'Indices': [1, 1, 2, 2],
    'Value': [10, 20, 30, 40]
})

df2 = pd.DataFrame({
    'Fruit': ['Apple', 'Pineapple', 'Apple', 'Pineapple'],
    'Indices': [1, 1, 2, 2],
    'Value': [50, 60, 70, 80]
})

I have a third DataFrame that is exactly twice the size, with indices from 1 to 2*n.

df3 = pd.DataFrame({
    'Fruit': ['Apple', 'Pineapple', 'Apple', 'Pineapple', 'Apple', 'Pineapple', 'Apple', 'Pineapple'],
    'Indices': [1, 1, 2, 2, 3, 3, 4, 4],
    'Value': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]
})

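I want to fill df3 so that, for each 'Fruit', its values are taken in randomly shuffled order from all the elements of df1 and df2.
So for 'Apple' the values 10, 30, 50 and 70 would be available, and for 'Pineapple' 20, 40, 60 and 80, with both sets shuffled.
The result could look like this: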

df3 = pd.DataFrame({
    'Fruit': ['Apple', 'Pineapple', 'Apple', 'Pineapple', 'Apple', 'Pineapple', 'Apple', 'Pineapple'],
    'Indices': [1, 1, 2, 2, 3, 3, 4, 4],
    'Value': [10, 80, 30, 60, 70, 40, 50, 20]
})

But of course it could just as well look like this (or any other random arrangement that keeps each value with its fruit):

df3 = pd.DataFrame({
    'Fruit': ['Apple', 'Pineapple', 'Apple', 'Pineapple', 'Apple', 'Pineapple', 'Apple', 'Pineapple'],
    'Indices': [1, 1, 2, 2, 3, 3, 4, 4],
    'Value': [30, 60, 70, 80, 10, 40, 50, 20]
})

Is there a smart way to do this?
I know I can select the right data with loc, and that pandas has a sample method, but how do these fit together?

python-3.x

Source: https://stackoverflow.com/questions/75881051/merge-two-dataframes-by-randomly-shuffling-the-indices

3 Answers


um6iljoc1#

You can use merge. With this method, every value from df1 and df2 is assigned exactly once, with no repeats:

# Create a temporary dataframe with an idx column
df4 = (pd.concat([df1, df2]).sample(frac=1, ignore_index=True)
         .assign(idx=lambda x: x.groupby('Fruit').cumcount()))

# Merge df3 and df4 on (Fruit, idx)
df3['Value'] = (df3.assign(idx=lambda x: x.groupby('Fruit').cumcount())
                   .merge(df4, on=['Fruit', 'idx'])['Value_y'])

Output:

>>> df3
       Fruit  Indices  Value
0      Apple        1     30
1  Pineapple        1     80
2      Apple        2     70
3  Pineapple        2     60
4      Apple        3     10
5  Pineapple        3     40
6      Apple        4     50
7  Pineapple        4     20
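
As a rough, self-contained sketch of the merge idea above, the steps can be wrapped in a function and checked. The helper name `fill_shuffled` and its `seed` parameter are assumptions made for illustration and are not part of the answer. The sketch also assumes the source frames hold, for each fruit, exactly as many rows as the target.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Fruit': ['Apple', 'Pineapple', 'Apple', 'Pineapple'],
    'Indices': [1, 1, 2, 2],
    'Value': [10, 20, 30, 40],
})
df2 = pd.DataFrame({
    'Fruit': ['Apple', 'Pineapple', 'Apple', 'Pineapple'],
    'Indices': [1, 1, 2, 2],
    'Value': [50, 60, 70, 80],
})
df3 = pd.DataFrame({
    'Fruit': ['Apple', 'Pineapple'] * 4,
    'Indices': [1, 1, 2, 2, 3, 3, 4, 4],
    'Value': np.nan,
})


def fill_shuffled(target, sources, seed=None):
    """Hypothetical helper: fill target['Value'] with shuffled source values per Fruit."""
    # Pool and shuffle all source rows, then number the rows within each fruit.
    pool = pd.concat(sources).sample(frac=1, random_state=seed).reset_index(drop=True)
    pool = pool.assign(idx=pool.groupby('Fruit').cumcount())[['Fruit', 'idx', 'Value']]
    # Number the target rows the same way and pair them up on (Fruit, idx).
    keys = target[['Fruit']].assign(idx=target.groupby('Fruit').cumcount())
    merged = keys.merge(pool, on=['Fruit', 'idx'], how='left')
    # (Fruit, idx) is unique in pool, so the left merge keeps the target's
    # row order and length, and the values can be assigned by position.
    return target.assign(Value=merged['Value'].to_numpy())


filled = fill_shuffled(df3, [df1, df2], seed=0)
print(filled)
```

Passing a seed only makes the shuffle reproducible; leaving it as `None` gives a different arrangement on each call.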

Another approach

(but the two methods below do not guarantee that the values are unique)

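>>> # Drop df3's empty Value and the pool's Indices so that 'Value' in the merged frame holds the pooled values
>>> (df3.drop(columns='Value')
        .merge(pd.concat([df1, df2]).drop(columns='Indices'), on='Fruit')
        .groupby(['Fruit', 'Indices'], sort=False).sample(n=1)[df3.columns])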

        Fruit  Indices  Value
0       Apple        1     70
4       Apple        2     10
11      Apple        3     50
12      Apple        4     50
18  Pineapple        1     20
21  Pineapple        2     60
27  Pineapple        3     80
31  Pineapple        4     40

If you want to preserve the original row order:

>>> (df3.drop(columns='Value').reset_index()
        .merge(pd.concat([df1, df2]).drop(columns='Indices'), on='Fruit')
        .groupby('index').sample(n=1).set_index('index')
        .rename_axis(None)[df3.columns])

       Fruit  Indices  Value
0      Apple        1     30
1  Pineapple        1     80
2      Apple        2     70
3  Pineapple        2     60
4      Apple        3     10
5  Pineapple        3     40
6      Apple        4     50
7  Pineapple        4     20
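
To tell a true shuffle apart from a draw that repeats values, a small check can compare each fruit's filled values against the pooled ones. This is only a sketch: the function name `is_valid_shuffle` is hypothetical, and the two test frames reuse numbers that appear in this thread (the question's example result and the first "not unique" sample output, written in df3's row order).

```python
import pandas as pd

df1 = pd.DataFrame({
    'Fruit': ['Apple', 'Pineapple', 'Apple', 'Pineapple'],
    'Indices': [1, 1, 2, 2],
    'Value': [10, 20, 30, 40],
})
df2 = pd.DataFrame({
    'Fruit': ['Apple', 'Pineapple', 'Apple', 'Pineapple'],
    'Indices': [1, 1, 2, 2],
    'Value': [50, 60, 70, 80],
})


def is_valid_shuffle(filled, sources, key='Fruit', value='Value'):
    """Hypothetical check: each key's values must be a permutation of the pooled values."""
    pool = pd.concat(sources)
    expected = {k: sorted(v) for k, v in pool.groupby(key)[value].agg(list).items()}
    actual = {k: sorted(v) for k, v in filled.groupby(key)[value].agg(list).items()}
    return expected == actual


layout = {
    'Fruit': ['Apple', 'Pineapple'] * 4,
    'Indices': [1, 1, 2, 2, 3, 3, 4, 4],
}
# Every value used exactly once (the question's first example result).
good = pd.DataFrame({**layout, 'Value': [10, 80, 30, 60, 70, 40, 50, 20]})
# Apple gets 50 twice and never 30 (the first sample output above).
bad = pd.DataFrame({**layout, 'Value': [70, 20, 10, 60, 50, 80, 50, 40]})

print(is_valid_shuffle(good, [df1, df2]))  # True
print(is_valid_shuffle(bad, [df1, df2]))   # False
```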

Answered 2023-03-31

flseospp2#

I'm not sure exactly what you're looking for, but maybe this helps:

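pool = pd.concat([df1, df2])  # all candidate values from both frames
for fruit in df3['Fruit'].unique():
    # Shuffle this fruit's values and write them into the matching rows of df3
    shuffled = pool.loc[pool['Fruit'] == fruit, 'Value'].sample(frac=1)
    df3.loc[df3['Fruit'] == fruit, 'Value'] = shuffled.to_list()
df3['Value'] = df3['Value'].astype(int)
print(df3)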

Result:

       Fruit  Indices  Value
0      Apple        1     30
1  Pineapple        1     60
2      Apple        2     70
3  Pineapple        2     20
4      Apple        3     50
5  Pineapple        3     40
6      Apple        4     10
7  Pineapple        4     80

Answered 2023-03-31

hwamh0ep3#

By using a fruits map built from the combined column values of df1 and df2:

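# Map each fruit to the list of all its values from df1 and df2
fruits_map = pd.concat([df1, df2]).drop('Indices', axis=1)\
    .groupby('Fruit')['Value'].agg(list).to_dict()
# np.random.choice draws independently for every row, so values can repeat
df3['Value'] = [np.random.choice(fruits_map[f]) for f in df3['Fruit']]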

Sample df3:

       Fruit  Indices  Value
0      Apple        1     10
1  Pineapple        1     60
2      Apple        2     70
3  Pineapple        2     60
4      Apple        3     70
5  Pineapple        3     60
6      Apple        4     30
7  Pineapple        4     60
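
The sample above shows 60 four times, because each row is drawn independently. If every value should be used exactly once, one possible variant of the same map idea shuffles each fruit's list once and then hands the values out in turn. This is a sketch rather than the answerer's method: the names `rng` and `shuffled_iters` and the seed are assumptions, and it assumes df3 has no more rows per fruit than the pool provides.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Fruit': ['Apple', 'Pineapple', 'Apple', 'Pineapple'],
    'Indices': [1, 1, 2, 2],
    'Value': [10, 20, 30, 40],
})
df2 = pd.DataFrame({
    'Fruit': ['Apple', 'Pineapple', 'Apple', 'Pineapple'],
    'Indices': [1, 1, 2, 2],
    'Value': [50, 60, 70, 80],
})
df3 = pd.DataFrame({
    'Fruit': ['Apple', 'Pineapple'] * 4,
    'Indices': [1, 1, 2, 2, 3, 3, 4, 4],
    'Value': np.nan,
})

rng = np.random.default_rng(42)  # arbitrary seed, only for reproducibility

# Map each fruit to the list of all its values, as in the answer above.
fruits_map = pd.concat([df1, df2]).groupby('Fruit')['Value'].agg(list).to_dict()

# Shuffle each list once and wrap it in an iterator, so that every value
# is handed out exactly once (sampling without replacement).
shuffled_iters = {fruit: iter(rng.permutation(values))
                  for fruit, values in fruits_map.items()}

# Walk df3 in row order and take the next unused value for that row's fruit.
df3['Value'] = [next(shuffled_iters[f]) for f in df3['Fruit']]
print(df3)
```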

Answered 2023-03-31
