Python: how to avoid duplicated data for an added index level in a pandas DataFrame

Posted by 14ifxucb on 2023-03-11 in Python

I want to know how to add new data when I add a new index level to my Excel sheet. Here is what I mean:
Picture 1 (what my code currently outputs):

Picture 2 (what I want it to output):

I want to put [1, 2] in row 1 under data type 2, and [3, 4] in row 2 under data type 2. Here is the code that produces Picture 1:

import pandas as pd

# Create a Pandas dataframe from the data.
df1 = pd.DataFrame([['a', 'b'], ['c', 'd']],
                    index=['row 1', 'row 2'],
                    columns=['col 1', 'col 2'])

# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_simple.xlsx', engine='xlsxwriter')

# Add extra index
new_idx = pd.Index(['data type 1', 'data type 2'])
mi1 = pd.MultiIndex.from_product([df1.index, new_idx])
out1 = df1.reindex(df1.index.repeat(len(new_idx))).set_index(mi1)

# Convert the dataframe to an XlsxWriter Excel object.
out1.to_excel(writer, sheet_name='Sheet1')

# Close the Pandas Excel writer and output the Excel file.
writer.close()
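The duplication in Picture 1 comes from the reindex step: Index.repeat lists every row label once per new level value, and reindex copies the whole row for each repeated label, so "data type 1" and "data type 2" start out identical. A minimal sketch that reproduces this in memory (no Excel file, so xlsxwriter is not needed) might look like this:

```python
import pandas as pd

# Same frame as in the question
df1 = pd.DataFrame([['a', 'b'], ['c', 'd']],
                   index=['row 1', 'row 2'],
                   columns=['col 1', 'col 2'])
new_idx = pd.Index(['data type 1', 'data type 2'])

# Index.repeat duplicates every row label once per new level value
repeated = df1.index.repeat(len(new_idx))
print(list(repeated))  # ['row 1', 'row 1', 'row 2', 'row 2']

# reindex copies the full row for each duplicated label, so both
# 'data type' rows under the same row label hold the same values
mi1 = pd.MultiIndex.from_product([df1.index, new_idx])
out1 = df1.reindex(repeated).set_index(mi1)
print(out1)
```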

Is there any way to do this?

Answer by vjhs03f7:


slice(None) can be used to select all rows at a given index level. Since you want to change the values whose level-1 label is 'data type 2', try the following:

# write new values to data type 2 indices
out1.loc[(slice(None), 'data type 2'), :] = [[1, 2], [3, 4]]
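The same selection can also be spelled with pd.IndexSlice, which some find easier to read than a raw slice(None) tuple: idx[:, 'data type 2'] builds exactly (slice(None), 'data type 2'). A small self-contained sketch of that variant, rebuilding out1 the same way as in the question:

```python
import pandas as pd

df1 = pd.DataFrame([['a', 'b'], ['c', 'd']],
                   index=['row 1', 'row 2'],
                   columns=['col 1', 'col 2'])
new_idx = pd.Index(['data type 1', 'data type 2'])
mi1 = pd.MultiIndex.from_product([df1.index, new_idx])
out1 = df1.reindex(df1.index.repeat(len(new_idx))).set_index(mi1)

# idx[:, 'data type 2'] == (slice(None), 'data type 2'):
# ':' keeps every label on level 0, 'data type 2' filters level 1
idx = pd.IndexSlice
out1.loc[idx[:, 'data type 2'], :] = [[1, 2], [3, 4]]
print(out1)
```

The selected rows come back in index order (row 1, then row 2), which is why the first inner list lands in row 1 and the second in row 2.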

The full code looks like this:

import pandas as pd

# Create a Pandas dataframe from the data.
df1 = pd.DataFrame([['a', 'b'], ['c', 'd']], index=['row 1', 'row 2'], columns=['col 1', 'col 2'])

# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_simple.xlsx', engine='xlsxwriter')

# Add extra index
new_idx = pd.Index(['data type 1', 'data type 2'])
mi1 = pd.MultiIndex.from_product([df1.index, new_idx])
out1 = df1.reindex(df1.index.repeat(len(new_idx))).set_index(mi1)

# write new values to data type 2 indices
out1.loc[(slice(None), 'data type 2'), :] = [[1, 2], [3, 4]]    # <-- add this line

# Convert the dataframe to an XlsxWriter Excel object.
out1.to_excel(writer, sheet_name='Sheet1')

writer.close()
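To check the assignment in memory before (or instead of) writing the workbook, a cross-section on level 1 with DataFrame.xs returns only the 'data type 2' rows. A minimal sketch, repeating the answer's steps without the Excel writer:

```python
import pandas as pd

df1 = pd.DataFrame([['a', 'b'], ['c', 'd']],
                   index=['row 1', 'row 2'],
                   columns=['col 1', 'col 2'])
new_idx = pd.Index(['data type 1', 'data type 2'])
mi1 = pd.MultiIndex.from_product([df1.index, new_idx])
out1 = df1.reindex(df1.index.repeat(len(new_idx))).set_index(mi1)
out1.loc[(slice(None), 'data type 2'), :] = [[1, 2], [3, 4]]

# xs picks the 'data type 2' label on level 1 and, by default,
# drops that level, leaving an index of just 'row 1' and 'row 2'
updated = out1.xs('data type 2', level=1)
print(updated)
```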

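Reading the file back with pd.read_excel('pandas_simple.xlsx', index_col=[0,1]) then gives a frame in which the 'data type 1' rows keep a, b and c, d, while the 'data type 2' rows hold 1, 2 and 3, 4.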

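Another approach is to change how out1 is built: instead of reindexing df1, construct a second DataFrame with the same "metadata" as df1 (its index and columns) and concatenate the two: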

out1 = (
    pd.concat({
        'data type 1': df1, 
        'data type 2': pd.DataFrame([[1, 2], [3, 4]], columns=df1.columns, index=df1.index)
    }).swaplevel().sort_index()
)
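A quick way to convince yourself that both approaches agree is to build out1 both ways and compare the row order and cell values. In the concat version, the dict keys become the outer index level, swaplevel moves the row labels to the front, and sort_index restores the row 1 / row 2 grouping. The sketch below is only a comparison harness; the names out_loc, out_concat and extra are illustrative:

```python
import pandas as pd

df1 = pd.DataFrame([['a', 'b'], ['c', 'd']],
                   index=['row 1', 'row 2'],
                   columns=['col 1', 'col 2'])
new_idx = pd.Index(['data type 1', 'data type 2'])

# Approach 1: reindex, then overwrite the 'data type 2' rows
mi1 = pd.MultiIndex.from_product([df1.index, new_idx])
out_loc = df1.reindex(df1.index.repeat(len(new_idx))).set_index(mi1)
out_loc.loc[(slice(None), 'data type 2'), :] = [[1, 2], [3, 4]]

# Approach 2: concat a second frame that shares df1's index and columns
extra = pd.DataFrame([[1, 2], [3, 4]], columns=df1.columns, index=df1.index)
out_concat = (
    pd.concat({'data type 1': df1, 'data type 2': extra})
    .swaplevel()
    .sort_index()
)

# Same row labels in the same order, same cell values
print(out_loc.index.equals(out_concat.index))
print(out_loc.values.tolist() == out_concat.values.tolist())
```

The concat version never mutates a frame in place, which can be convenient if the 'data type 2' values already live in their own DataFrame.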
