How do I convert a Pandas DataFrame with multi-level columns into a DataFrame with single-level columns in Python?

f1tvaqid, posted on 2023-09-29 in Python

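I have a Pandas DataFrame with multi-level columns, like this:
| | | | | 23-Jan | | | | | | | | 23-Feb | | | | | | | |
| --|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|
| Market | Product | City | Territory | VALUES | Values MARKET SHARE | VALUES GROWTH | VALUES GEO. SHARE | UNITS | UNITS MARKET SHARE | UNITS GROWTH | UNITS GEO. SHARE | VALUES | Values MARKET SHARE | VALUES GROWTH | VALUES GEO. SHARE | UNITS | UNITS MARKET SHARE | UNITS GROWTH | UNITS GEO. SHARE |

I want to create a Python function that converts this DataFrame into the following format:
| Market | Product | City | Territory | VALUES | Values MARKET SHARE | VALUES GROWTH | VALUES GEO. SHARE | UNITS | UNITS MARKET SHARE | UNITS GROWTH | UNITS GEO. SHARE | Date |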
| --|--|--|--|--|--|--|--|--|--|--|--|--|
| | | | | | | | | | | | | 23-Jan |
| | | | | | | | | | | | | 23-Feb |


How can I do this conversion with Python and Pandas?

umuewwlo1#

It is hard to help without the constructor for your MultiIndex columns. That said, you can reshape your DataFrame with stack and a few index methods:

(df.set_index(df.columns[:4].tolist())  # Market, Product, City, Territory
   .rename_axis(index=df.columns[:4].droplevel(0),  # Flatten the index names
                columns=['Date', None])  # Name the column levels
   .stack('Date', sort=False)  # Move the Date level into rows, keeping its original order
   .reset_index())  # Turn the index levels back into ordinary columns

   Market  Product  City  Territory    Date  VALUES  Values MARKET SHARE  VALUES GROWTH  VALUES GEO. SHARE  UNITS  UNITS MARKET SHARE  UNITS GROWTH  UNITS GEO. SHARE
0       0        0     0          0  23-Jan       1                    1              1                  1      1                   1             1                 1
1       0        0     0          0  23-Feb       2                    2              2                  2      2                   2             2                 2

Minimal working example:

import pandas as pd

data = {('', 'Market'): {0: 0},
 ('', 'Product'): {0: 0},
 ('', 'City'): {0: 0},
 ('', 'Territory'): {0: 0},
 ('23-Jan', 'VALUES'): {0: 1},
 ('23-Jan', 'Values MARKET SHARE'): {0: 1},
 ('23-Jan', 'VALUES GROWTH'): {0: 1},
 ('23-Jan', 'VALUES GEO. SHARE'): {0: 1},
 ('23-Jan', 'UNITS'): {0: 1},
 ('23-Jan', 'UNITS MARKET SHARE'): {0: 1},
 ('23-Jan', 'UNITS GROWTH'): {0: 1},
 ('23-Jan', 'UNITS GEO. SHARE'): {0: 1},
 ('23-Feb', 'VALUES'): {0: 2},
 ('23-Feb', 'Values MARKET SHARE'): {0: 2},
 ('23-Feb', 'VALUES GROWTH'): {0: 2},
 ('23-Feb', 'VALUES GEO. SHARE'): {0: 2},
 ('23-Feb', 'UNITS'): {0: 2},
 ('23-Feb', 'UNITS MARKET SHARE'): {0: 2},
 ('23-Feb', 'UNITS GROWTH'): {0: 2},
 ('23-Feb', 'UNITS GEO. SHARE'): {0: 2}}
df = pd.DataFrame(data)
print(df)

# Output
                                23-Jan                      ...        23-Feb                                                                         
  Market Product City Territory VALUES Values MARKET SHARE  ... VALUES GROWTH VALUES GEO. SHARE UNITS UNITS MARKET SHARE UNITS GROWTH UNITS GEO. SHARE
0      0       0    0         0      1                   1  ...             2                 2     2                  2            2                2

[1 rows x 20 columns]
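
Since the question asks for a reusable function, here is a minimal sketch of one way to wrap the reshape. It is not the code from the answer above: instead of stack, it slices the frame one date at a time and concatenates the pieces, so it does not depend on the `sort` keyword. The function name `flatten_date_columns` and its `n_id_cols` parameter are hypothetical. The sketch assumes that the first `n_id_cols` columns are the identifiers and that every remaining column is a (date, metric) pair.

```python
import pandas as pd


def flatten_date_columns(df: pd.DataFrame, n_id_cols: int = 4) -> pd.DataFrame:
    """Turn two-level (date, metric) columns into one row per date."""
    # Identifier columns: keep only the second level as a flat name.
    ids = df.iloc[:, :n_id_cols].copy()
    ids.columns = ids.columns.get_level_values(1)

    metrics = df.iloc[:, n_id_cols:]
    # Dates in their order of first appearance in the column header.
    dates = metrics.columns.get_level_values(0).unique()

    pieces = []
    for date in dates:
        # Metric columns for this date, with the date level dropped.
        block = metrics.xs(date, axis=1, level=0)
        pieces.append(pd.concat([ids, block], axis=1).assign(Date=date))

    # Stack the per-date pieces on top of each other.
    return pd.concat(pieces, ignore_index=True)


# Small illustrative frame (made-up values, two identifier columns).
sample = pd.DataFrame({
    ('', 'Market'): [0, 5],
    ('', 'Product'): [0, 6],
    ('23-Jan', 'VALUES'): [1, 3],
    ('23-Jan', 'UNITS'): [1, 3],
    ('23-Feb', 'VALUES'): [2, 4],
    ('23-Feb', 'UNITS'): [2, 4],
})
flat = flatten_date_columns(sample, n_id_cols=2)
print(flat)
```

This version puts Date in the last column, as in the format requested in the question. The rows come out grouped by date (all 23-Jan rows, then all 23-Feb rows) rather than interleaved per original row as with stack.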
