numpy Pandas: transpose multiple columns for each row in a DataFrame with 2 header rows

xwbd5t1u  posted on 2023-02-04 in Other

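I have a CSV (see image below) whose first 2 rows are column headers. I need the headings from the first header row and from the second header row to be transposed for each row.

I tried transposing and using the pivot function, but that did not work. I also tried pivot_table, which did not work either.
Expected output: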


sg2wtvxw1#

I am simply assuming that your CSV file resembles the one inlined in my code.

Solution

First, read the file and ignore the headers.

import io
import pandas as pd
import numpy as np

csv_file = io.StringIO('''\
,,,,Quantity,Quantity,Quantity,Dollars,Dollars,Dollars
Brand Family,Brand,Channel,Product,2022-01-12,2022-01-13,2022-01-14,2022-01-12,2022-01-13,2022-01-14
Brand A,Brand A,Reg,A1,18,47,41,216,164,492
Brand A,Brand A,Reg,A2,9,23,20,108,276,240
Brand A,Brand A,Reg,A3,28,80,82,392,1120,1148
Brand A,Brand A,Reg,A4,,,,,,
Brand A,Brand A,Reg,A5,7,15,13,98,210,182\
''')

df = pd.read_csv(csv_file, header=None).replace(np.nan, 0)
print(df)
              0        1        2        3           4           5           6           7           8           9
0             0        0        0        0    Quantity    Quantity    Quantity     Dollars     Dollars     Dollars
1  Brand Family    Brand  Channel  Product  2022-01-12  2022-01-13  2022-01-14  2022-01-12  2022-01-13  2022-01-14
2       Brand A  Brand A      Reg       A1          18          47          41         216         164         492
3       Brand A  Brand A      Reg       A2           9          23          20         108         276         240
4       Brand A  Brand A      Reg       A3          28          80          82         392        1120        1148
5       Brand A  Brand A      Reg       A4           0           0           0           0           0           0
6       Brand A  Brand A      Reg       A5           7          15          13          98         210         182

We split it into two parts: the left part

left = df.iloc[2:, :4].set_axis(df.iloc[1, :4], axis='columns')
print(left)

and the right part.
[snippet defining right not preserved in the source]
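The pivot call in the next step uses columns 0 and 1 of right as its index and columns, which suggests that right is the remainder of df (columns 4 onward), transposed. A minimal sketch under that assumption; the expression for right is a guess, not the answer's original code, and the CSV is shortened to two data rows:

```python
import io

import numpy as np
import pandas as pd

csv_file = io.StringIO('''\
,,,,Quantity,Quantity,Quantity,Dollars,Dollars,Dollars
Brand Family,Brand,Channel,Product,2022-01-12,2022-01-13,2022-01-14,2022-01-12,2022-01-13,2022-01-14
Brand A,Brand A,Reg,A1,18,47,41,216,164,492
Brand A,Brand A,Reg,A4,,,,,,
''')

# Same read as in the answer: no header row, missing cells replaced by 0.
df = pd.read_csv(csv_file, header=None).replace(np.nan, 0)

# Assumption: the right part is every column from position 4 on, transposed,
# so that the two header rows become columns 0 and 1 and each original data
# row becomes one further column (2, 3, ...).
right = df.iloc[:, 4:].T
print(right)
```

With this shape, column 0 holds Quantity/Dollars and column 1 holds the dates, which is what pivot(index=0, columns=[1]) expects.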
Now pivot the right part:

pivoted = right.pivot(index=0, columns=[1]).T.reset_index(level=1, names=[0, 'Date'])
print(pivoted)

Note that the index matches that of the left part.

0        Date Dollars Quantity
2  2022-01-12     216       18
2  2022-01-13     164       47
2  2022-01-14     492       41
3  2022-01-12     108        9
3  2022-01-13     276       23
3  2022-01-14     240       20
4  2022-01-12     392       28
4  2022-01-13    1120       80
4  2022-01-14    1148       82
5  2022-01-12       0        0
5  2022-01-13       0        0
5  2022-01-14       0        0
6  2022-01-12      98        7
6  2022-01-13     210       15
6  2022-01-14     182       13

Finally, join them on their common index.

df = left.join(pivoted).reset_index(drop=True)
print(df)
   Brand Family    Brand Channel Product        Date Dollars Quantity
0       Brand A  Brand A     Reg      A1  2022-01-12     216       18
1       Brand A  Brand A     Reg      A1  2022-01-13     164       47
2       Brand A  Brand A     Reg      A1  2022-01-14     492       41
3       Brand A  Brand A     Reg      A2  2022-01-12     108        9
4       Brand A  Brand A     Reg      A2  2022-01-13     276       23
5       Brand A  Brand A     Reg      A2  2022-01-14     240       20
6       Brand A  Brand A     Reg      A3  2022-01-12     392       28
7       Brand A  Brand A     Reg      A3  2022-01-13    1120       80
8       Brand A  Brand A     Reg      A3  2022-01-14    1148       82
9       Brand A  Brand A     Reg      A4  2022-01-12       0        0
10      Brand A  Brand A     Reg      A4  2022-01-13       0        0
11      Brand A  Brand A     Reg      A4  2022-01-14       0        0
12      Brand A  Brand A     Reg      A5  2022-01-12      98        7
13      Brand A  Brand A     Reg      A5  2022-01-13     210       15
14      Brand A  Brand A     Reg      A5  2022-01-14     182       13
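Because the file is read with header=None, the numeric cells share their columns with header text, so Dollars and Quantity presumably end up as strings (object dtype) in the final frame. A minimal sketch of converting them afterwards; result is a hypothetical small stand-in for the joined frame above, not the answer's data:

```python
import pandas as pd

# Hypothetical stand-in for the joined frame: values are strings, except
# for the 0 that replaced the missing cells.
result = pd.DataFrame({
    "Product": ["A1", "A1", "A4"],
    "Date": ["2022-01-12", "2022-01-13", "2022-01-12"],
    "Dollars": ["216", "164", 0],
    "Quantity": ["18", "47", 0],
})

# Parse the dates and turn the value columns into real numbers.
result["Date"] = pd.to_datetime(result["Date"])
value_cols = ["Dollars", "Quantity"]
result[value_cols] = result[value_cols].apply(pd.to_numeric)

print(result.dtypes)
```

After the conversion, sums and other aggregations on Dollars and Quantity behave numerically instead of concatenating strings.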

pgvzfuti2#

You need to combine melt, to flatten the DataFrame, with pivot_table, to reshape it into the expected output:

>>> (df.melt(var_name=['Variable', 'Date'], value_name='Value', ignore_index=False)
       .pivot_table(index=df.index.names+['Date'], columns='Variable', values='Value')
       .reset_index().rename_axis(columns=None))

  Brand Family Brand Channel Product        Date  Dollars  Quantity
0            A     A     Reg      A1  2022-01-12        4         1
1            A     A     Reg      A1  2022-01-13        5         2
2            A     A     Reg      A1  2022-01-14        6         3
3            A     A     Reg      A2  2022-01-12       14        11
4            A     A     Reg      A2  2022-01-13       15        12
5            A     A     Reg      A2  2022-01-14       16        13

Input:

>>> df

                                     Quantity                          Dollars                      
                                   2022-01-12 2022-01-13 2022-01-14 2022-01-12 2022-01-13 2022-01-14
Brand Family Brand Channel Product                                                                  
A            A     Reg     A1               1          2          3          4          5          6
                           A2              11         12         13         14         15         16
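The input frame shown above can be rebuilt in memory to try the melt / pivot_table chain end to end. A minimal sketch, assuming the two column levels are unnamed as in the printout; list(df.index.names) is used here only to make the list concatenation explicit:

```python
import pandas as pd

# Row index: the four descriptive columns, as in the printed input.
index = pd.MultiIndex.from_tuples(
    [("A", "A", "Reg", "A1"), ("A", "A", "Reg", "A2")],
    names=["Brand Family", "Brand", "Channel", "Product"],
)
# Column index: measure on the first level, date on the second.
columns = pd.MultiIndex.from_product(
    [["Quantity", "Dollars"], ["2022-01-12", "2022-01-13", "2022-01-14"]]
)
df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6], [11, 12, 13, 14, 15, 16]],
    index=index,
    columns=columns,
)

# Flatten both column levels into 'Variable' and 'Date', keeping the row
# index, then spread 'Variable' back out into one column per measure.
out = (
    df.melt(var_name=["Variable", "Date"], value_name="Value", ignore_index=False)
      .pivot_table(index=list(df.index.names) + ["Date"],
                   columns="Variable", values="Value")
      .reset_index()
      .rename_axis(columns=None)
)
print(out)
```

Note that pivot_table aggregates with the mean by default, so this only reproduces the input values when each (row, date, measure) combination occurs once.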

b5buobof3#

You can also use:

df1 = pd.read_csv('path/file.csv', header=[0, 1], index_col=[0,1,2,3])

df1.stack().reset_index()

  Brand Family Brand Channel Product        Date  Quantity  Dollars
0            A     A     Reg      A1  2022-01-12         1        4
1            A     A     Reg      A1  2022-01-13         2        5
2            A     A     Reg      A1  2022-01-14         3        6
3            A     A     Reg      A2  2022-01-12        11       14
4            A     A     Reg      A2  2022-01-13        12       15
5            A     A     Reg      A2  2022-01-14        13       16

where df1 is your data.
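The stack step can be tried without a file. A minimal sketch, assuming a frame with the same two-level columns; it is built in memory and the date level is named explicitly, so the result does not depend on how read_csv assigns level names for a particular file layout:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("A", "A", "Reg", "A1"), ("A", "A", "Reg", "A2")],
    names=["Brand Family", "Brand", "Channel", "Product"],
)
# Name the second column level 'Date' so the stacked column gets that name.
columns = pd.MultiIndex.from_product(
    [["Quantity", "Dollars"], ["2022-01-12", "2022-01-13", "2022-01-14"]],
    names=[None, "Date"],
)
df1 = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6], [11, 12, 13, 14, 15, 16]],
    index=index,
    columns=columns,
)

# stack() moves the innermost column level (the dates) into the row index;
# reset_index() then turns every index level into an ordinary column.
long_df = df1.stack().reset_index()
print(long_df)
```

The order of the Quantity and Dollars columns in the result may differ between pandas versions, so select them by name rather than by position.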
