pandas 采用面板数据df.div(level=？).索引级别

样本数据

arrays1 = [['country1','country1','country1','country1','country2', 'country2', 'country2', 'country2'],
           [2000, 2001, 2000, 2001,2000, 2001, 2000, 2001],
           ['agri1','agri1', 'cons2','cons2', 'agri1','agri1', 'cons2','cons2']]
arrays = [['country1', 'country1', 'country2', 'country2'],
          ['agri1', 'cons2', 'agri1', 'cons2']]
index = pd.MultiIndex.from_arrays(arrays1, names=('country','Year','sector'))
columns1 = pd.MultiIndex.from_arrays(arrays, names=('country','sector'))
df = pd.DataFrame(np.array([[24, 20, 30, 20],[16, 14, 10, 25],[28, 22, 6, 28],
                   [11, 10, 10, 4],[6, 7, 12, 16],[19, 24, 6, 9],
                   [22, 9, 10, 15],[9, 1, 4, 2]]),index=index, columns=columns1)

个字符
感谢Shubham Sharma回答multiindex column and rows match if the column and row names are similar and exclude the values from adding in those cell python pandas，我能够得到标记为总的列。其简单地是排除匹配的第一级行索引和第零级列索引的水平求和。

#to match the same countries from index and column then exclude the matching cell from addition
ix = df.index.get_level_values(1) 
cx = df.columns.get_level_values(0)
m = ix.values[:, None] == cx.values

df[('TOTAL','EX')] = df.mask(m).sum(axis=1)
df[('TOTAL','GR')] = df.iloc[:,:-1].sum(axis=1)

resulting df

Country  country1    country2  TOTAL 
               Sector    A   B       A   B    EX    GR
Year    country sector                      
2000    country1    A   24  20      30  20   50.0   94
                    B   16  14      10  25   35.0   65
        country2    A   28  22      6   28   50.0   84
                    B   11  10      10  4    21.0   35
2001    country1    A   6   7       12  16   28.0   41
                    B   19  24      6   9    15.0   58
        country2    A   22  9       10  15   31.0   56
                    B   9   1       4   2    10.0   16

的一种或多种
现在我想继续，沿着列（轴=1）使用元素划分。
第一个月
这给了我ValueError：无法强制到系列，长度必须为4：给定8，这是因为它没有考虑年份指数。但如果只有一年，这个结果会很好地给我预期的结果。但我想每年都做考虑到水平。我希望最终的df看起来像什么

Country      country1       country2         TOTAL   
               Sector      A        B        A      B         EX    GR
Year    country sector                      
2000    country1    A   24/94   20/65      30/84    20/35    50.0   94
                    B   16/94   14/65      10/84    25/35    35.0   65
        country2    A   28/94   22/65       6/84    28/35    50.0   84
                    B   11/94   10/65      10/84    4/35     21.0   35
2001    country1    A   6/41    7/58       12/56    16/16    28.0   41
                    B   19/41   24/58       6/56    9/16     15.0   58
        country2    A   22/41   9/58       10/56    15/16    31.0   56
                    B   9/41    1/58        4/56    2/16    10.0    16```

型
我看过df.div的文档，并试图在元素划分的同时包含1级按年分组，但是出现了同样的错误

df.iloc[:,:-2] = df.iloc[:,:-2].div(df[('TOTAL','GR')].values,axis=1, level=1)

型
ValueError：无法强制到系列，长度必须为4：给8也许我做的整件事都不对，所以我接受任何轻松的工作周围的建议

IIUC，您可以尝试：

def fn(x):
    x.iloc[:, :-2] = x.iloc[:, :-2].div(x[('TOTAL', 'GR')].values, axis=1)
    return x

x = df.groupby(level='Year', group_keys=False).apply(fn)
print(x)

字符串
图纸：

country               country1            country2           TOTAL     
sector                   agri1     cons2     agri1     cons2    EX   GR
country  Year sector                                                   
country1 2000 agri1   0.493671  0.443038  0.037975  0.025316   790  790
         2001 agri1   0.444444  0.503704  0.014815  0.037037   675  675
         2000 cons2   0.493671  0.443038  0.037975  0.025316   790  790
         2001 cons2   0.444444  0.503704  0.014815  0.037037   675  675
country2 2000 agri1   0.493671  0.443038  0.037975  0.025316   790  790
         2001 agri1   0.444444  0.503704  0.014815  0.037037   675  675
         2000 cons2   0.493671  0.443038  0.037975  0.025316   790  790
         2001 cons2   0.444444  0.503704  0.014815  0.037037   675  675

型
首字母df来自您的问题：

country              country1       country2       TOTAL     
sector                  agri1 cons2    agri1 cons2    EX   GR
country  Year sector                                         
country1 2000 agri1       390   350       30    20   790  790
         2001 agri1       300   340       10    25   675  675
         2000 cons2       390   350       30    20   790  790
         2001 cons2       300   340       10    25   675  675
country2 2000 agri1       390   350       30    20   790  790
         2001 agri1       300   340       10    25   675  675
         2000 cons2       390   350       30    20   790  790
         2001 cons2       300   340       10    25   675  675

型

pandas 采用面板数据df.div(level=？).索引级别

1条答案

相关问题

热门标签

最新问答