python-3.x Internal multiplication across all pd.DataFrame rows based on multiple conditions

pgccezyw posted on 2023-01-03 in Python
    • **Scenario:** I am trying to create multiple rows in a DataFrame based on a set of conditions.
    • **DataFrame example** (imported from an xlsx file with pandas):
Model    Scenario    Region    Variable    Unit    Year1    Year2    ...    Year50
  1        Base        1         GDP      M USD     10       15               20  
  1        Base        2         GDP      M USD     30       35               50  
  1        Base        3         GDP      M USD     20       75               80  
  1        Stress 1    1         GDP      % diff    0.48    0.11             0.31  
  1        Stress 1    2         GDP      % diff    0.12    0.33             0.89  
  1        Stress 1    3         GDP      % diff    0.76    0.54             0.08  
  1        Stress 2    1         GDP      % diff    0.37    0.94             0.13  
  1        Stress 2    2         GDP      % diff    0.73    0.76             0.35  
  1        Stress 2    3         GDP      % diff    0.15    0.45             0.37  
  1        Stress 3    1         GDP      % diff    0.49    0.14             0.37  
  1        Stress 3    2         GDP      % diff    0.14    0.73             0.94  
  1        Stress 3    3         GDP      % diff    0.96    0.26             0.85
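    • **Observation:** Each stress scenario is a % change relative to the Base scenario (for the same region and variable). This means the converted values are base value * (1 + stress value).
    • **Observation 2:** The original DataFrame has more models, scenarios, regions and variables, but the structure is always the same (all models have the same set of scenarios, all scenarios have the same set of regions, and so on).
    • **Objective:** to make every row's values use the same unit as the baseline. To do this, I need to perform the multiplication described above.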

The formula would be:

Model    Scenario    ...    Year1          Year2    ...     Year50
  1      Stress 1        10*(1+0.48)    15*(1+0.11)        20*(1+0.31)

The output would be:

Model    Scenario    ...    Year1          Year2    ...     Year50
  1      Stress 1           14.8           16.65             26.2
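As a general formula, the conversion applied to every stress row can be written as below. The symbols m, s, r, v and y (model, stress scenario, region, variable, year column) are introduced here only for illustration:

```latex
\text{converted}_{m,s,r,v,y} = \text{Base}_{m,r,v,y} \times \left(1 + \text{Stress}_{m,s,r,v,y}\right)
```

For the first cell above this gives 10 × (1 + 0.48) = 14.8.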
    • **What I've tried:** I am trying to use the df.loc function to find the matching values and use them for the calculation:
test_df.loc[((test_df['Model'] == '1') & (test_df['Scenario'] == 'Stress 1') & (test_df['Region'] == "1") & (test_df['Variable'] == 'GDP'))] = test_df.loc[((test_df['Model'] == '1') & (test_df['Scenario'] == 'Base') & (test_df['Region'] == "1") & (test_df['Variable'] == 'GDP'))] * (1 + test_df.loc[((test_df['Model'] == '1') & (test_df['Scenario'] == 'Stress 1') & (test_df['Region'] == "1") & (test_df['Variable'] == 'GDP'))])
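    • **Observation 3:** I think there are two problems with this approach: first, I cannot properly restrict the operation to the "Year" columns; second, I am not sure how to do this for the whole DataFrame at once, without writing one line for every possible combination of Model/Scenario/Region/Variable.
    • **Question:** Is there a way to do this? If so, what is the best way?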
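A minimal sketch of one reason a single `.loc` statement like the one above does not give the expected numbers, using a hypothetical two-row frame: pandas aligns arithmetic on index labels, so the Base row (label 0) and the Stress 1 row (label 1) never line up and every result cell is NaN. Converting one side with `.to_numpy()` drops its labels so the rows multiply positionally, and listing the year columns in `.loc` keeps the text columns out of the calculation. It still needs one statement per combination, which is the second problem raised above.

```python
import pandas as pd

# Hypothetical two-row sample: one Base row and the matching Stress 1 row
test_df = pd.DataFrame({
    'Scenario': ['Base', 'Stress 1'],
    'Year1': [10.0, 0.48],
    'Year2': [15.0, 0.11],
})
years = ['Year1', 'Year2']

base = test_df.loc[test_df['Scenario'] == 'Base', years]        # index label 0
stress = test_df.loc[test_df['Scenario'] == 'Stress 1', years]  # index label 1

# Label-based alignment: labels 0 and 1 do not match, so every cell is NaN
aligned = base * (1 + stress)
print(aligned)

# Dropping the labels on the Base side multiplies the rows positionally
converted = (1 + stress) * base.to_numpy()
print(converted)

# Restricting the assignment to the year columns leaves Scenario untouched
test_df.loc[test_df['Scenario'] == 'Stress 1', years] = converted
print(test_df)
```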
5f0d552i


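First, filter the Base DataFrame. With the sample data, the condition test_df['Scenario'] == 'Base' is probably enough. Then move the columns used to align it with the other DataFrame into the index, here 'Model', 'Region', 'Variable' (Scenario and Unit differ between the two frames, so they are left out), and select only the year columns from a list: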

years = ['Year1', 'Year2', 'Year50']
df1 = (test_df[(test_df['Scenario'] == 'Base')]
             .set_index(['Model','Region','Variable'])[years])
print (df1)
                       Year1  Year2  Year50
Model Region Variable                      
1     1      GDP        10.0   15.0    20.0
      2      GDP        30.0   35.0    50.0
      3      GDP        20.0   75.0    80.0
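The sample lists only three year columns. For the real file, the list could be built instead of typed. A sketch, assuming the columns are literally named `Year1` to `Year50` (the frame below is a hypothetical empty stand-in for `test_df`): the comprehension builds the names directly, while `DataFrame.filter(regex=...)` picks up whatever columns match the pattern, which keeps working if some years are missing.

```python
import pandas as pd

# Hypothetical stand-in: identifier columns plus Year1..Year50, no rows
cols = ['Model', 'Scenario', 'Region', 'Variable', 'Unit'] + [f'Year{i}' for i in range(1, 51)]
test_df = pd.DataFrame(columns=cols)

# Option 1: build the names explicitly (assumes Year1..Year50 all exist)
years = [f'Year{i}' for i in range(1, 51)]

# Option 2: select the columns whose names match "Year<digits>"
years_from_df = test_df.filter(regex=r'^Year\d+$').columns.tolist()

print(len(years), years_from_df[:3], years_from_df[-1])
```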

A similar approach is used for df2, this time selecting the stress rows by their Unit:

df2 = (test_df[(test_df['Unit'] == '% diff')]
             .set_index(['Model','Scenario','Region','Variable','Unit'])[years])
print (df2)
                                       Year1  Year2  Year50
Model Scenario Region Variable Unit                        
1     Stress 1 1      GDP      % diff   0.48   0.11    0.31
               2      GDP      % diff   0.12   0.33    0.89
               3      GDP      % diff   0.76   0.54    0.08
      Stress 2 1      GDP      % diff   0.37   0.94    0.13
               2      GDP      % diff   0.73   0.76    0.35
               3      GDP      % diff   0.15   0.45    0.37
      Stress 3 1      GDP      % diff   0.49   0.14    0.37
               2      GDP      % diff   0.14   0.73    0.94
               3      GDP      % diff   0.96   0.26    0.85

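Because some levels of df1.index match levels of df2.index, and the index of df1 is unique, it is possible to add 1 to df2 and then multiply by df1; pandas aligns the two frames on the shared Model, Region and Variable levels: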

df = df2.add(1).mul(df1).reset_index()
print (df)
   Model  Region Variable  Scenario    Unit  Year1   Year2  Year50
0      1       1      GDP  Stress 1  % diff   14.8   16.65    26.2
1      1       1      GDP  Stress 2  % diff   13.7   29.10    22.6
2      1       1      GDP  Stress 3  % diff   14.9   17.10    27.4
3      1       2      GDP  Stress 1  % diff   33.6   46.55    94.5
4      1       2      GDP  Stress 2  % diff   51.9   61.60    67.5
5      1       2      GDP  Stress 3  % diff   34.2   60.55    97.0
6      1       3      GDP  Stress 1  % diff   35.2  115.50    86.4
7      1       3      GDP  Stress 2  % diff   23.0  108.75   109.6
8      1       3      GDP  Stress 3  % diff   39.2   94.50   148.0
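If the MultiIndex alignment feels opaque, the same numbers can be sketched with an explicit `merge` on the shared key columns instead. This is a swapped-in technique, not the one used in the answer. It rebuilds the sample data by hand and assumes, as in the sample, exactly one Base row per Model/Region/Variable combination; `validate='many_to_one'` raises if that assumption is broken. Relabelling the converted rows with the Base unit is also an assumption, based on the question's stated objective.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand (only three of the year columns)
test_df = pd.DataFrame({
    'Model': [1] * 12,
    'Scenario': ['Base'] * 3 + ['Stress 1'] * 3 + ['Stress 2'] * 3 + ['Stress 3'] * 3,
    'Region': [1, 2, 3] * 4,
    'Variable': ['GDP'] * 12,
    'Unit': ['M USD'] * 3 + ['% diff'] * 9,
    'Year1': [10, 30, 20, 0.48, 0.12, 0.76, 0.37, 0.73, 0.15, 0.49, 0.14, 0.96],
    'Year2': [15, 35, 75, 0.11, 0.33, 0.54, 0.94, 0.76, 0.45, 0.14, 0.73, 0.26],
    'Year50': [20, 50, 80, 0.31, 0.89, 0.08, 0.13, 0.35, 0.37, 0.37, 0.94, 0.85],
})
keys = ['Model', 'Region', 'Variable']
years = ['Year1', 'Year2', 'Year50']

base = test_df[test_df['Scenario'] == 'Base']
stress = test_df[test_df['Scenario'] != 'Base']

# Attach the matching Base row to every stress row; overlapping columns
# coming from the Base side get the suffix "_base"
merged = stress.merge(
    base[keys + ['Unit'] + years],
    on=keys,
    suffixes=('', '_base'),
    validate='many_to_one',  # raises if a key combination has several Base rows
)

# Converted value = base * (1 + stress), for all rows at once, one column at a time
result = merged[['Model', 'Scenario', 'Region', 'Variable']].copy()
result['Unit'] = merged['Unit_base']  # assumption: converted rows take the Base unit
for y in years:
    result[y] = merged[y + '_base'] * (1 + merged[y])

print(result)
```

Compared with the alignment approach, this needs no MultiIndex, and the one-Base-row-per-key requirement becomes an explicit check rather than an implicit one.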
