pandas 将一行除以另一行以创建新行并转置 Dataframe

8gsdolmq  于 2023-02-14  发布在  其他
关注(0)|答案(3)|浏览(206)

我有一个 Dataframe ,看起来像这样

data = [['Location 1', 'Oranges', 9, 12, 5, 10, 7, 12], ['Location 1', 'Apples', 2, 6, 4, 3, 7, 2], ['Location 1', 'Total', 11, 18, 9, 13, 14, 14],
        ['Location 2', 'Oranges', 11, 8, 14, 8, 10, 9], ['Location 2', 'Apples', 5, 4, 6, 2, 9, 9], ['Location 2', 'Total', 16, 12, 20, 10, 19, 18]]

df = pd.DataFrame(data, columns=['Location', 'Fruit', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
)

    Location    Fruit   Jan Feb Mar Apr May Jun
0   Location 1  Oranges 9   12  5   10  7   12
1   Location 1  Apples  2   6   4   3   7   2
2   Location 1  Total   11  18  9   13  14  14
3   Location 2  Oranges 11  8   14  8   10  9
4   Location 2  Apples  5   4   6   2   9   9
5   Location 2  Total   16  12  20  10  19  18

我希望按位置分组,获取苹果百分比(苹果/总数),并将 Dataframe 转置为最终如下所示

Jan                       Feb                       Mar
Location    # of Apples    % Fruit    # of Apples    % Fruit    # of Apples    % Fruit
Location 1       2          18.2%          6          33.3%          4          44.4%
Location 2       5          31.3%          4          33.3%          6          20.0%

我尝试过使用此方法,但似乎有些乏味,因为我的完整数据集包含两个以上的位置

df.iloc[3, 2:4] = df.iloc[1, 2:4] / df.iloc[2, 2:4]

谢谢大家!

xqk2d5yq

xqk2d5yq1#

溶液

# Set the index to location and fruit
s = df.set_index(['Location', 'Fruit'])

# Select the rows corresponding to Apples and Total
apples, total = s.xs('Apples', level=1), s.xs('Total', level=1)

# Divide apples by total to calculate pct then concat
pd.concat([apples / total * 100, apples], keys=['%_fruit', '#_of_apples']).unstack(0)

结果

Jan                    Feb                    Mar                    Apr                    May                    Jun            
              %_fruit #_of_apples    %_fruit #_of_apples    %_fruit #_of_apples    %_fruit #_of_apples    %_fruit #_of_apples    %_fruit #_of_apples
Location                                                                                                                                            
Location 1  18.181818         2.0  33.333333         6.0  44.444444         4.0  23.076923         3.0  50.000000         7.0  14.285714         2.0
Location 2  31.250000         5.0  33.333333         4.0  30.000000         6.0  20.000000         2.0  47.368421         9.0  50.000000         9.0
798qvoo8

798qvoo82#

要达到预期的结果,可以使用pivot方法来重新调整 Dataframe ,然后将“Apples”行的值除以“Total”行的值。

df_pivot = df[df['Fruit'] != 'Total'].pivot(index='Location', columns='Fruit', values=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'])
df_pivot.columns = df_pivot.columns.map('_'.join)
df_pivot['Jan_% Fruit'] = df_pivot['Jan_Apples'] / df_pivot['Jan_Total']
df_pivot['Feb_% Fruit'] = df_pivot['Feb_Apples'] / df_pivot['Feb_Total']
df_pivot['Mar_% Fruit'] = df_pivot['Mar_Apples'] / df_pivot['Mar_Total']
df_pivot = df_pivot.loc[:, ['Jan_Apples', 'Jan_% Fruit', 'Feb_Apples', 'Feb_% Fruit', 'Mar_Apples', 'Mar_% Fruit']]
df_pivot = df_pivot.reset_index()
df_pivot.columns = ['Location', '# of Apples', '% Fruit', '# of Apples', '% Fruit', '# of Apples', '% Fruit']
df_pivot['% Fruit'] = df_pivot['% Fruit'].apply(lambda x: '{:.1%}'.format(x))

下面是使用您提供的数据进行输入和输出的示例:
输入:

Location    Fruit  Jan  Feb  Mar  Apr  May  Jun
0  Location 1  Oranges   9   12    5   10    7   12
1  Location 1  Apples    2    6    4    3    7    2
2  Location 1   Total   11   18    9   13   14   14
3  Location 2  Oranges  11    8   14    8   10    9
4  Location 2  Apples    5    4    6    2    9    9
5  Location 2   Total   16   12   20   10   19   18

输出:

Location  # of Apples % Fruit  # of Apples % Fruit  # of Apples % Fruit
0  Location 1           2    18.2%           6    33.3%           4    44.4%
1  Location 2           5    31.3%           4    33.3%           6    20.0%
svdrlsy4

svdrlsy43#

我现在才看到@Shubham的答案,和我想出来的答案很像,我还是会贴这个答案,因为它略有不同:通过将索引设置为['Fruit', 'Location'],可以避免使用xs(),而使用简单的.loc[],但实际上这是吹毛求疵,两者非常相似。

z = df.set_index(['Fruit', 'Location'])

out = pd.concat([
    z.loc['Apples'],
    100 * z.loc['Apples'] / z.loc['Total']
], axis=1, keys=['# Apples', '% Fruit']).swaplevel(axis=1).reindex(z.columns, axis=1, level=0)

>>> out.round(1)
           Jan              Feb              Mar              Apr              May              Jun             
           # Apples % Fruit # Apples % Fruit # Apples % Fruit # Apples % Fruit # Apples % Fruit # Apples % Fruit
Location                                                                                                        
Location 1  2        18.2    6        33.3    4        44.4    3        23.1    7        50.0    2        14.3  
Location 2  5        31.2    4        33.3    6        30.0    2        20.0    9        47.4    9        50.0

相关问题