pandas: percentage of monthly data

Posted by gpfsuwkq on 2023-09-29 in: Other

I have daily volume data with a Type column.

        Day  Type       Volume
0  20230101     0   -23336.289
1  20230101     1  2930009.848
2  20230101     2 -2906673.559
3  20230102     0     7377.021
4  20230102     1  2892521.704
5  20230102     2 -2899898.724

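I want to create a column, say mon_pct, that takes the daily volume of each Type 0 row and divides it by that month's total Type 1 volume. For example, (-23336.289 / (January sum of Type 1)) * 100. Only rows with Type 0 should get a value.
How can I do this?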
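As a sanity check on the formula, here is a minimal Python sketch that rebuilds the sample data and evaluates the worked example from the question by hand. It assumes both Type 1 rows belong to January 2023, which holds for this sample; the variable names are illustrative only.

```python
import pandas as pd

# Sample data from the question (Day stored as YYYYMMDD integers)
df = pd.DataFrame({
    'Day': [20230101, 20230101, 20230101, 20230102, 20230102, 20230102],
    'Type': [0, 1, 2, 0, 1, 2],
    'Volume': [-23336.289, 2930009.848, -2906673.559,
               7377.021, 2892521.704, -2899898.724],
})

# January total for Type 1: both Type 1 rows fall in January 2023
jan_type1_sum = df.loc[df['Type'].eq(1), 'Volume'].sum()

# mon_pct for the first Type 0 row, following the formula in the question
first_pct = df.loc[0, 'Volume'] / jan_type1_sum * 100

print(round(jan_type1_sum, 3))  # 5822531.552
print(round(first_pct, 6))      # -0.400793
```

The -0.400793 figure is the value the answers below report for row 0.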


Answer #1 (olmpazwi)

Use groupby.transform to broadcast each month's Type 1 total back to the individual rows:

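import pandas as pd

m = df['Type'].eq(0)  # only Type 0 rows receive a value

df.loc[m, 'mon_pct'] = (df['Volume']
 .where(df['Type'].eq(1))  # keep Type 1 volumes, NaN elsewhere
 # format= is needed: plain integers like 20230101 are otherwise read as epoch nanoseconds
 .groupby(pd.to_datetime(df['Day'], format='%Y%m%d').dt.to_period('M'))
 .transform('sum')[m]  # monthly Type 1 total, aligned to each Type 0 row
 .rdiv(df['Volume']).mul(100)  # Volume / monthly total * 100
)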

Output:

        Day  Type       Volume   mon_pct
0  20230101     0   -23336.289 -0.400793
1  20230101     1  2930009.848       NaN
2  20230101     2 -2906673.559       NaN
3  20230102     0     7377.021  0.126698
4  20230102     1  2892521.704       NaN
5  20230102     2 -2899898.724       NaN
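The key step in this answer is that `transform('sum')` returns a result aligned to the original rows, whereas a plain `sum()` after `groupby` returns one row per group. A minimal sketch on the question's sample data (variable names are illustrative) shows the difference. It parses `Day` with `format='%Y%m%d'`, because without it pandas reads plain integers such as 20230101 as nanoseconds since the epoch.

```python
import pandas as pd

df = pd.DataFrame({
    'Day': [20230101, 20230101, 20230101, 20230102, 20230102, 20230102],
    'Type': [0, 1, 2, 0, 1, 2],
    'Volume': [-23336.289, 2930009.848, -2906673.559,
               7377.021, 2892521.704, -2899898.724],
})

# Parse YYYYMMDD integers explicitly, then reduce each date to its month period
month = pd.to_datetime(df['Day'], format='%Y%m%d').dt.to_period('M')

# Keep only Type 1 volumes; other rows become NaN and are skipped by sum
type1_only = df['Volume'].where(df['Type'].eq(1))

# Aggregation: one total per month (index = month periods)
per_month = type1_only.groupby(month).sum()

# transform('sum'): the same monthly total repeated on every row (index = df.index)
per_row = type1_only.groupby(month).transform('sum')

print(per_month)
print(per_row)
```

Because `per_row` shares the DataFrame's index, it can be divided into `df['Volume']` directly, which is what the chained expression above relies on.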

Answer #2 (0yycz8jy)

Compute the total Type 1 volume over the whole DataFrame (a single overall sum, not split by month):

type_1_sum = df["Volume"][df["Type"] == 1].sum()

Create a new DataFrame with only the Type 0 rows and add a mon_pct column to it:

df_type_0 = df[df['Type'] == 0].copy()
df_type_0["mon_pct"] = (df_type_0["Volume"] / type_1_sum) * 100

Merge the new DataFrame back into the original one:

df = df.merge(df_type_0, how='left')

Output:


Answer #3 (velaa5lx)

Another possible solution, based on pandas.DataFrame.resample:

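day = pd.to_datetime(df['Day'], format='%Y%m%d')

# Type 1 totals per calendar month ('MS' = month-start bins), re-keyed by month period
type1_monthly = (df.assign(Day=day)
                 .set_index('Day')
                 .loc[lambda x: x['Type'].eq(1)]
                 .resample('MS')['Volume'].sum()
                 .to_period('M'))

(df.assign(
    mon_pct = df['Volume'].where(df['Type'].eq(0))
    # divide each Type 0 row by the Type 1 total of its own month
    .div(day.dt.to_period('M').map(type1_monthly))
    .mul(100))
)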

Output:

        Day  Type       Volume   mon_pct
0  20230101     0   -23336.289 -0.400793
1  20230101     1  2930009.848       NaN
2  20230101     2 -2906673.559       NaN
3  20230102     0     7377.021  0.126698
4  20230102     1  2892521.704       NaN
5  20230102     2 -2899898.724       NaN
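`resample('MS')` produces one total per calendar month, so when the data covers several months each row has to be matched to its own month's total; selecting a single element such as `.iloc[0]` would divide every row by the first month's total. The sketch below uses a small hypothetical two-month dataset (invented for illustration, not taken from the question) and maps each row's month period onto the resampled totals.

```python
import pandas as pd

# Hypothetical two-month data (NOT from the question), to show per-month behaviour
df = pd.DataFrame({
    'Day': [20230101, 20230101, 20230201, 20230201],
    'Type': [0, 1, 0, 1],
    'Volume': [-50.0, 1000.0, 30.0, 600.0],
})
day = pd.to_datetime(df['Day'], format='%Y%m%d')

# One Type 1 total per calendar month; 'MS' labels each bin by its month start
type1_monthly = (df.assign(Day=day)
                 .set_index('Day')
                 .loc[lambda x: x['Type'].eq(1)]
                 .resample('MS')['Volume'].sum())

# Look up each row's own month total via the month period
row_total = day.dt.to_period('M').map(type1_monthly.to_period('M'))

mon_pct = df['Volume'].where(df['Type'].eq(0)).div(row_total).mul(100)
print(mon_pct.tolist())  # [-5.0, nan, 5.0, nan]
```

January's Type 0 row is divided by 1000 and February's by 600, rather than both by the January total.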

Answer #4 (nfeuvbwi)

import pandas as pd

# Sample data
data = {
    'Day': [20230101, 20230101, 20230101, 20230102, 20230102, 20230102],
    'Type': [0, 1, 2, 0, 1, 2],
    'Volume': [-23336.289, 2930009.848, -2906673.559, 7377.021, 2892521.704, -2899898.724]
}

df = pd.DataFrame(data)

# Convert 'Day' to datetime
df['Day'] = pd.to_datetime(df['Day'], format='%Y%m%d')

# Calculate the sum of 'Type 1' volumes per calendar month (year and month together)
monthly_sum_type1 = df[df['Type'] == 1].groupby(df['Day'].dt.to_period('M'))['Volume'].sum()

# Create a function to calculate 'mon_pct'
def calculate_mon_pct(row):
    if row['Type'] == 0:
        month = row['Day'].to_period('M')
        if month in monthly_sum_type1.index:
            return (row['Volume'] / monthly_sum_type1[month]) * 100
    return None

# Apply the function to create the 'mon_pct' column
df['mon_pct'] = df.apply(calculate_mon_pct, axis=1)

# Print the resulting DataFrame
print(df)

Note: replace the sample data with your actual dataset.
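Grouping on `dt.month` keys only on the month number, so January 2023 and January 2024 would land in the same group. A minimal check with hypothetical dates and volumes (not from the question) contrasts it with grouping on `dt.to_period('M')`, which keeps year and month together.

```python
import pandas as pd

# Hypothetical Type 1 rows from two different Januaries (not from the question)
days = pd.to_datetime(pd.Series([20230101, 20240101]), format='%Y%m%d')
volume = pd.Series([100.0, 200.0])

# Month number only: both rows share the key 1
by_month_number = volume.groupby(days.dt.month).sum()

# Year-month period: 2023-01 and 2024-01 stay separate
by_period = volume.groupby(days.dt.to_period('M')).sum()

print(by_month_number)  # expected: a single group (month 1) totalling 300.0
print(by_period)        # expected: two groups, 100.0 and 200.0
```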
