用于在每组数据框之间进行计算的Python函数

5fjcxozz  于 2022-11-19  发布在  Python
关注(0)|答案(1)|浏览(111)

我需要有一个函数来执行下面提到的操作;数据集为:

预期输出是'Difference'列中值,其余为输入列
请注意,在每个组中,我们首先需要确定最大“Closing_time”,相应的金额将是该期间的最大值。然后从先前周期的最大检测值中减去每一行值,并且结果将是该单元的差。此外,如果记录没有上一个期间,则该期间所有记录的最大值将为NA,差异计算将为NA。
在每组(成本中心、账户、年、月)内添加积分- Closing_time值如下所示(D-0 00 CST为最小值,D-0 18 CST为最大值,类似地,在D-0、D+1、D+3等范围内- D+3为最大值)
我试图首先找到是否存在每个组的先前值,然后找到每个期间内的最大时间,然后将金额值与之对应。进一步使用最大值,试图从最大值中减去记录金额,但不知道如何实现,请帮助。

i7uq4tfw

i7uq4tfw1#

我把这个问题分成3部分- a)首先找到每个cost_center和account的前一年和前一个月b)在cost_center,account,year和month的每一组中找到最大Closing_time。然后选择相应的Amount值作为amount。c)使用来自b的amount,减去当前的amount得到difference。

def prevPeriod(df):
period =[]
for i in range(df.shape[0]):
    if df['Month'][i]==1:
        val_year = df['Year'][i]-1
        val_month = 12
        new_val =(val_year,val_month)
        period.append(new_val)
    else:
        val_year = df['Year'][i]
        val_month = df['Month'][i]-1
        new_val =(val_year,val_month)
        period.append(new_val)
print(period)
df['Previous_period'] = period
return df

def max_closing_time(group_list):
group_list = [item.replace('CST','') for item in group_list]
group_list = [item.replace('D','') for item in group_list]
group_list = [item.split()[:len(item)] for item in group_list]
l3 =[]
l4 =[]
for item in group_list:
    l3.append(item[0])
    l4.append(item[1])
l3 =[int(item) for item in l3]
l4 = [int(item) for item in l4]
max_datevalue = max(l3)
max_datevalue_index = l3.index(max(l3))
max_time_value = max(l4[max_datevalue_index:])
maximum_period = 'D+'+str(max_datevalue)+' '+str(max_time_value)+' '+'CST'
return maximum_period

def calculate_difference(df):
diff =[]
for i in range(df.shape[0]):
    prev_year =df['Previous_period'][i][0]
    print('prev_year is',prev_year)
    prev_month = df['Previous_period'][i][1]
    print('prev_month is', prev_month)
    max_closing_time = df[(df['Year']==prev_year)& (df['Month']==prev_month)]['Max_Closing_time']
    print('max_closing_time is', max_closing_time)
    #max_amount_consider = df[(df['Year']==prev_year)& (df['Month']==prev_month) &(df['Max_Closing_time']==max_closing_time)]['Amount']
    if bool(max_closing_time.empty):
        found_diff = np.nan
        diff.append(found_diff)
    else:
        max_closing_time_value = list(df[(df['Year']==prev_year)& (df['Month']==prev_month)]['Max_Closing_time'])[0]
        max_amount_consider = df[(df['Cost_centre']==df['Cost_centre'][i])&(df['Account']==df['Account'][i])&(df['Year']==prev_year) & (df['Month']==prev_month) &(df['Closing_time']==str(max_closing_time_value))]['Amount']
        print('max_amount_consider is',max_amount_consider)
        found_diff = int(max_amount_consider) - df['Amount'][i]
        diff.append(found_diff)
df['Variance'] = diff
return df

def calculate_variance(df):
'''
Input data frame is coming as query used above to fetch data
'''
try:
    df = prevPeriod(df)
except:
    print('Error occured in prevPeriod function')
# prerequisite for max_time_period
df2 = pd.DataFrame(df.groupby(['Cost_centre','Account','Year','Month'])['Closing_time'].apply(max_closing_time).reset_index())
df = pd.merge(df,df2, on =['Cost_centre','Account','Year','Month'])
# final calculation
try:
    final_result = calculate_difference(df)
except:
    print('Error in calculate_difference')
return final_result

相关问题