Improving the performance of my Python code (I am using Pandas)

Posted by 1hdlvixo on 2023-01-15 in Python

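I am writing Python code for a data analysis. In a new column, I want to mark the rows that share the same values in the EMP, RAZAO and ATRIB columns and whose MONTCALC values sum to zero. For example:
[Image: example of the data]
In this image, the rows marked with a color are the subgroups whose MONTCALC values add up to 0.
My code: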

conciliation_df_temp = conciliation_df.copy()
doc_clear = 1  # next number to write into DOC_COMP

for i in conciliation_df_temp.index:
  # Only process rows that have not been marked yet
  if conciliation_df_temp.loc[i,'DOC_COMP'] == "":
    company = conciliation_df_temp.loc[i,'EMP']
    gl_account = conciliation_df_temp.loc[i,'RAZAO']
    assignment = conciliation_df_temp.loc[i,'ATRIB']
    # Boolean mask of all rows sharing this row's EMP / RAZAO / ATRIB values
    same_group = (conciliation_df_temp['EMP'] == company) & (conciliation_df_temp['RAZAO'] == gl_account) & (conciliation_df_temp['ATRIB'] == assignment)
    df_temp = conciliation_df_temp.loc[same_group]

    # If the group's MONTCALC values sum to zero, mark every row in the group
    if round(df_temp['MONTCALC'].sum(),2) == 0:
      conciliation_df_temp.loc[same_group,'DOC_COMP'] = doc_clear
      doc_clear += 1

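With few rows (10,000) the performance is fine: it runs in less than 1 minute, and that minute also includes reading a text file, processing it and converting it to a DataFrame. But when I feed it a text file with more than 1 million rows, the script does not finish. I waited 5 hours and got no result.
What can I do to improve the performance of this code?
Regards!!
Sorry for my English
I tried deleting rows from the DataFrame to reduce its size and make the search faster, but execution became slower.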

yhived7q  #1

It looks like you can check whether the sum of each group is zero:

import pandas as pd

df = pd.DataFrame([
  [3000,1131500040,8701731701,-156002.08],
  [3000,1131500040,8701731701, 156002.08],
  [3000,1131500040,"EA-17012.2.22", -3990],
  [3000,1131500040,"EA-17012.2.22", 400],
  [3000,1131500040,"000100000103", -35822.86],
  [3000,1131500040,"000100000103", 35822.86],
  [3000,1131500040,"000100000103", -35822.86],
  [3000,1131500040,"000100000103", 35822.86]
], columns=['EMP','RAZAO','ATRIB','MONTCALC']
)

df['zero'] = df.groupby(['EMP','RAZAO','ATRIB'])['MONTCALC'].transform(lambda x: sum(x)==0)

print(df)

Output:

    EMP       RAZAO          ATRIB   MONTCALC   zero
0  3000  1131500040     8701731701 -156002.08   True
1  3000  1131500040     8701731701  156002.08   True
2  3000  1131500040  EA-17012.2.22   -3990.00  False
3  3000  1131500040  EA-17012.2.22     400.00  False
4  3000  1131500040   000100000103  -35822.86   True
5  3000  1131500040   000100000103   35822.86   True
6  3000  1131500040   000100000103  -35822.86   True
7  3000  1131500040   000100000103   35822.86   True
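
The question also numbers each zero-sum group (the doc_clear counter) and rounds the sum to 2 decimals before comparing. A rough sketch of how the same groupby idea might be extended to do that without a Python loop is shown here. It assumes that every row starts with an empty DOC_COMP, that ATRIB is stored as strings, and that groups should be numbered in order of first appearance. The function name mark_cleared_groups is hypothetical and not part of the original post.

```python
import pandas as pd

def mark_cleared_groups(df: pd.DataFrame) -> pd.DataFrame:
    """Write 1, 2, 3, ... into DOC_COMP for each zero-sum (EMP, RAZAO, ATRIB) group."""
    keys = ['EMP', 'RAZAO', 'ATRIB']
    out = df.copy()
    grouped = out.groupby(keys, sort=False)

    # Group total broadcast back onto every row; rounding to 2 decimals mirrors
    # the question's round(..., 2) and absorbs floating-point noise.
    is_zero = grouped['MONTCALC'].transform('sum').round(2) == 0

    # Integer label per group, in order of first appearance (sort=False).
    group_id = grouped.ngroup()

    # Renumber only the zero-sum groups as 1, 2, 3, ...
    codes, _ = pd.factorize(group_id[is_zero])

    out['DOC_COMP'] = ""  # assumption: nothing has been marked beforehand
    out.loc[is_zero, 'DOC_COMP'] = codes + 1
    return out

# Sample data from the answer above, with ATRIB kept as strings (an assumption).
df = pd.DataFrame([
    [3000, 1131500040, "8701731701", -156002.08],
    [3000, 1131500040, "8701731701", 156002.08],
    [3000, 1131500040, "EA-17012.2.22", -3990.0],
    [3000, 1131500040, "EA-17012.2.22", 400.0],
    [3000, 1131500040, "000100000103", -35822.86],
    [3000, 1131500040, "000100000103", 35822.86],
    [3000, 1131500040, "000100000103", -35822.86],
    [3000, 1131500040, "000100000103", 35822.86],
], columns=['EMP', 'RAZAO', 'ATRIB', 'MONTCALC'])

result = mark_cleared_groups(df)
print(result)
```

Using transform('sum') with the built-in aggregation name instead of a lambda should let pandas take its optimized path, which tends to matter at the million-row scale mentioned in the question. If some rows already carry a DOC_COMP value, the sketch would need an extra filter before numbering.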
