在csv.reader中查找条件字符串附加值

yeotifhr  于 2023-04-09  发布在  其他
关注(0)|答案(1)|浏览(165)

我有一个供应商应付款帐龄报表,我正在尝试将其自动化,该报表以.csv文件的形式从财务系统导出。在该报表中,列出了一行“未开票的本位币金额”,后面是列表中每个供应商的$xx.xx金额。以下是报表输出的示例(数字已更改):

1000000 Vendor1 USD PO Number 1/1/1900
Item1, Description 
100 Each $1.00

INV000000 1/1/1900 000 Each 100 0 $1.00 $24.00
0 0  $24.00

INV000001 1/1/1900 000 Each 50 0 $1.00 $10.50
0 0  $10.50

-------------------
Functional Amount Not Invoiced: $250.00
Amount Not Invoiced Less Returned: $250.00

1000001 Vendor2 USD PO2061994 6/2/2015
Item2, Description 30 Each $38.00

INV000002 7/23/2015 000 Each 9 0 $38.00 $342.00
0 0  $342.00

INV000003 7/23/2015 000 Each 7 0 $38.00 $266.00
0 0  $266.00

-------------------
Functional Amount Not Invoiced: $346,955.00
Amount Not Invoiced Less Returned: $1,245.00

我想知道如何解析一个.csv文件,以获取所有大于或等于$10,000.00的“未开票的功能金额”示例,并在这些情况下,获取前两个字符串并返回它们(在上面的情况下,我将返回1000000 Vendor 1)。以下是到目前为止我的代码:

companyList={'1000000':'Vendor1',...}

with open('Vendor Report.csv',mode='r',encoding='latin1') as file:
csvreader=csv.reader(file)
for row in csvreader:
    print(' '.join(row))
    if 'Functional Amount Not Invoiced:' in row:
        ...

我已经到了...部分,我知道逻辑是'如果字符串后的金额至少为$10,000.00,找到供应商ID和供应商名称并返回它们。目标是将所有超过$10,000.00的供应商列表自动追加到列表中。我的预期输出如下:

Vendor ID Vendor Name $346,955.00
...
uajslkp6

uajslkp61#

IIUC,这里有一个使用pandas的选项,使用read_fwfextract

#pip install pandas
import pandas as pd

MIN_AMOUNT = 10000

df = pd.read_fwf("input.csv", header=None)
​
vendor_vals = df[0].str.extract(r"(\d+) ([a-zA-Z]+\d+)", expand=False).ffill()
fani_vals = (df.pop(0).str.extract(r"Functional Amount Not Invoiced: \$(.*)",
                expand=False).replace({r",|\.0+": ""}, regex=True).astype(float))
​
companyList = (
                df.assign(VENDOR = vendor_vals, FANI = fani_vals).dropna()
                  .loc[lambda df_: df_["FANI"].gt(MIN_AMOUNT)].to_dict("list")
               ) 
​

输出:

>>> print(companyList)

{'VENDOR': ['1000001 Vendor2'], 'FANI': [346955.0]}

更新

如果您需要一个 Dataframe (* 以生成.csv*),请使用以下命令:

df = pd.read_fwf("input.csv", header=None)
​
out = (
        df.join(df[0].str.extract(r"(\d+) ([a-zA-Z]+\d+)")
                .rename(columns={0: "VENDOR_ID", 1:"VENDOR_NAME"}).ffill())
          .assign(FANI = lambda df_: df_.pop(0).str.extract(r"Functional Amount Not Invoiced: \$(.*)",
                expand=False).replace({r",|\.0+": ""}, regex=True).astype(float))
          .dropna().loc[lambda df_: df_["FANI"].gt(MIN_AMOUNT)].reset_index(drop=True)
       )

输出:

>>> print(out)

  VENDOR_ID VENDOR_NAME      FANI
0   1000001     Vendor2  346955.0

相关问题