Extracting information from a list of strings using regex

Posted by 4szc88ey on 2023-04-22 in Other


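I have a list of strings from which I want to extract information such as amounts and percentages. As a regex beginner, I have been struggling with this. Below are my input, my desired output, and the code I tried.
Input list: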

['0.09% of the first GBP£250 million of the Company’s Net Asset Value;', '0.08% of the next GBP£250 million of the Company’s Net Asset Value;', "0.06% of the next GBP£500 million of the Company's Net Asset Value; and", 'e GBP£22,000 in respect of cach of (he Company’s Sub-Funds which shall be accrued for on a daily basis', 'in accordance with the formula GBP£22,000 + 365, Minimum fee to be levied at a Company level,', 'e Preparation of fund interim and annual financial statements... GBP£2,750 per sub-fund pa', 'e UK Tax Reporting... ww. GBPE£L,500 per sub-fund pa', 'BUSD Tax Reporting’ v GBP£3,000 per sub-find pa', '© Account maintenance £00 sess resect GBPL£25 per investor pa', '» Manual .. GBPE£25 per transaction', '"Automated GBPE£S5 per Gransaction', 'e Investor registration and AML {ce GBP£50 per new investor account,', '« Fund distribution/dividend fee GBP£750 per distribution/dividend per sub fund.']

Code:

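import re

def extract_pounds(text):
    # Capture the run of word characters that follows each pound sign
    regex = r"£(\w+)"
    return re.findall(regex, str(text))

# empty_df is assumed to hold the input list shown above
for word in empty_df:
    pounds = extract_pounds(word)
    print(pounds)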

I get the following output, which is nowhere near what I want:

['250']
['250']
['500']
['22']
['22']

Desired output:

Tier    Amount   Minimum Fee Sub-Fund               AccountMaintain
 0.09%   first GBP£250 million   GBP£22,000  GBP£2,750   £00 sess 
                                                       resect GBPL£25 
                                                       Manual GBPE£25 
                                                     AutomatedGBPE£S5 

 0.08%   next GBP£250 million                        GBPE£L,500
 0.06%   next GBP£500 million                        GBP£3,000

regex

Source: https://stackoverflow.com/questions/76052167/extracting-information-from-a-list-of-strings-using-regex

1 Answer


r7knjye21#

Using pandas, you can try the following:

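import re
import pandas as pd

# lst is the list from the question as first posted; its last item
# is assumed to be the string that contains the minimum fee
pat = r"([\d.]+%) of the (\w+ GBP£\d+ \w+)"
df = pd.Series(lst[:-1]).str.extract(pat).set_axis(["Tier", "Amount"], axis=1)

# Take the first comma-separated GBP amount from the last string
df.loc[0, "Minimum Fee"] = re.search(r"GBP£\d+,\d+", lst[-1]).group(0)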

Output:

print(df)

    Tier                 Amount Minimum Fee
0  0.09%  first GBP£250 million  GBP£22,000
1  0.08%   next GBP£250 million         NaN
2  0.06%   next GBP£500 million         NaN
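
As a side note on why the pattern in the question returns `22` rather than `22,000`: `\w` matches letters, digits and the underscore, but not a comma, so `£(\w+)` stops at the first thousands separator. A minimal sketch of a comma-aware variant follows; the helper name `extract_amounts` is made up for illustration, and the pattern assumes amounts are written as digits with optional `,ddd` groups.

```python
import re

def extract_amounts(text):
    # Digits, optionally followed by groups of ",ddd" (e.g. 250 or 22,000)
    return re.findall(r"£(\d+(?:,\d{3})*)", text)

print(extract_amounts("in accordance with the formula GBP£22,000 + 365,"))
print(extract_amounts("0.09% of the first GBP£250 million"))
```

This only recovers amounts whose digits survived the OCR; strings such as `GBPE£L,500` still need separate handling.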

Update:

Based on your updated question/list, use this:

# Tier percentage and amount; rows that do not match are dropped
pat1 = r"([\d.]+%) of the (\w+ GBP£\d+ \w+)"
df = pd.Series(lst).str.extract(pat1).set_axis(["Tier", "Amount"], axis=1).dropna()

# First comma-separated GBP amount that precedes the words "Minimum fee"
pat2 = r"(GBP£\d+,\d+).*Minimum fee"
result = re.search(pat2, " ".join(lst))
mfee = result.group(1) if result else None

df.loc[0, "Minimum Fee"] = mfee

Output:

print(df)

    Tier                 Amount Minimum Fee
0  0.09%  first GBP£250 million  GBP£22,000
1  0.08%   next GBP£250 million         NaN
2  0.06%   next GBP£500 million         NaN
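
The remaining "per ..." fee lines from the desired output are not covered above. One possible way to pick them up is sketched below, under the assumption that every such line has the shape `GBP<stray OCR letters>£<amount> per <unit>`. The names `fee_pat` and `extract_fees` are hypothetical, and the amount is captured as-is, so OCR damage such as `L,500` is preserved rather than guessed at.

```python
import re

# A few strings copied verbatim from the question's list (OCR noise included)
lst = [
    "0.09% of the first GBP£250 million of the Company’s Net Asset Value;",
    "e Preparation of fund interim and annual financial statements... GBP£2,750 per sub-fund pa",
    "e UK Tax Reporting... ww. GBPE£L,500 per sub-fund pa",
    "» Manual .. GBPE£25 per transaction",
]

# Hypothetical pattern: allow stray letters between "GBP" and "£",
# then capture the raw amount token and whatever follows "per"
fee_pat = re.compile(r"GBP\w*£(\S+) per (.+)")

def extract_fees(strings):
    rows = []
    for s in strings:
        m = fee_pat.search(s)
        if m:  # tier lines have no " per " after the amount and are skipped
            rows.append({"Amount": m.group(1), "Unit": m.group(2)})
    return rows

fees = extract_fees(lst)
for row in fees:
    print(row)
```

The resulting list of dicts could be passed to `pd.DataFrame` if a table is needed; cleaning up the OCR errors themselves is a separate step.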

Answered 2023-04-22
