Extracting named entities from input sentences in Python using regular expressions

t9aqgxwy  posted on 2023-02-15  in Python

Suppose I have a few commands:

"Open Youtube"
"Install PlayStore App"
"Go to Call of Duty app"

I also have a rules.list file that contains all the rules for extracting named entities from the commands above.
Suppose the contents of rules.list look like this:

app install (.*)    1
app install app (.*)    1
app install the (.*) app    1
app uninstall the app (.*)  1
app uninstall app (.*)  1
app uninstall the (.*) app  1
app go to (.*) app  1
app download (.*)   1
app download (.*) app   1
app download app (.*)   1
app download the app (.*)   1
app download the (.*) app   1
app install the app (.*)    1
app open the (.*) app   1
app open (.*)   1
app uninstall (.*)  1
app launch (.*) app 1
app launch (.*) 1

Is there a way to use this rules.list file in Python to extract the named entities from my sentences, so that I get Youtube, PlayStore and Call of Duty as my output?

waxmsbnn #1

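If you strip the leading "app" and the trailing "1" from each rule, what remains is a regular expression. The (.*) part is a capture group that returns the value you want.
The slightly tricky part is that the strings contain uppercase letters while the rules do not, so I convert each string to lowercase before using re.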

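import re

# Sentences to extract entities from (taken from the question)
strings = [
    "Open Youtube",
    "Install PlayStore App",
    "Go to Call of Duty app",
]

rules = [
    "app install (.*)    1",
    "app install app (.*)    1",
    "app install the (.*) app    1",
    "app uninstall the app (.*)  1",
    "app uninstall app (.*)  1",
    "app uninstall the (.*) app  1",
    "app go to (.*) app  1",
    "app install the app (.*)    1",
    "app open the (.*) app   1",
    "app open (.*)   1",
    "app launch (.*) 1",
    ]

for rule in rules:
    # Drop the leading "app " and the trailing "1"; what is left is the regex
    rule = rule[4:-1].strip()

    for string in strings:
        # The rules are lowercase, so lowercase the sentence before searching
        result = re.search(rule, string.lower())

        if result:
            print('-----------------------------')
            print(f'rule   - {rule}')
            print(f'string - {string}')
            print(f'result - {result.group(1)}')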

Output

-----------------------------
rule   - install (.*)
string - Install PlayStore App
result - playstore app
-----------------------------
rule   - go to (.*) app
string - Go to Call of Duty app
result - call of duty
-----------------------------
rule   - open (.*)
string - Open Youtube
result - youtube
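
Two things in that output differ from what the question asks for: the results are lowercased, and the first one is "playstore app" rather than "PlayStore". A possible refinement, offered only as a sketch, is to match case-insensitively with re.IGNORECASE (so the original casing survives), require the rule to cover the whole sentence with re.fullmatch, and keep the shortest capture among all matching rules. Note that the rules.list shown in the question has no "install (.*) app" rule, so the one below is a hypothetical addition; the helper name extract_entity is also made up for this example.

```python
import re

# Patterns with the leading "app" and the trailing "1" already removed.
patterns = [
    "install (.*)",
    "install (.*) app",   # hypothetical rule, NOT in the original rules.list
    "go to (.*) app",
    "open the (.*) app",
    "open (.*)",
]


def extract_entity(sentence, patterns):
    """Return the shortest capture among the patterns that match the whole sentence."""
    candidates = []
    for pattern in patterns:
        # fullmatch anchors the pattern at both ends of the sentence;
        # IGNORECASE lets lowercase rules match without lowercasing the input.
        match = re.fullmatch(pattern, sentence.strip(), flags=re.IGNORECASE)
        if match:
            candidates.append(match.group(1).strip())
    # Assumption: the shortest capture comes from the most specific rule.
    return min(candidates, key=len) if candidates else None


for sentence in ["Open Youtube", "Install PlayStore App", "Go to Call of Duty app"]:
    print(extract_entity(sentence, patterns))
```

With these five patterns the loop prints Youtube, PlayStore and Call of Duty. Picking the shortest capture is a heuristic, not something the rules file itself prescribes; sorting the rules from most to least specific and stopping at the first match would be another option.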

I think this should get you started.
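
The code above hardcodes the rules in a list, while the question asks about reading them from rules.list. A minimal sketch of that step follows, assuming every line has the shape "label pattern group-number" separated by whitespace, as in the sample file. The function names parse_rule and load_rules are invented for this example, and treating the trailing number as a capture-group index is a guess about what the "1" means.

```python
import re
from pathlib import Path


def parse_rule(line):
    """Split a line like 'app go to (.*) app  1' into (label, compiled regex, group)."""
    tokens = line.split()
    if len(tokens) < 3:
        return None  # blank or incomplete line
    label = tokens[0]
    # Assumption: the last token is the index of the capture group to return.
    group = int(tokens[-1])
    # Everything between the label and the group index is the regex itself.
    pattern = " ".join(tokens[1:-1])
    return label, re.compile(pattern, re.IGNORECASE), group


def load_rules(path):
    """Read every parseable rule from a rules file, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [rule for rule in map(parse_rule, lines) if rule is not None]
```

Calling load_rules("rules.list") would then give a list of tuples, and for each tuple regex.search(sentence) followed by match.group(group) returns the captured entity. Splitting on whitespace and re-joining with single spaces only works because none of the sample patterns depend on repeated spaces.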
