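I am using a recursive function to generate text via regex matching. It looks for word patterns built from groups of synonyms inside square brackets (pattern = '\[.*?\]'), where the synonyms in each group are separated by a string separator (I defined _STRING_SEPARATOR = #lkmkmksdmf###).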
The initial sentence argument to the function looks like this: [decreasing#lkmkmksdmf###shrinking#lkmkmksdmf###falling#lkmkmksdmf###contracting#lkmkmksdmf###faltering#lkmkmksdmf###the contraction in] exports of services will drive national economy to a 0.3% real GDP [decline#lkmkmksdmf###decrease#lkmkmksdmf###contraction] in 2023 from an estimated 5.0% [decline#lkmkmksdmf###decrease#lkmkmksdmf###contraction] in 2022
The function looks like this:
import re
from copy import deepcopy

# Method of a class that defines _STRING_SEPARATOR.
def all_combinations(self, sentence, sentence_list: list):
    pattern = r'\[.*?\]'
    if not re.findall(pattern, sentence, flags=re.IGNORECASE):
        # No bracket groups left: the sentence is complete.
        if sentence not in sentence_list:
            sentence_list.append(sentence)
    else:
        for single_match in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            # Strip the surrounding brackets to get the separator-joined synonyms.
            repl = single_match.group(0)[1:-1]
            start_span = single_match.span()[0]
            end_span = single_match.span()[1]
            for candidate_word in repl.split(self._STRING_SEPARATOR):
                # Substitute one synonym for the bracket group and recurse.
                tmp_sentence = (
                    sentence[0: start_span] +
                    candidate_word +
                    sentence[end_span:]
                )
                new_sentence = deepcopy(tmp_sentence)
                self.all_combinations(new_sentence, sentence_list)
As a result, the sentence_list variable keeps having sentences appended to it, like a DFS tree. Consecutive sentences in sentence_list look like this:
0: "decreasing exports in services will drive national economy to a 0.5% real GDP decline in 2023 from an estimated 5.0% decline in 2022"
1: "decreasing exports in services will drive national economy to a 0.5% real GDP decline in 2023 from an estimated 5.0% decrease in 2022"
2: "decreasing exports in services will drive national economy to a 0.5% real GDP decline in 2023 from an estimated 5.0% contraction in 2022"
3: "decreasing exports in services will drive national economy to a 0.5% real GDP decrease in 2023 from an estimated 5.0% decline in 2022"
4: "decreasing exports in services will drive national economy to a 0.5% real GDP decrease in 2023 from an estimated 5.0% decrease in 2022"
and so on...
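I want to avoid using the same word twice. For example, if I have already used the word "decline", it should not be used again when the inner for loop picks from the next group of words after the recursive call. Is there a way to "store" the word chosen from the first bracket group while the words in the second bracket group are being processed, and so on?
It is like a DFS tree in which each node has to store the state of its parent node. How can I modify the function so that the same word is never used twice within a single sentence of sentence_list?
I tried adding a parameter called avoid_words: list to all_combinations, which would store the list of words chosen by the parent nodes. But how do I remove a word from it when I have to move on to the next word in the first bracket group (that is, start from a different "root")?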
1 Answer
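As Tim pointed out, if there really is no other way to pass in the string and its parameters (which I doubt), you should use the split() function to separate the initial sentence into the words (synonyms) and the plain sentence text. Below is the commented code I would use if I had to solve a case like this.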
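A minimal sketch of that split-first approach, not necessarily the answerer's exact code. It assumes the separator from the question and uses re.split with a capturing group to separate literal text from synonym groups. The names SEP and walk are hypothetical, and all_combinations is written here as a standalone function rather than the asker's method. The set of already-used words is passed down as a new frozenset on each call, so sibling branches never see each other's choices and nothing has to be removed when moving to the next "root".

```python
import re

SEP = "#lkmkmksdmf###"  # separator taken from the question


def all_combinations(sentence, sep=SEP):
    """Return every expansion of `sentence` in which no synonym is used twice."""
    # With one capturing group, re.split alternates:
    # literal text, group content, literal text, ..., literal text.
    parts = re.split(r"\[(.*?)\]", sentence)
    literals = parts[0::2]                        # len(groups) + 1 items
    groups = [p.split(sep) for p in parts[1::2]]  # one synonym list per bracket

    def walk(i, used, prefix):
        if i == len(groups):
            # All groups resolved: append the trailing literal text.
            yield prefix + literals[i]
            return
        for word in groups[i]:
            if word in used:
                continue  # already chosen higher up in this branch
            # `used | {word}` creates a new frozenset for the child call only.
            yield from walk(i + 1, used | {word}, prefix + literals[i] + word)

    return list(walk(0, frozenset(), ""))


if __name__ == "__main__":
    demo = f"[a{SEP}b] x [a{SEP}b{SEP}c] y"
    for s in all_combinations(demo):
        print(s)
```

Because each recursion level corresponds to exactly one bracket group, every sentence is generated once, so the `if sentence not in sentence_list` check from the original function is no longer needed.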