regex: What is the regular expression to match every third occurrence of a period followed by a space?

edqdpe6u  posted on 2023-08-08  in Other

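I'm trying to break huge blocks of text into something readable. I want to do that by inserting a newline after every third instance of a period. (Or, more precisely, I want to replace every third occurrence of ". " with ". \n".)
I spent about an hour running queries on ChatGPT and got a dozen wrong answers. It's the "third occurrence" part that stumps me. I know how to ask for any occurrence, but not for the nth occurrence.
I usually run my regular expressions in Sublime Text.
Patterns that failed: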

(?:(?:[^.\n]*[.]){2}[^.\n]*[.][ \n])

(?:[^\.\n]*\.[^\.\n]*){2} [^\.\n]*

(?<=\..*\..*\..*)\.

And so on.
Here is a sample input text, in which every third instance of a period followed by a space should match:

A person under pressure will do things that he or she might not do under normal circumstances. If a person is threatened with losing his home or his family, he may turn to fraud as a means to relieve that financial pressure. Often these individuals have been with the organization for many years and occupy positions of extreme trust. These individuals can be called accidental fraudsters. They are seemingly law abiding, honest people, but when faced with extreme financial pressure, they turn to fraud. This segment will begin by defining some of the basic elements of fraud. We will also discuss the cost of fraud and the importance of understanding how it occurs. We will examine some of the leading theories on why people commit fraud and how that information can be used to help us prevent it.


This sample text should not match (all but one of the periods have been removed):

The difference is that criminal cases must meet a higher burden of proof For example, an employee steals $100,000 from his employer by setting up a phony company and submitting false invoices for services that are not performed That conduct is criminal because he's stealing funds through deception, but the company has also been injured the employee's actions and can sue in civil court to get its money back One of the largest causes of fraud involves asset misappropriations Asset misappropriation is simply the theft or misuse of an organization's assets. Common examples include skimming revenues, stealing inventory, obtaining fraudulent payments, and payroll fraud Corruption entails the wrongful or unlawful misuse of influence in a business transaction to procure a personal benefit contrary to an individual's duty to their employer or the rights of another Common examples include accepting kickbacks, demanding extortion or engaging in conflicts of interest. Financial statement fraud involves the intentional misrepresentation of financial or nonfinancial information to mislead others who are relying on it to make economic decisions

u5i3ibmn  #1

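Sublime Text uses the Perl Compatible Regular Expressions (PCRE) engine from the Boost library. You can therefore use the following regular expression to replace every third occurrence of ". " with ". \n":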

(?:(?:(?!\. ).)*\. ){2}(?:(?!\. ).)*\K\.\s

with the g ("global, do not return after first match") and s ("dot matches newline") flags set.
Demo
The regular expression can be broken down as follows.

(?:            # begin a non-capture group
  (?:          # begin a non-capture group
    (?!\.[ ])  # negative lookahead asserts that the following two
               # characters are not ". "    
    .          # match any character
  )*           # end inner non-capture group and execute it zero or more times
  \.[ ]        # match ". "
){2}           # end the outer non-capture group and execute it twice
(?:            # begin a non-capture group
  (?!\.[ ])    # negative lookahead asserts that the following two
               # characters are not ". "    
  .            # match any character
)*             # end non-capture group and execute it zero or more times
\K             # reset string pointer to current location and discard all
               # previously-matched characters
\.[ ]          # match third instance of ". "


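Note that above I replaced each space with a character class containing a space ([ ]), merely to make the spaces visible.
You may also find it helpful to hover the cursor over each part of the regular expression at the demo link to get an explanation of its function.
The expression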

(?:(?!\. ).)


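matches any single character, provided that character is not a period that is followed by a space (as required by the negative lookahead (?!\. )).
This construct is sometimes called the tempered greedy token solution.
Alternatively, replace the (zero-width) matches of the following regular expression with a newline.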

(?:(?:(?!\. ).)*\. ){2}(?:(?!\. ).)*\. \K


Demo
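
For readers who want to try the same idea outside Sublime Text, here is a minimal sketch in Python. It is not part of the original answer. Python's built-in re module does not support \K, so this version assumes a capture group around everything up to and including the third ". " and writes that group back followed by a newline. The names EVERY_THIRD and break_every_third are made up for illustration.

```python
import re

# Python's re has no \K, so capture the text up to and including the
# third ". " in group 1 and re-emit it, followed by a newline.
# re.DOTALL plays the role of the "s" flag (dot matches newline).
EVERY_THIRD = re.compile(
    r"((?:(?:(?!\. ).)*\. ){2}(?:(?!\. ).)*\. )",
    re.DOTALL,
)

def break_every_third(text: str) -> str:
    """Insert a newline after every third occurrence of '. '."""
    return EVERY_THIRD.sub(r"\g<1>\n", text)

if __name__ == "__main__":
    sample = "One. Two. Three. Four. Five. Six. Seven."
    print(repr(break_every_third(sample)))
    # 'One. Two. Three. \nFour. Five. Six. \nSeven.'
```

re.sub replaces all non-overlapping matches by default, so no separate "global" flag is needed.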

sxissh06  #2

You are missing the space after the .

Find: (?:.*?\.\s){3}
Replace with: $&\n

In the replacement text, $& stands for everything the regexp matched.
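
As a rough illustration only, the same find-and-replace could be written in Python as below. This is a sketch rather than the answerer's code: Python spells the whole-match reference as \g<0> instead of $&, and the function name break_every_third_lazy is hypothetical. Note that \s also matches tabs and newlines, so a period at the end of a line counts as an occurrence here, and without a dot-matches-newline flag the lazy .*? will not cross line breaks.

```python
import re

def break_every_third_lazy(text: str) -> str:
    """Append a newline after each run of three 'period + whitespace' pairs."""
    # (?:.*?\.\s){3} lazily consumes up to three periods that are each
    # followed by whitespace; \g<0> re-inserts the whole match ($& in Sublime Text).
    return re.sub(r"(?:.*?\.\s){3}", r"\g<0>\n", text)

if __name__ == "__main__":
    sample = "One. Two. Three. Four. Five. Six. Seven."
    print(repr(break_every_third_lazy(sample)))
    # 'One. Two. Three. \nFour. Five. Six. \nSeven.'
```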
