regex: How do I extract the text after the sub-section: [This text]

Posted by xjreopfe on 2022-12-27 in Other

I.  Text \n    A. Sub-section 1: This text\n    B. Sub-section 2: This text\n    C. Sub-section 3: This text\n\n II. text\n    A. Sub-section 1: This text \n    B. Sub-section 2: This text\n III. text \n     A.Sub-section 1: This text\n

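I want to extract the text after each sub-section label and add it to an array.
The input is one long string.
Any solution is fine, whether it uses regex or string operations.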

regex

Source: https://stackoverflow.com/questions/74925004/how-do-i-extract-text-after-the-sub-section-this-text

3 Answers

aydmsdu91#

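If you have all the text in a single variable, you can first get every match with matchAll(), then use map() to narrow each match down to its capture group, and finally use trim() to strip unnecessary whitespace.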

const text = ` I.Text 
    A. Sub-section 1: This text
    B. Sub-section 2: This text
    C. Sub-section 3: This text

II. text
    A. Sub-section 1: This text 
    B. Sub-section 2: This text
III. text 
    A.Sub-section 1: This text `

const regexp = /Sub-section \d+: (.*)/g;

const array = [...text.matchAll(regexp)];
const filtered_array = array.map(el => el[1].trim())
console.log(filtered_array)

If your text comes in a different form, let me know in the comments and I can adjust the code accordingly.

2022-12-27

xam8gpfp2#

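Given a multi-line string such as text, we can get each sub-section line by searching for Sub-section \d+:

"\d+" = one or more digits

This gives us an array with one entry per sub-section.
The next step is to remove the Sub-section \d+: label from each entry, leaving only the text.

Output: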

// multiline text
const text = `I.Text 
    A. Sub-section 1: This text
    B. Sub-section 2: This text
    C. Sub-section 3: This text

II. text
    A. Sub-section 1: This text 
    B. Sub-section 2: This text
III. text 
     A.Sub-section 1: This text`;

// get Sub-section parts until end of line
// (match() returns null when nothing matches, so fall back to an empty array)
const sub_sections = text.match(/Sub-section\s?\d+:[^\n]*/gi) || [];

// keep only the text after the "Sub-section N:" label
const output = sub_sections.map(sub =>
    sub.replace(/Sub-section\s?\d+:/i, "").trim()
);

console.log(output);
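
Since the question also allows plain string operations, the same two-step idea (find the sub-section line, then drop the label) can be sketched without any regular expression. This is only an illustration, not part of the original answer: the helper name extractSubSectionText and the sample string are made up, and it assumes every sub-section line contains the literal label "Sub-section" followed somewhere by a colon.

```javascript
// Hypothetical helper (not from the answers): extract sub-section text
// using only split/indexOf/slice, with no regular expressions.
function extractSubSectionText(text) {
  const marker = "Sub-section"; // assumed literal label on every sub-section line
  const result = [];
  for (const line of text.split("\n")) {
    const start = line.indexOf(marker);
    if (start === -1) continue; // section headers and blank lines
    const colon = line.indexOf(":", start);
    if (colon === -1) continue; // label without a colon: skip the line
    result.push(line.slice(colon + 1).trim());
  }
  return result;
}

// Made-up sample in the same shape as the question's input.
const sample =
  "I.  Text \n    A. Sub-section 1: This text\n    B. Sub-section 2: Other text\n" +
  " II. text\n    A.Sub-section 1: More text \n";

console.log(extractSubSectionText(sample));
// [ 'This text', 'Other text', 'More text' ]
```

Unlike the regex version, this variant does not check that a number follows the label, so it is a little more permissive about what counts as a sub-section line.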

2022-12-27

dkqlctbz3#

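Your question is not well defined. Here is a solution based on the following assumptions:

You have lines with sections, and each section has a title and sub-sections
Sub-section lines have leading whitespace
You want to extract the text after the : colon, and the text before and after the colon may vary
The result should be an array of all sub-section texts that follow the colon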

const input = ` I.Text 
    A. Sub-section 1: This text I.A
    B. Sub-section 2: This text I.B
    C. Sub-section 3: This text I.C

II. text
    A. Sub-section 1: This text II.A
    B. Sub-section 2: This text II.B
III. text 
    A.Sub-section 1: This text III.A
`;
const regex = /^ .*?: *(.+)/gm;
const result = [...input.matchAll(regex)].map(m => m[1]);
console.log(result);

Output:

[
  "This text I.A",
  "This text I.B",
  "This text I.C",
  "This text II.A",
  "This text II.B",
  "This text III.A"
]

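Explanation of the regex:

^ - start of line
" " - expect a literal space (sub-section lines are indented)
.*?: - non-greedy scan up to the first colon
" *" - optional spaces
(.+) - capture group 1: everything up to the end of the line, at least one character
gm - flags: g matches all occurrences, and m makes ^ and $ match at the start/end of each line instead of only at the start/end of the whole string
If the assumptions are incorrect, the regex can be adjusted as needed.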
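
To see why the m flag matters for this pattern, here is a small sketch that runs the same regex with and without it. The demo string and variable names are made up for illustration, and the input is assumed to begin with an indented line, as the answer's input does.

```javascript
// Illustrative input (made up for this sketch): two indented sub-section lines.
const demo = " A. Sub-section 1: first\n B. Sub-section 2: second\n";

// With g and m, ^ matches at the start of every line.
const withM = [...demo.matchAll(/^ .*?: *(.+)/gm)].map(m => m[1]);

// With g only, ^ matches solely at the very start of the string.
const withoutM = [...demo.matchAll(/^ .*?: *(.+)/g)].map(m => m[1]);

console.log(withM);    // [ 'first', 'second' ]
console.log(withoutM); // [ 'first' ]
```

Without m, only a sub-section on the first line of the string can match, so the flag is essential whenever the input spans several lines.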

2022-12-27
