Regex: extract special strings from text

41zrol4v posted on 2022-11-18 in Other


My problem is as follows:

data_example <-
  c("Creditshelf Aktiengesellschaft / Key word(s): Forecast/Development of Sales\n\ncreditshelf Aktiengesellschaft",
    "Swiss Life Holding AG / Key word(s): 9 Month figures\n\nSwiss Life increases fee income by 13%",
    "tonies SE / Key word(s): Capital Increase\n\ntonies SE: tonies successfully places 12,000,000 new class A shares",
    "init innovation in traffic systems SE / Key word(s): Contract/Incoming Orders\n\ninit innovation in traffic systems SEs")
strings_to_extract <-
  c("Key word(s): Word1/Word2",
    "Key word(s): Word1/Word2 Word3",
    "Key word(s): Word1 Word2 Word3",
    "Key word(s): Word1/Word2/Word3",
    "Key word(s): Number Word1/Word2",
    "Key word(s): Number Word1 Word2",
    "Key word(s): Word1 Number Word2")

The words are always separated by either a space or a "/". My attempt looks like this:

str_extract(data, "Key word[[:punct:]]{1}s[[:punct:]]{2} [[:alpha:]]{1,}|Key word[[:punct:]]{1}s[[:punct:]]{2} [[:alpha:]]{1,}[[:punct:]]{1,}[[:alpha:]]{1,}Key word[[:punct:]]{1}s[[:punct:]]{2} [[:alpha:]]{1,}[[:punct:]]{1,}[[:alpha:]]{1,}[[:punct:]]{1,}[[:alpha:]]{1,}")

It catches a good share of the cases, but I think it is far too complicated. Can someone suggest a better way to do this?
Thank you and kind regards

regex

Source: https://stackoverflow.com/questions/74376039/regrex-extract-special-strings-from-text

3 Answers


mkh04yzy #1

You can use

str_extract(data_example, "Key word\\(s\\):\\s*\\w+(?:\\W+\\w+){1,2}")

See the regex demo.

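Details:
Key word\(s\): - the literal text Key word(s):
\s* - zero or more whitespace characters
\w+ - one or more word characters
(?:\W+\w+){1,2} - one or two occurrences of one or more non-word characters followed by one or more word characters.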
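
To see what the breakdown above means in practice, here is a minimal sketch in base R (regexpr/regmatches with perl = TRUE, used here only to avoid a package dependency; this is an assumption, not the answer's own code). The helper extract_first and the stricter pattern strict_sep are hypothetical additions. Because \W+ also matches newlines, the answer's pattern can run past the blank line when there are fewer than three key words; restricting the separator to a space or "/", as the question describes, avoids that.

```r
# Sample strings from the question (text after the blank line shortened).
data_example <- c(
  "Creditshelf Aktiengesellschaft / Key word(s): Forecast/Development of Sales\n\ncreditshelf Aktiengesellschaft",
  "Swiss Life Holding AG / Key word(s): 9 Month figures\n\nSwiss Life increases fee income by 13%",
  "tonies SE / Key word(s): Capital Increase\n\ntonies SE: tonies successfully places",
  "init innovation in traffic systems SE / Key word(s): Contract/Incoming Orders\n\ninit innovation"
)

# Hypothetical helper: first match per string, NA where nothing matches.
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  out[m > 0] <- regmatches(x, m)
  out
}

# Pattern from the answer: any run of non-word characters may separate words.
any_sep <- "Key word\\(s\\):\\s*\\w+(?:\\W+\\w+){1,2}"
# Assumed stricter variant: only a single space or "/" may separate words.
strict_sep <- "Key word\\(s\\):\\s*\\w+(?:[ /]\\w+){0,2}"

res_any <- extract_first(data_example, any_sep)
res_strict <- extract_first(data_example, strict_sep)

res_any[3]     # "Key word(s): Capital Increase\n\ntonies"
res_strict[3]  # "Key word(s): Capital Increase"
res_strict[1]  # "Key word(s): Forecast/Development of"
```

Both patterns stop after three words, so the first element loses "Sales"; that matches the three-word limit in strings_to_extract, but it is worth checking against the real data.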

2022-11-18

j9per5c4 #2

Your example data also lends itself to a different approach, because your key words are always followed by \n.
In that case you can do the following:

data_example <-
c("Creditshelf Aktiengesellschaft / Key word(s): Forecast/Development of Sales\n\ncreditshelf Aktiengesellschaft",
  "Swiss Life Holding AG / Key word(s): 9 Month figures\n\nSwiss Life increases fee income by 13%",
  "tonies SE / Key word(s): Capital Increase\n\ntonies SE: tonies successfully places 12,000,000 new class A shares",
  "init innovation in traffic systems SE / Key word(s): Contract/Incoming Orders\n\ninit innovation in traffic systems SEs")

stringr::str_extract(data_example, "Key word\\(s\\):.+(?=\\n)")
#> [1] "Key word(s): Forecast/Development of Sales"
#> [2] "Key word(s): 9 Month figures"              
#> [3] "Key word(s): Capital Increase"             
#> [4] "Key word(s): Contract/Incoming Orders"

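Key word\\(s\\): matches the literal text Key word(s):, and .+(?=\\n) matches all characters (.+) up to the point where a \n follows. Note the double escaping (\\) that R requires.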
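
As a small illustration of the lookahead, the sketch below compares it with a negated character class, [^\\n]+, which is an assumed alternative and not part of the answer. It uses base R with perl = TRUE, and the string no_newline is a made-up example. The two patterns agree whenever a newline follows the key words; they differ only when the string ends without one, because the lookahead then has nothing to match.

```r
# Two strings from the question (shortened) and one hypothetical string
# that has no newline after the key words.
x <- c(
  "tonies SE / Key word(s): Capital Increase\n\ntonies SE: tonies successfully places",
  "Swiss Life Holding AG / Key word(s): 9 Month figures\n\nSwiss Life increases fee income by 13%"
)
no_newline <- "Example AG / Key word(s): Takeover"  # hypothetical input

lookahead <- "Key word\\(s\\):.+(?=\\n)"  # pattern from the answer
negated   <- "Key word\\(s\\):[^\\n]+"    # assumed alternative

m_look <- regmatches(x, regexpr(lookahead, x, perl = TRUE))
m_neg  <- regmatches(x, regexpr(negated, x, perl = TRUE))

identical(m_look, m_neg)  # TRUE: same result when a newline follows
m_look[1]                 # "Key word(s): Capital Increase"

# Without a trailing newline the lookahead cannot succeed (regexpr returns -1),
# while the negated class still matches up to the end of the string.
look_pos <- as.integer(regexpr(lookahead, no_newline, perl = TRUE))
neg_hit  <- regmatches(no_newline, regexpr(negated, no_newline, perl = TRUE))
```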

2022-11-18

qnzebej0 #3

If you do not want to include the phrase "Key word(s): " itself, you can do the following:

data_example <-
  c("Creditshelf Aktiengesellschaft / Key word(s): Forecast/Development of Sales\n\ncreditshelf Aktiengesellschaft",
    "Swiss Life Holding AG / Key word(s): 9 Month figures\n\nSwiss Life increases fee income by 13%",
    "tonies SE / Key word(s): Capital Increase\n\ntonies SE: tonies successfully places 12,000,000 new class A shares",
    "init innovation in traffic systems SE / Key word(s): Contract/Incoming Orders\n\ninit innovation in traffic systems SEs")

stringr::str_extract(string = data_example,
                     pattern = '(?<=Key word\\(s\\): )[\\s\\S]+')

#> [1] "Forecast/Development of Sales\n\ncreditshelf Aktiengesellschaft"                        
#> [2] "9 Month figures\n\nSwiss Life increases fee income by 13%"                              
#> [3] "Capital Increase\n\ntonies SE: tonies successfully places 12,000,000 new class A shares"
#> [4] "Contract/Incoming Orders\n\ninit innovation in traffic systems SEs"
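
The output above keeps everything after the phrase, including the text behind the blank line, since [\\s\\S]+ also matches newlines. If only the key words are wanted, one possible variation (an assumption, not the answer's code) keeps the lookbehind but stops at the first newline with [^\\n]+. The sketch uses base R with perl = TRUE; the final split into individual words is an extra illustrative step.

```r
# Sample strings from the question (text after the blank line shortened).
data_example <- c(
  "Creditshelf Aktiengesellschaft / Key word(s): Forecast/Development of Sales\n\ncreditshelf Aktiengesellschaft",
  "Swiss Life Holding AG / Key word(s): 9 Month figures\n\nSwiss Life increases fee income by 13%",
  "tonies SE / Key word(s): Capital Increase\n\ntonies SE: tonies successfully places",
  "init innovation in traffic systems SE / Key word(s): Contract/Incoming Orders\n\ninit innovation"
)

# Lookbehind skips the phrase; [^\n]+ stops at the first newline.
pattern <- "(?<=Key word\\(s\\): )[^\\n]+"
keywords <- regmatches(data_example, regexpr(pattern, data_example, perl = TRUE))
keywords
# "Forecast/Development of Sales" "9 Month figures"
# "Capital Increase"              "Contract/Incoming Orders"

# Optional: split each result into single words on a space or "/".
words <- strsplit(keywords, "[ /]")
words[[1]]  # "Forecast" "Development" "of" "Sales"
```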

2022-11-18
