Regex: extract the value between a specific string and a colon in R

Posted by lkaoscv7 on 2023-06-25 in Other


I have an example table like this:

No, Memo
  1, Date: 2020/10/22 City: UA Note: True mastery of any skill takes a lifetime.
  2, Date: 2022/11/01 City: CH Note: Sweat is the lubricant of success.
  3, Date: 2022y11m1d City: UA Note: Every noble work is at first impossible.
  4, Date: 2022y2m15d City: AA Note: Live beautifully, dream passionately, love completely.

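I want to extract the strings that follow Date:, City:, and Note:. For example, in No. 1 I need to extract "2020/10/22" between Date: and City:, "UA" between City: and Note:, and "True mastery of any skill takes a lifetime." after Note:.
The desired output looks like this: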

No Date       City Note
  1 2020/10/22 UA   True mastery of any skill takes a lifetime.
  2 2022/11/01 CH   Sweat is the lubricant of success.
  3 2022y11m1d UA   Every noble work is at first impossible.
  4 2022y2m15d AA   Live beautifully, dream passionately, love completely.

Does anyone know how to do this? Any help would be appreciated. Thanks.

regex

Source: https://stackoverflow.com/questions/76504262/extract-value-between-specific-string-and-colon-in-r

1 Answer


raogr8fs1#

My solution uses regular expressions with stringr and dplyr:

library(stringr)
library(dplyr)

df <- read.table(
  text = "No; Memo
  1; Date: 2020/10/22 City: UA Note: True mastery of any skill takes a lifetime.
  2; Date: 2022/11/01 City: CH Note: Sweat is the lubricant of success.
  3; Date: 2022y11m1d City: UA Note: Every noble work is at first impossible.
  4; Date: 2022y2m15d City: AA Note: Live beautifully, dream passionately, love completely.",
  sep = ";",
  header = TRUE
)

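# (?<=...) is a lookbehind that marks where each value starts;
# (?=...) is a lookahead that marks where it ends.
df_test <- df %>% mutate(date = str_extract(Memo, "(?<=Date: )(.*)(?= City)"),
                         city = str_extract(Memo, "(?<=City: )(.*)(?= Note)"),
                         note = str_extract(Memo, "(?<=Note: ).*")) %>%
  select(-Memo)  # drop the raw column once the fields are extracted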

> df_test
  No       date city                                                   note
1  1 2020/10/22   UA            True mastery of any skill takes a lifetime.
2  2 2022/11/01   CH                     Sweat is the lubricant of success.
3  3 2022y11m1d   UA               Every noble work is at first impossible.
4  4 2022y2m15d   AA Live beautifully, dream passionately, love completely.

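Each regular expression matches everything between the delimiters given in the positive lookbehind and the positive lookahead. The delimiters themselves are not part of the match.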
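
To make the lookaround mechanics concrete, here is a minimal base-R sketch of the same idea on a single string. The helper name extract_between() is hypothetical, introduced only for illustration, and it assumes the start and end labels are fixed-length and contain no regex metacharacters. It uses a lazy quantifier (.*?), which is a choice of this sketch, so the match stops at the first occurrence of the end label.

```r
# extract_between() is a hypothetical helper, not part of any package.
# Assumption: `start` and `end` are plain, fixed-length labels with no
# regex metacharacters (PCRE lookbehind requires a fixed-length pattern).
extract_between <- function(x, start, end = NULL) {
  pattern <- if (is.null(end)) {
    # No end label: take everything after `start` to the end of the string.
    paste0("(?<=", start, ").*")
  } else {
    # Lazy .*? stops at the first place where `end` follows.
    paste0("(?<=", start, ").*?(?=", end, ")")
  }
  # perl = TRUE enables lookbehind/lookahead in base R.
  m <- regexpr(pattern, x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  ok <- !is.na(m) & m != -1
  # regmatches() returns only the matched elements, in order.
  out[ok] <- regmatches(x, m)
  out
}

memo <- "Date: 2020/10/22 City: UA Note: True mastery of any skill takes a lifetime."

date <- extract_between(memo, "Date: ", " City")
city <- extract_between(memo, "City: ", " Note")
note <- extract_between(memo, "Note: ")
```

Because the lookbehind and lookahead only assert what surrounds the match and consume no characters, the labels never appear in the result. Strings without a match yield NA, which mirrors how str_extract() behaves.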

Answered 2023-06-25
