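I read a continuous text file into R as a data frame (df) using readr::read_tsv("file.txt", show_col_types = F). The resulting df has 1 column and 80,398 rows. I want to filter and keep the rows that contain "Run = \\d{1,3}", but only if, after 8 other rows, they are followed by a row matching ".*Final Intermolecular Energy =". Can anyone shed some light on this?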
Answer from k97glaaz1:
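One option is to use filter() with two conditions, one for "Run = \d{1,3}" and one for "Final Intermolecular...", using the lead() function from the dplyr package to make sure that "Final Intermolecular..." appears 9 rows after the "Run = \d{1,3}" row (that is, with 8 other rows in between), e.g.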
library(tidyverse)

df <- data.frame(x = c(0, "Run = 123", 1, 2, 3, 4, 5, 6, 7, 8,
                       "xxyyzz Final Intermolecular Energy = 123", 0, 0,
                       "Run = 345", 1, 2, 3, 4, 5, 6, 7, 8,
                       "not final energy", 0))
df
#> x
#> 1 0
#> 2 Run = 123
#> 3 1
#> 4 2
#> 5 3
#> 6 4
#> 7 5
#> 8 6
#> 9 7
#> 10 8
#> 11 xxyyzz Final Intermolecular Energy = 123
#> 12 0
#> 13 0
#> 14 Run = 345
#> 15 1
#> 16 2
#> 17 3
#> 18 4
#> 19 5
#> 20 6
#> 21 7
#> 22 8
#> 23 not final energy
#> 24 0

df %>%
  filter(str_detect(x, "Run = \\d{1,3}") &
           str_detect(lead(x, n = 9), ".*Final Intermolecular Energy ="))
#> x
#> 1 Run = 123

# doesn't detect "Run = 345" as it doesn't match the second criteria
Created on 2023-05-31 with reprex v2.0.2
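For comparison, the same "match a row only if the row 9 positions later also matches" check can be sketched in base R, without dplyr. This is only an illustration and not part of the original answer: keep_if_followed() is a hypothetical helper name, and the vector x simply recreates the toy data used above.

```r
# Hypothetical helper (an assumption for illustration, not an existing API):
# return the positions in `x` that match `first` and whose element
# `offset` positions later matches `second`.
keep_if_followed <- function(x, first, second, offset) {
  idx <- which(grepl(first, x))          # positions of candidate rows
  idx <- idx[idx + offset <= length(x)]  # drop candidates too close to the end
  idx[grepl(second, x[idx + offset])]    # keep those whose later row matches
}

# Same toy data as in the example above, as a character vector
x <- c("0", "Run = 123", as.character(1:8),
       "xxyyzz Final Intermolecular Energy = 123", "0", "0",
       "Run = 345", as.character(1:8), "not final energy", "0")

hits <- keep_if_followed(x, "Run = \\d{1,3}",
                         "Final Intermolecular Energy =", offset = 9)
x[hits]
```

Making the offset an argument may be convenient if the distance between the two rows differs from file to file; with a one-column data frame, something like df[[1]] could be passed as x.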