How to pair str_detect with lag to look across multiple rows at once

5kgi1eie  posted on 2023-06-19 in Other

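I already have functions that look one row back or one row ahead. My problem is that I need to find the nearest row (before or after) where str_detect is TRUE when that row is not the immediately preceding or following one (for example, the TRUE occurs three rows away rather than one). I am using the following packages: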

library(tidytext)
library(dplyr)
library(stringr)

This is the function I use to look one row behind:

# function to look behind one row
r_sen_from_states_alabama <- function(x){
  ifelse(str_detect(x, regex("the senator from Alabama",ignore_case = TRUE))==TRUE & str_detect(lag(x),"^Mr\\. SHELBY \\(R; Alabama\\):")==TRUE,
         str_replace(x,regex("the senator from Alabama", ignore_case = TRUE),"senator Shelby"),
         ifelse(str_detect(x, regex("the senator from Alabama", ignore_case = TRUE))==TRUE & str_detect(lag(x),"^Mr\\. SESSIONS \\(R; Alabama\\):")==TRUE,
                str_replace(x, regex("the senator from Alabama",ignore_case = TRUE), "senator Sessions"),x))
}

This is the function I use to look one row ahead:

#function to look ahead one
r_sen_from_states_alabama <- function(x){
  ifelse(str_detect(x, regex("the senator from Alabama",ignore_case = TRUE))==TRUE & str_detect(lead(x),"^Mr\\. SHELBY \\(R; Alabama\\):")==TRUE,
         str_replace(x,regex("the senator from Alabama", ignore_case = TRUE),"senator Shelby"),
         ifelse(str_detect(x, regex("the senator from Alabama", ignore_case = TRUE))==TRUE & str_detect(lead(x),"^Mr\\. SESSIONS \\(R; Alabama\\):")==TRUE,
                str_replace(x, regex("the senator from Alabama",ignore_case = TRUE), "senator Sessions"),x))
}
# pass the character column, not the whole data frame (test_df is defined below)
test_df_try_1 <- data.frame(speeches = r_sen_from_states_alabama(test_df$speeches))
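
The two functions above differ only in the shift function (lag or lead) and in the fixed distance of one row. As a minimal sketch, assuming a hypothetical helper `replace_by_neighbor()` that is not part of the original code, both can be passed in as arguments, since `dplyr::lag()` and `dplyr::lead()` each accept `n`:

```r
library(dplyr)
library(stringr)

# Hypothetical helper (not from the original post): the same replacement logic,
# with the shift function and the row distance passed in as arguments.
replace_by_neighbor <- function(x, shift = dplyr::lag, n = 1) {
  phrase   <- regex("the senator from Alabama", ignore_case = TRUE)
  neighbor <- shift(x, n = n)  # row n positions before (lag) or after (lead)
  has_ref  <- str_detect(x, phrase)
  # coalesce() turns the NA produced at the vector edges into FALSE
  shelby   <- coalesce(str_detect(neighbor, "^Mr\\. SHELBY \\(R; Alabama\\):"), FALSE)
  sessions <- coalesce(str_detect(neighbor, "^Mr\\. SESSIONS \\(R; Alabama\\):"), FALSE)
  case_when(
    has_ref & shelby   ~ str_replace(x, phrase, "senator Shelby"),
    has_ref & sessions ~ str_replace(x, phrase, "senator Sessions"),
    TRUE               ~ x
  )
}

# Small illustrative vector, shorter than the sample data in the post
demo <- c("Mr. SHELBY (R; Alabama): First.",
          "Ms. PEABODY (I; Atlantis): Filler.",
          "Mr. Quail (Q; Nowhere): I ask the senator from Alabama.")

one_back <- replace_by_neighbor(demo, dplyr::lag, n = 1)  # row 3 unchanged: row 2 is no match
two_back <- replace_by_neighbor(demo, dplyr::lag, n = 2)  # row 3 now sees row 1
```

A fixed `n` still has to be known in advance, so this only covers the case where the distance to the matching row is the same for every row.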

Below is some sample data for reproduction and debugging.

#testing env data
test_col <- c("Mr. SHELBY (R; Alabama): I acknowledge this is a test.",
              "Mrs. MURRAY (D; Washington): I say to my friend, the senator from Alabama, that they are wrong.",
              "Mr. SHELBY (R; Alabama): I do not agree with my colleague.",
              "Mr. FRIST (R; Tennessee): The senator from Alabama is correct, senator Murray.",
              "Mr. SHELBY (R; Alabama): I thank the majority leader for their support.",
              "Mr. SESSIONS (R; Alabama): I am proud of my junior, the senator from Alabama.",
              "Mr. SHELBY (R; Alabama): To my senior peer, the senator from Alabama, I say great things.",
              "Ms. PEABODY (I; Atlantis): This is just to fill space.",
              "Mr. Quail (Q; Nowhere): Match this one, the senator from Alabama, with the senator at the bottom.",
              "Ms. PEABODY (I; Atlantis): This is just to fill space.",
              "Ms. PEABODY (I; Atlantis): This is just to fill space!",
              "Ms. PEABODY (I; Atlantis): This is just to fill space?",
              "Mr. SESSIONS (R; Alabama): Match this one with Quail.")

test_df <- data.frame(test_col)
colnames(test_df) <- c("speeches")

Edited to include the expected output:

#Desired hypothetical results when using lag [note: 9 went lag(n = 2)]
1   Mr. SHELBY (R; Alabama): I acknowledge this is a test.
2   Mrs. MURRAY (D; Washington): I say to my friend, senator Shelby, that they are wrong.
3   Mr. SHELBY (R; Alabama): I do not agree with my colleague.
4   Mr. FRIST (R; Tennessee): senator Shelby is correct, senator Murray.
5   Mr. SHELBY (R; Alabama): I thank the majority leader for their support.
6   Mr. SESSIONS (R; Alabama): I am proud of my junior, senator Shelby.
7   Mr. SHELBY (R; Alabama): To my senior peer, senator Sessions, I say great things.
8   Ms. PEABODY (I; Atlantis): This is just to fill space.
9   Mr. Quail (Q; Nowhere): Match this one, senator Shelby, with the senator before.
10  Ms. PEABODY (I; Atlantis): This is just to fill space.
11  Ms. PEABODY (I; Atlantis): This is just to fill space!
12  Ms. PEABODY (I; Atlantis): This is just to fill space?
13  Mr. SESSIONS (R; Alabama): Match this one with Quail.

#Desired hypothetical results when using lead [note: 6 went lead(n=7), 9 went lead(n=4)]
1  Mr. SHELBY (R; Alabama): I acknowledge this is a test.
2  Mrs. MURRAY (D; Washington): I say to my friend, senator Shelby, that they are wrong.
3  Mr. SHELBY (R; Alabama): I do not agree with my colleague.
4  Mr. FRIST (R; Tennessee): senator Shelby is correct, senator Murray.
5  Mr. SHELBY (R; Alabama): I thank the majority leader for their support.
6  Mr. SESSIONS (R; Alabama): I am proud of my junior, senator Shelby.
7  Mr. SHELBY (R; Alabama): To my senior peer, senator Sessions, I say great things.
8  Ms. PEABODY (I; Atlantis): This is just to fill space.
9  Mr. Quail (Q; Nowhere): Match this one, senator Sessions, with the senator at the bottom.
10 Ms. PEABODY (I; Atlantis): This is just to fill space.
11 Ms. PEABODY (I; Atlantis): This is just to fill space!
12 Ms. PEABODY (I; Atlantis): This is just to fill space?
13 Mr. SESSIONS (R; Alabama): Match this one with Quail.
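
The desired results above amount to searching outward from each row until the nearest row spoken by an Alabama senator is found. The following is a minimal sketch of that idea, not a definitive solution. The helpers `speaker_of()`, `nearest_other()` and `replace_nearest()` are hypothetical names, and skipping rows spoken by the current speaker is an assumption made so that a senator is never resolved to themselves.

```r
library(stringr)

# Hypothetical helpers, not part of the original post.

# Surname of the Alabama senator speaking in each row, NA for everyone else.
speaker_of <- function(x) {
  str_match(x, "^Mr\\. (SHELBY|SESSIONS) \\(R; Alabama\\):")[, 2]
}

# For each row, walk in direction `step` (-1 = backward, +1 = forward) until a row
# spoken by an Alabama senator other than the current speaker is found.
nearest_other <- function(spk, step) {
  n <- length(spk)
  out <- rep(NA_character_, n)
  for (i in seq_len(n)) {
    j <- i + step
    while (j >= 1 && j <= n) {
      if (!is.na(spk[j]) && !identical(spk[j], spk[i])) {
        out[i] <- spk[j]
        break
      }
      j <- j + step
    }
  }
  out
}

# Replace the phrase with the nearest senator's name wherever one was found.
replace_nearest <- function(x, step = -1) {
  phrase <- regex("the senator from Alabama", ignore_case = TRUE)
  who <- nearest_other(speaker_of(x), step)
  hit <- which(str_detect(x, phrase) & !is.na(who))
  if (length(hit) > 0) {
    x[hit] <- str_replace(x[hit], phrase, paste("senator", str_to_title(who[hit])))
  }
  x
}

# Small illustrative vector, shorter than the sample data in the post
speeches <- c(
  "Mr. SHELBY (R; Alabama): I do not agree.",
  "Ms. PEABODY (I; Atlantis): Filler.",
  "Mr. Quail (Q; Nowhere): Match this one, the senator from Alabama.",
  "Ms. PEABODY (I; Atlantis): Filler.",
  "Mr. SESSIONS (R; Alabama): Match this one with Quail."
)

looked_back  <- replace_nearest(speeches, step = -1)  # row 3 resolves to Shelby, 2 rows up
looked_ahead <- replace_nearest(speeches, step = 1)   # row 3 resolves to Sessions, 2 rows down
```

The loop is quadratic in the worst case, which should be fine for a transcript of this size. Applied to `test_df$speeches`, it should pick the same senators as the two desired listings, although that has only been traced by hand here.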

rta7y2nd1#

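Here is a different approach. I used case_when. Can you confirm that this is what you are looking for?
I use any to check the lag and the lead at the same time. I also look for the senator's name in the current row to keep a senator from referring to themselves. (That is the negated condition.)
That said, this really uses your code, just arranged a little differently. If you have any questions, let me know. (I don't think I need to explain your code, but if anything is unclear or you are not familiar with case_when, I'm happy to explain.)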

tdf = test_df %>% 
  mutate(tellMe = case_when(
    # the row mentions the senator from Alabama ...
    str_detect(speeches, regex("senator from Alabama", ignore_case = TRUE)) & 
      # ... is not spoken by Shelby himself (negated match) ...
      str_detect(speeches, regex("SHELBY \\(R; Alabama\\)", ignore_case = TRUE), negate = TRUE) &
      # ... and Shelby shows up in a lagged or lead row.
      # Note: any() returns one value for the whole column, not one per row.
      any(str_detect(lag(speeches), "SHELBY \\(R; Alabama\\)"),
          str_detect(lead(speeches), "SHELBY \\(R; Alabama\\)")) ~
      str_replace(speeches, regex("the senator from Alabama", ignore_case = TRUE),
                  "senator Shelby"),
    # same three checks for Sessions
    str_detect(speeches, regex("senator from Alabama", ignore_case = TRUE)) & 
      str_detect(speeches, regex("SESSIONS \\(R; Alabama\\)", ignore_case = TRUE), negate = TRUE) &
      any(str_detect(lag(speeches), "SESSIONS \\(R; Alabama\\)"),
          str_detect(lead(speeches), "SESSIONS \\(R; Alabama\\)")) ~
      str_replace(speeches, regex("the senator from Alabama", ignore_case = TRUE),
                  "senator Sessions"),
    # everything else is left untouched
    TRUE ~ speeches
  ))

1                                   Mr. SHELBY (R; Alabama): I acknowledge this is a test.
2    Mrs. MURRAY (D; Washington): I say to my friend, senator Shelby, that they are wrong.
3                               Mr. SHELBY (R; Alabama): I do not agree with my colleague.
4                     Mr. FRIST (R; Tennessee): senator Shelby is correct, senator Murray.
5                  Mr. SHELBY (R; Alabama): I thank the majority leader for their support.
6                      Mr. SESSIONS (R; Alabama): I am proud of my junior, senator Shelby.
7        Mr. SHELBY (R; Alabama): To my senior peer, senator Sessions, I say great things.
8                                   Ms. PEABODY (I; Atlantis): This is just to fill space.
9  Mr. Quail (Q; Nowhere): Match this one, senator Shelby, with the senator at the bottom.
10                                  Ms. PEABODY (I; Atlantis): This is just to fill space.
11                                  Ms. PEABODY (I; Atlantis): This is just to fill space!
12                                  Ms. PEABODY (I; Atlantis): This is just to fill space?
13                                   Mr. SESSIONS (R; Alabama): Match this one with Quail.
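
One detail worth checking in the code above: `any()` returns a single logical value for all of its arguments together, so inside `mutate()` it is recycled to every row rather than evaluated row by row. That appears to be why row 9 becomes senator Shelby even though neither of its neighbors is spoken by Shelby. A small sketch of the difference, using a made-up vector `rows` and `|` with `coalesce()` as one possible row-wise alternative:

```r
library(dplyr)
library(stringr)

# Made-up rows for illustration: only the first one is spoken by Shelby
rows <- c("Mr. SHELBY (R; Alabama): First.",
          "Ms. PEABODY (I; Atlantis): Filler.",
          "Ms. PEABODY (I; Atlantis): More filler.",
          "Ms. PEABODY (I; Atlantis): Last.")

before <- str_detect(dplyr::lag(rows),  "SHELBY \\(R; Alabama\\)")  # NA TRUE FALSE FALSE
after  <- str_detect(dplyr::lead(rows), "SHELBY \\(R; Alabama\\)")  # FALSE FALSE FALSE NA

# any() collapses both vectors into one value for the whole column
collapsed <- any(before, after)

# `|` compares position by position; coalesce() maps the edge NAs to FALSE
row_wise <- coalesce(before, FALSE) | coalesce(after, FALSE)
```

Here `collapsed` is a single `TRUE`, while `row_wise` is `TRUE` only for the second row, the one whose neighbor is actually spoken by Shelby. A row-wise check of this kind still looks just one row in each direction.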