How to pattern-match the current week's data against the previous week's data in R

ibrsph3r  posted on 2023-05-11  in  Other

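I am trying to find recurring strings in a column of weekly data. I would like to create a column that returns 1 if the string matches the one from an adjacent week and one of the cells in that run (in a previous or a later week) contains the word "Unidentified", and 0 otherwise. How can I create the "Match" column shown below? Thanks for your help!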

Week     Data              Match
 1       Red                 0
 2       Blue Unidentified   0
 3       Blue                1
 4       Blue                1
 5       Green               0
 6       Yellow              0
 7       Green               0
 8       Green Unidentified  1
 9       Green               1
 10      Yellow              0
 11      Red                 0
 12      Green               0
 13      Orange              0
 14      Orange              1
 15      Orange              1 
 16      Orange Unidentified 1
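
For anyone who wants to run the answers, the table above can be rebuilt as a data frame. This is only a minimal sketch: the name `df` is the one the answers use, and the column types (integer `Week`, character `Data`, numeric `Match`) are an assumption, since the question does not show them.

```r
# Rebuild the example data from the question.
# Column types are an assumption: integer Week, character Data.
df <- data.frame(
  Week = 1:16,
  Data = c("Red", "Blue Unidentified", "Blue", "Blue", "Green", "Yellow",
           "Green", "Green Unidentified", "Green", "Yellow", "Red", "Green",
           "Orange", "Orange", "Orange", "Orange Unidentified"),
  # Desired result, copied from the question
  Match = c(0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1),
  stringsAsFactors = FALSE
)
```

Keeping the desired `Match` column in `df` makes it easy to compare any computed column against it afterwards.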

2ledvvac  #1

I'd love to see a more concise approach, but I hope this one is clear:

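library(dplyr)  # .by in mutate() needs dplyr >= 1.1.0
df %>%
  # Drop the marker so "Blue Unidentified" and "Blue" compare equal, then
  # number each run of consecutive identical values
  mutate(Data_unid = Data %>% stringr::str_remove(" Unidentified"),
         group = cumsum(Data_unid != lag(Data_unid, default = ""))) %>%
  # Flag every row of a run that contains at least one "Unidentified"
  mutate(has_unid = any(Data != Data_unid), .by = group) %>%
  # Match = same value as the previous week, inside a flagged run
  mutate(match2 = 1 * (Data_unid == lag(Data_unid, default = "") & has_unid))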

Result

Week                Data Match Data_unid group has_unid match2
1     1                 Red     0       Red     1    FALSE      0
2     2   Blue Unidentified     0      Blue     2     TRUE      0
3     3                Blue     1      Blue     2     TRUE      1
4     4                Blue     1      Blue     2     TRUE      1
5     5               Green     0     Green     3    FALSE      0
6     6              Yellow     0    Yellow     4    FALSE      0
7     7               Green     0     Green     5     TRUE      0
8     8  Green Unidentified     1     Green     5     TRUE      1
9     9               Green     1     Green     5     TRUE      1
10   10              Yellow     0    Yellow     6    FALSE      0
11   11                 Red     0       Red     7    FALSE      0
12   12               Green     0     Green     8    FALSE      0
13   13              Orange     0    Orange     9     TRUE      0
14   14              Orange     1    Orange     9     TRUE      1
15   15              Orange     1    Orange     9     TRUE      1
16   16 Orange Unidentified     1    Orange     9     TRUE      1
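
If dplyr is not available, the same run-based idea can be sketched in base R. This is not part of the answer above, just an illustrative variant: `rle()` replaces the `cumsum()`/`lag()` grouping, and the names `data_col`, `base_word`, `run_id` and `match_base` are made up for this example.

```r
# Example data from the question (Data column only)
data_col <- c("Red", "Blue Unidentified", "Blue", "Blue", "Green", "Yellow",
              "Green", "Green Unidentified", "Green", "Yellow", "Red", "Green",
              "Orange", "Orange", "Orange", "Orange Unidentified")

# Strip the marker so "Blue Unidentified" and "Blue" compare equal
base_word <- sub(" Unidentified", "", data_col, fixed = TRUE)

# Run id: consecutive identical base words share one id
runs   <- rle(base_word)
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# One flag per run: does any row of the run carry "Unidentified"?
run_flag <- tapply(data_col != base_word, run_id, any)
has_unid <- unname(run_flag[run_id])

# A row matches when it is not the first row of its run and the run is flagged
first_in_run <- !duplicated(run_id)
match_base   <- as.integer(!first_in_run & has_unid)
```

"Not the first row of its run" plays the role of `Data_unid == lag(Data_unid)` in the dplyr version, because a run is by construction a stretch of identical consecutive values.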

dy2hfwbg  #2

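A general approach that works with any matching word.
First, split the strings into words with strsplit and unnest_wider.
Then compare the words row by row to see whether any of them match the row before, counting only rows that really are the directly preceding week. Finally, replace the resulting NAs and deselect the helper columns.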

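library(dplyr)
library(tidyr)

df %>% 
  # Split each string into words, one helper column per word (spl_1, spl_2, ...)
  mutate(spl = strsplit(as.character(Data), " ")) %>% 
  unnest_wider(spl, names_sep = "_") %>% 
  # Only compare rows whose Week directly follows the previous row's Week
  mutate(Match = (if_else(c(0, diff(Week)) == 1, 
                    if_any(starts_with("spl"), ~ .x == lag(.x)), FALSE))*1,
         # An NA comparison (a word is missing on one side) counts as no match
         Match = replace_na(Match, 0)) %>% 
  select(-starts_with("spl"))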
# A tibble: 16 × 3
    Week Data                Match
   <int> <chr>               <dbl>
 1     1 Red                     0
 2     2 Blue Unidentified       0
 3     3 Blue                    1
 4     4 Blue                    1
 5     5 Green                   0
 6     6 Yellow                  0
 7     7 Green                   0
 8     8 Green Unidentified      1
 9     9 Green                   1
10    10 Yellow                  0
11    11 Red                     0
12    12 Green                   0
13    13 Orange                  0
14    14 Orange                  1
15    15 Orange                  1
16    16 Orange Unidentified     1
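
Note that the code above compares words by position (spl_1 with the previous spl_1, and so on) and flags any repeated word, whether or not "Unidentified" appears. A position-independent variant can be sketched in base R with `intersect()`. This is only an illustration under the same "any shared word" assumption; the names `week`, `data_col`, `shares_word` and `match_any` are made up for the example.

```r
# Example data from the question
week     <- 1:16
data_col <- c("Red", "Blue Unidentified", "Blue", "Blue", "Green", "Yellow",
              "Green", "Green Unidentified", "Green", "Yellow", "Red", "Green",
              "Orange", "Orange", "Orange", "Orange Unidentified")

# Split every string into its words
words <- strsplit(data_col, " ", fixed = TRUE)

# Words of the previous row; the first row has no predecessor
prev_words <- c(list(character(0)), words[-length(words)])

# Does the row share at least one word with the previous row?
shares_word <- mapply(function(cur, prev) length(intersect(cur, prev)) > 0,
                      words, prev_words)

# Only count rows whose week directly follows the previous row's week
consecutive <- c(FALSE, diff(week) == 1)
match_any   <- as.integer(consecutive & shares_word)
```

Like the code above, this would also return 1 for a plain repeat such as "Red" followed by "Red", so an extra check for "Unidentified" is needed if that case should stay 0.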
