Count how many times two columns of an R data frame have the same or different contents [duplicate]

Posted by ff29svar on 2023-02-06 in Other
This question already has an answer here:

Check column if contains value from another column (4 answers)
Closed 4 days ago.
I have this data frame:

df <- structure(list(`Prediction (Ge)` = c("Paranthropus", "Paranthropus", 
"Homo", "Paranthropus", "Australopithecus", "Paranthropus", "Paranthropus", 
"Australopithecus", "Paranthropus", "Australopithecus", "Paranthropus", 
"Australopithecus", "Australopithecus", "Australopithecus", "Australopithecus", 
"Paranthropus", "Homo", "Australopithecus", "Paranthropus", "Paranthropus", 
"Paranthropus", "Paranthropus", "Australopithecus", "Paranthropus", 
"Australopithecus", "Paranthropus", "Australopithecus"), `Prediction (Sp)` = c("Australopithecus africanus", 
"Paranthropus robustus", "Paranthropus boisei", "Paranthropus robustus", 
"Paranthropus robustus", "Paranthropus robustus", "Paranthropus robustus", 
"Australopithecus afarensis", "Paranthropus boisei", "Paranthropus robustus", 
"Paranthropus robustus", "Paranthropus robustus", "Australopithecus afarensis", 
"Australopithecus afarensis", "Australopithecus afarensis", "Paranthropus robustus", 
"Homo habilis", "Australopithecus afarensis", "Paranthropus robustus", 
"Paranthropus boisei", "Paranthropus boisei", "Paranthropus robustus", 
"Australopithecus afarensis", "Paranthropus robustus", "Australopithecus afarensis", 
"Paranthropus robustus", "Australopithecus afarensis")), row.names = c(2L, 
3L, 6L, 7L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 19L, 20L, 26L, 
27L, 28L, 29L, 30L, 31L, 32L, 34L, 35L, 37L, 38L, 42L, 46L, 47L
), class = "data.frame", na.action = structure(c(`1` = 1L, `4` = 4L, 
`5` = 5L, `8` = 8L, `16` = 16L, `17` = 17L, `18` = 18L, `21` = 21L, 
`22` = 22L, `23` = 23L, `24` = 24L, `25` = 25L, `33` = 33L, `36` = 36L, 
`39` = 39L, `40` = 40L, `41` = 41L, `43` = 43L, `44` = 44L, `45` = 45L
), class = "omit"))

head(df) shows what it looks like:

head(df)
    Prediction (Ge)            Prediction (Sp)
2      Paranthropus Australopithecus africanus
3      Paranthropus      Paranthropus robustus
6              Homo        Paranthropus boisei
7      Paranthropus      Paranthropus robustus
9  Australopithecus      Paranthropus robustus
10     Paranthropus      Paranthropus robustus

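There are two columns, each coming from a different prediction.
What I want to know is whether the genus in the second column, Prediction (Sp), is the same as the genus in Prediction (Ge). In other words, I need to compare the first word of Prediction (Sp) with the value in Prediction (Ge).
Looking only at the 6 rows returned by head(df), I would say 3 rows match (row names 3, 7 and 10) and 3 rows differ (2, 6 and 9).
How can I get the total number of matching and non-matching rows with one simple line of code?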

Answer #1 (2g32fytz)

Use grepl, applied to each row separately with mapply. No packages are needed.

subset(df, mapply(grepl, `Prediction (Ge)`, `Prediction (Sp)`))
##     Prediction (Ge)            Prediction (Sp)
## 3      Paranthropus      Paranthropus robustus
## 7      Paranthropus      Paranthropus robustus
## 10     Paranthropus      Paranthropus robustus
## ...snip...

table(with(df, mapply(grepl, `Prediction (Ge)`, `Prediction (Sp)`)))
##
## FALSE  TRUE 
##     5    22
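
Note that grepl treats each genus as a regular expression and reports a match if it occurs anywhere in the species string, not only as the first word. On this data the result is the same, but a stricter check is possible. The sketch below assumes the genus is always the first word of Prediction (Sp) followed by a space; it uses base R's startsWith() and, to stay self-contained, only the six rows shown by head(df) in the question.

```r
# The six rows shown by head(df) in the question (subset used for illustration)
df <- data.frame(
  `Prediction (Ge)` = c("Paranthropus", "Paranthropus", "Homo",
                        "Paranthropus", "Australopithecus", "Paranthropus"),
  `Prediction (Sp)` = c("Australopithecus africanus", "Paranthropus robustus",
                        "Paranthropus boisei", "Paranthropus robustus",
                        "Paranthropus robustus", "Paranthropus robustus"),
  row.names = c(2L, 3L, 6L, 7L, 9L, 10L),
  check.names = FALSE  # keep the column names with spaces and parentheses
)

# TRUE only when the species string begins with "<genus> ", i.e. the genus
# is exactly the first word; startsWith() does a plain prefix test, no regex
same_genus <- startsWith(df$`Prediction (Sp)`, paste0(df$`Prediction (Ge)`, " "))

table(same_genus)  # FALSE: 3, TRUE: 3 (row names 3, 7 and 10 match)
```

Applied to the full df from the question, the same expression gives the counts reported above.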

Answer #2 (bqucvtff)

How about this:

library(dplyr)
library(stringr)

df %>% 
  mutate(right_genus = str_detect(`Prediction (Sp)`, `Prediction (Ge)`)) 
#>     Prediction (Ge)            Prediction (Sp) right_genus
#> 2      Paranthropus Australopithecus africanus       FALSE
#> 3      Paranthropus      Paranthropus robustus        TRUE
#> 6              Homo        Paranthropus boisei       FALSE
#> 7      Paranthropus      Paranthropus robustus        TRUE
#> 9  Australopithecus      Paranthropus robustus       FALSE
#> 10     Paranthropus      Paranthropus robustus        TRUE
#> 11     Paranthropus      Paranthropus robustus        TRUE
#> 12 Australopithecus Australopithecus afarensis        TRUE
#> 13     Paranthropus        Paranthropus boisei        TRUE
#> 14 Australopithecus      Paranthropus robustus       FALSE
#> 15     Paranthropus      Paranthropus robustus        TRUE
#> 19 Australopithecus      Paranthropus robustus       FALSE
#> 20 Australopithecus Australopithecus afarensis        TRUE
#> 26 Australopithecus Australopithecus afarensis        TRUE
#> 27 Australopithecus Australopithecus afarensis        TRUE
#> 28     Paranthropus      Paranthropus robustus        TRUE
#> 29             Homo               Homo habilis        TRUE
#> 30 Australopithecus Australopithecus afarensis        TRUE
#> 31     Paranthropus      Paranthropus robustus        TRUE
#> 32     Paranthropus        Paranthropus boisei        TRUE
#> 34     Paranthropus        Paranthropus boisei        TRUE
#> 35     Paranthropus      Paranthropus robustus        TRUE
#> 37 Australopithecus Australopithecus afarensis        TRUE
#> 38     Paranthropus      Paranthropus robustus        TRUE
#> 42 Australopithecus Australopithecus afarensis        TRUE
#> 46     Paranthropus      Paranthropus robustus        TRUE
#> 47 Australopithecus Australopithecus afarensis        TRUE
df %>% 
  mutate(right_genus = str_detect(`Prediction (Sp)`, `Prediction (Ge)`)) %>% 
  group_by(right_genus) %>% 
  tally()
#> # A tibble: 2 × 2
#>   right_genus     n
#>   <lgl>       <int>
#> 1 FALSE           5
#> 2 TRUE           22

Created on 2023-02-01 by the reprex package (v2.0.1)

Answer #3 (pcww981p)

You can use gsub() and table():

> df$a <- df$`Prediction (Ge)`
> df$b <- gsub(' .+$', '', df$`Prediction (Sp)`)
> table(df$a == df$b)

FALSE  TRUE 
    5    22

If you prefer, you can also store the result as a new column:

> df$match <- df$a == df$b
> head(df)
    Prediction (Ge)            Prediction (Sp)                a
2      Paranthropus Australopithecus africanus     Paranthropus
3      Paranthropus      Paranthropus robustus     Paranthropus
6              Homo        Paranthropus boisei             Homo
7      Paranthropus      Paranthropus robustus     Paranthropus
9  Australopithecus      Paranthropus robustus Australopithecus
10     Paranthropus      Paranthropus robustus     Paranthropus
                  b match
2  Australopithecus FALSE
3      Paranthropus  TRUE
6      Paranthropus FALSE
7      Paranthropus  TRUE
9      Paranthropus FALSE
10     Paranthropus  TRUE
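
If you would rather not add helper columns to df, the same idea fits in a single expression. The sketch below uses sub() instead of gsub() (the pattern is anchored at the end, so one substitution per string is enough) and, to stay self-contained, only the six rows shown by head(df) in the question.

```r
# The six rows shown by head(df) in the question (subset used for illustration)
df <- data.frame(
  `Prediction (Ge)` = c("Paranthropus", "Paranthropus", "Homo",
                        "Paranthropus", "Australopithecus", "Paranthropus"),
  `Prediction (Sp)` = c("Australopithecus africanus", "Paranthropus robustus",
                        "Paranthropus boisei", "Paranthropus robustus",
                        "Paranthropus robustus", "Paranthropus robustus"),
  row.names = c(2L, 3L, 6L, 7L, 9L, 10L),
  check.names = FALSE
)

# Remove everything from the first space onward, leaving only the genus
genus_sp <- sub(" .*$", "", df$`Prediction (Sp)`)

# Element-wise comparison with the genus column, without modifying df
matches <- genus_sp == df$`Prediction (Ge)`

c(same = sum(matches), different = sum(!matches))  # same: 3, different: 3
```

Written as the one-liner asked for in the question: table(sub(" .*$", "", df$`Prediction (Sp)`) == df$`Prediction (Ge)`).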
