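I have a dataset with 3 columns and I want to create two datasets from it: one containing the values that the first and third columns have in common, together with the corresponding values from the second column, and another containing the values that are not in common, again with the corresponding values from the second column.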
structure(list(
  col1 = c("F1:19911", "F6:29731", "F2:26353", "F2:11604", "F1:20748", "F6:25287", "F2:19148", "F4:20479", "F3:19564", "F4:29795",
    "F1:32641", "F1:23920", "F2:37051", "F4:31963", "F1:27075", "F1:34085", "F1:31602", "F2:28123", "F2:28512", "F4:31963",
    "F6:19142", "F6:21309", "F2:11153", "F2:20275", "F2:31059", "F2:3199", "F2:31759", "F2:18603", "F4:21551", "F1:14042",
    "F2:25183", "F2:15691", "F2:17735", "F3:22580", "F4:23956", "F3:29087", "F3:2604", "F1:18485"),
  col2 = c(99, 99, 99, 99, 99, 99, 99, 99, 99, 99,
    99, 99, 99, 99, 99, 99, 99, 99, 99, 99,
    99, 99, 99, 99, 99, 99, 99, 99, 99, 99,
    99, 99, 99, 99, 99, 99, 99, 99),
  col3 = c("F1:45496", "F6:49991", "F1:42766", "F6:41131", "F1:49777", "F1:48389", "F3:40668", "F1:51123", "F6:49282", "F6:38250",
    "F1:59546", "F6:38404", "F1:40600", "F6:25287", "F2:19148", "F1:31602", "F2:28123", "F1:19911", "F6:45844", "F1:40519",
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA)),
  row.names = c(NA, -38L), class = c("tbl_df", "tbl", "data.frame"))
I would like the output to look like this:
Dataset 1:
| col1 | col2 |
| --- | --- |
| F6:11249 | 85 |
| F6:17709 | 89 |
Dataset 2:
| col1 | col2 |
| --- | --- |
| F5:14398 | 90 |
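Basically, I want to find the values common to col1 and col3 and get the matching values from col2. It is like using conditional formatting in Excel to highlight the common values and then filtering col1, once for the common values and once for the unique ones, keeping the corresponding col2 values each time. But because the dataset is very large, Excel takes too long to filter.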
1 Answer
Edit
I do understand what you want to do. You can:
1. List the values that occur in col3
library(dplyr)
# levels() returns the distinct non-NA values of col3
value_list <- data$col3 %>% as.factor() %>% levels()
2. Check which of these values also occur in col1, to find the common values
duplicated_data <- data %>% filter(col1 %in% value_list) %>% select(col1, col2)
3. Find the "unique" values by subtracting the common rows from the full dataset
unique_data <- data %>% anti_join(duplicated_data, by = "col1") %>% select(col1, col2)
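For comparison, the same three steps can be sketched in base R without dplyr. This is only an illustration under stated assumptions: the `toy` data frame and its values are made up for the example and are not the asker's data, and `in_both` is a helper name introduced here.

```r
# Toy data, invented for illustration only
toy <- data.frame(
  col1 = c("F1:1", "F2:2", "F3:3", "F4:4"),
  col2 = c(85, 89, 90, 70),
  col3 = c("F2:2", "F4:4", "F9:9", NA),
  stringsAsFactors = FALSE
)

# Step 1: distinct non-missing values of col3
value_list <- unique(toy$col3[!is.na(toy$col3)])

# Step 2: flag the rows whose col1 value also occurs in col3
in_both <- toy$col1 %in% value_list
duplicated_data <- toy[in_both, c("col1", "col2")]

# Step 3: every remaining row holds a value found only in col1
unique_data <- toy[!in_both, c("col1", "col2")]

print(duplicated_data)
print(unique_data)
```

Because both subsets are built from the same logical vector, every row of the input lands in exactly one of the two results, which makes the split easy to sanity-check with `nrow()`.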