Row-wise mutation of a tibble in R without a for loop

aamkag61, posted 2023-01-18 in Other

I have two tibbles. Based on a variable in the first, I need to look up values in the second and insert them into the first.
The two tibbles are:

library(dplyr)

# Set seed
set.seed(10)

# Get data
df1 <-  starwars %>% 
  select(name,species) %>%
  filter(species %in% c("Human","Wookiee","Droid")) %>%
  mutate(Fav_colour = sample(c("blue","red","green"),n(),replace=TRUE))

# Random table with typical colour preference
df2 <- tibble(Colour  = c("blue","green","red"),
              Human   = c(0.5,0.3,0.1),
              Wookiee = c(0.2,0.8,0.5),
              Droid   = c(0.3,0.1,0.5))

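In df1 I need to insert the typical colour preference for each species. To do that, I can iterate over each row of the tibble in a for loop, add the relevant data, and collect the rows in a list.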

# Make empty list
data <- list()

# Iterate through each row
for (x in seq_len(nrow(df1))) {
  # Take a slice
  tmp <- slice(df1, x)
  
  # Add new column to slice based on data in slice (species)
  tmp$Typical <- df2 %>%
    select(Colour, all_of(tmp$species)) %>%
    arrange(desc(.data[[tmp$species]])) %>% 
    slice_head(n = 1) %>% 
    pull(Colour)
    
  # Add data to list
  data[[x]] <- tmp
}

# Recompile df1 (bind_rows() is in dplyr; list.rbind() would need the rlist package)
df1 <- bind_rows(data)

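I think there must be a more efficient way to do this, but I can't work out how to get the filtered and arranged value from df2 without a for loop. Would sapply with a function be a better option? What is the dplyr way to do this without a for loop?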

Answer 1 (lf5gs5x2)

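It sounds like you want the colour with the maximum value for each species in df2. If we use pivot_longer so that the species is in one column and the value in another, we can group by species and keep the row with the largest value. This lookup table (the colour plus the value for each species) can then be joined to the original data.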

df1 %>%
  left_join(
    df2 %>% 
      tidyr::pivot_longer(2:4, names_to = "species") %>%
      group_by(species) %>%
      slice_max(value)
  )

Result

Joining with `by = join_by(species)`
# A tibble: 43 × 5
   name               species Fav_colour Colour value
   <chr>              <chr>   <chr>      <chr>  <dbl>
 1 Luke Skywalker     Human   green      blue     0.5
 2 C-3PO              Droid   blue       red      0.5
 3 R2-D2              Droid   red        red      0.5
 4 Darth Vader        Human   green      blue     0.5
 5 Leia Organa        Human   red        blue     0.5
 6 Owen Lars          Human   green      blue     0.5
 7 Beru Whitesun lars Human   green      blue     0.5
 8 R5-D4              Droid   green      red      0.5
 9 Biggs Darklighter  Human   green      blue     0.5
10 Obi-Wan Kenobi     Human   green      blue     0.5
# … with 33 more rows
# ℹ Use `print(n = ...)` to see more rows
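
To check the intermediate step, the lookup table can be built on its own before joining. This is a minimal sketch, not part of the original answer: the name `lookup` is made up for illustration, and `with_ties = FALSE` is an added assumption so that each species yields exactly one row even if two colours share the top value.

```r
library(dplyr)
library(tidyr)

# Same preference table as in the question
df2 <- tibble(Colour  = c("blue", "green", "red"),
              Human   = c(0.5, 0.3, 0.1),
              Wookiee = c(0.2, 0.8, 0.5),
              Droid   = c(0.3, 0.1, 0.5))

# One row per species: the colour with the highest value
# (`lookup` is a hypothetical name used only in this sketch)
lookup <- df2 %>%
  pivot_longer(-Colour, names_to = "species") %>%
  group_by(species) %>%
  slice_max(value, n = 1, with_ties = FALSE) %>%
  ungroup()

print(lookup)
```

Joining `lookup` to df1 with `left_join(df1, lookup, by = "species")` should then give the same columns as the result above.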
Answer 2 (erhoui1w)

Here is an alternative that does not use a loop; see the df4 data frame.

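library(tidyr)  # for pivot_longer()

# Lookup table: top-scoring colour per species
df3 <- df2 %>% 
  pivot_longer(c('Human','Wookiee','Droid'), names_to = 'species', values_to = 'score') %>% 
  arrange(species, desc(score)) %>% 
  group_by(species) %>% slice(1) %>% ungroup()

# Join onto df1, keeping only the colour as Typical
df4 <- df1 %>% left_join(df3, by = 'species') %>% rename(Typical = Colour) %>% select(-score)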
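
On the sapply idea raised in the question: one possible sketch is to build a named character vector that maps each species to its top colour, then index it by species inside mutate. This is not from either answer. The names `typical`, `df1_small` and `result` are made up for illustration, and `df1_small` is a three-row stand-in for df1.

```r
library(dplyr)

# Same preference table as in the question
df2 <- tibble(Colour  = c("blue", "green", "red"),
              Human   = c(0.5, 0.3, 0.1),
              Wookiee = c(0.2, 0.8, 0.5),
              Droid   = c(0.3, 0.1, 0.5))

# Named vector: species -> colour with the highest score.
# sapply() walks over the species columns; which.max() returns the first maximum.
typical <- sapply(df2[-1], function(score) df2$Colour[which.max(score)])

# Hypothetical small stand-in for df1
df1_small <- tibble(name    = c("Luke Skywalker", "C-3PO", "Chewbacca"),
                    species = c("Human", "Droid", "Wookiee"))

# Look up each row's species by name; unname() drops the vector names
result <- df1_small %>%
  mutate(Typical = unname(typical[species]))

print(result)
```

The lookup is vectorised, so there is no per-row loop. A species missing from df2 would give NA in `Typical`.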
