Using R dplyr mutate to create a factor column with pre-declared levels

bogh5gae  posted on 2023-10-13  in Other

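I've written R code to generate a periodic report that needs the week numbers reordered so that I can filter and sort by the most recent 10 weeks. To prevent errors and minimize hard-coded values, I prefer to declare the week order at the top of a script that is sourced by several other scripts. So I'd like to define an ordered factor once and then use it to order the week-number column. A reprex is below, but normally I would reorder all 52 weeks so that the most recent 10 weeks are the last/largest, e.g. new_levels <- factor(1:52, levels = c(29:52, 1:28), ordered=TRUE)
Side note: any suggestions on a better way to grab the most recent (not necessarily the largest) 10 weeks are welcome. My past struggles have been with the rollover near the end of the year (51, 52, 1, 2, 3, ...).
Example: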

library(dplyr)  # provides %>%, mutate(), arrange(); also re-exports tibble()

new_levels <- factor(1:10, levels = c(8:10, 1:7), ordered=TRUE)

data <- tibble(Week = 1:10, ID = c("A","A","B","B","C","A","D","B","D","A"))

data <- data %>% mutate(Week2 = factor(Week, levels = new_levels, ordered = TRUE)) %>% arrange(Week2)

The ordered factor (new_levels) looks correct, but the behavior of arrange() and str() shows that the ordering I want is not happening:

> new_levels
 [1] 1  2  3  4  5  6  7  8  9  10
Levels: 8 < 9 < 10 < 1 < 2 < 3 < 4 < 5 < 6 < 7
> data
# A tibble: 10 × 3
    Week ID    Week2
   <int> <chr> <ord>
 1     1 A     1    
 2     2 A     2    
 3     3 B     3    
 4     4 B     4    
 5     5 C     5    
 6     6 A     6    
 7     7 D     7    
 8     8 B     8    
 9     9 D     9    
10    10 A     10   
> str(data)
tibble [10 × 3] (S3: tbl_df/tbl/data.frame)
 $ Week : int [1:10] 1 2 3 4 5 6 7 8 9 10
 $ ID   : chr [1:10] "A" "A" "B" "B" ...
 $ Week2: Ord.factor w/ 10 levels "1"<"2"<"3"<"4"<..: 1 2 3 4 5 6 7 8 9 10

Thank you!

ssm49v7z  #1

If you look closely at your output, you'll see that the code is not doing what you expect:

data %>% 
  mutate(Week2 = factor(Week, levels = new_levels, ordered = TRUE)) %>% 
  pull(Week2)
#  [1] 1  2  3  4  5  6  7  8  9  10
# Levels: 1 < 2 < 3 < 4 < 5 < 6 < 7 < 8 < 9 < 10

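This shows that arrange() is working as intended: the levels of Week2 are in plain 1-to-10 order, so that is the order you get. The problem comes from passing levels = new_levels. What are the values of new_levels?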

new_levels
#  [1] 1  2  3  4  5  6  7  8  9  10
# Levels: 8 < 9 < 10 < 1 < 2 < 3 < 4 < 5 < 6 < 7

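In this case, its values are the sequence 1:10; the reordering exists only in its levels attribute. What you want is to assign the levels of new_levels to the levels of the new variable: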

data %>% 
  mutate(Week2 = factor(Week, levels = levels(new_levels), ordered = TRUE)) %>% 
  arrange(Week2)
#     Week ID    Week2
#    <int> <chr> <ord>
#  1     8 B     8    
#  2     9 D     9    
#  3    10 A     10   
#  4     1 A     1    
#  5     2 A     2    
#  6     3 B     3    
#  7     4 B     4    
#  8     5 C     5    
#  9     6 A     6    
# 10     7 D     7
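
The distinction above can be checked in base R alone. The following is a minimal sketch, not part of the original answer: it compares the values of new_levels with its levels attribute and shows which of the two ends up as the levels of the new factor. The names wrong and right are introduced here only for illustration.

```r
# Rebuild the factor from the question.
new_levels <- factor(1:10, levels = c(8:10, 1:7), ordered = TRUE)

# The values of new_levels, in element order. This is what factor()
# uses when new_levels itself is passed as `levels`.
as.character(new_levels)

# The levels attribute, which holds the intended 8, 9, 10, 1, ..., 7 order.
levels(new_levels)

# `wrong` and `right` are illustrative names, not from the thread.
wrong <- factor(1:10, levels = new_levels, ordered = TRUE)
right <- factor(1:10, levels = levels(new_levels), ordered = TRUE)

levels(wrong)  # "1" "2" ... "10": the values of new_levels
levels(right)  # "8" "9" "10" "1" ... "7": the levels of new_levels

# Sorting an ordered factor follows level position, so only `right`
# puts weeks 8, 9, 10 first.
order(right)
```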

anauzrmj  #2

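The vector passed to the levels argument must itself be in the correct order (and defining levels on new_levels does not reorder the vector itself). Also note that factor() converts that argument to a character vector, so (1) there is no need to define new_levels as a factor, (2) any factor levels defined on the vector passed to the levels argument are irrelevant, and (3) new_levels can simply be a numeric vector. You can simplify your code as follows: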

data <- tibble(Week = 1:10, ID = c("A","A","B","B","C","A","D","B","D","A"))
data <- data %>%
  mutate(Week2 = factor(Week, levels = c(8:10, 1:7), ordered = TRUE)) %>%
  arrange(Week2)
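
On the side note in the question about the year-end rollover, one possible approach is to compute the week order from the latest week with modular arithmetic, so the vector does not have to be typed by hand. This is only a sketch under stated assumptions: week_order() is a hypothetical helper that does not appear in the thread, the year is assumed to have exactly 52 weeks (some ISO years have 53), and the sample data is made up. It uses base R only, so it runs without dplyr.

```r
# Hypothetical helper (not from the thread): return week numbers ordered so
# that `latest_week` comes last. Assumes a year of `n_weeks` weeks.
week_order <- function(latest_week, n_weeks = 52L) {
  latest_week <- as.integer(latest_week)
  n_weeks <- as.integer(n_weeks)
  stopifnot(latest_week >= 1L, latest_week <= n_weeks)
  # Shift 1..n_weeks so the sequence starts just after latest_week and wraps.
  ((seq_len(n_weeks) + latest_week - 1L) %% n_weeks) + 1L
}

# Declared once near the top of the script; here the latest week is week 3.
week_levels  <- week_order(3L)
recent_weeks <- tail(week_levels, 10)  # the 10 most recent weeks, oldest first

# Made-up sample data spanning the year boundary.
data <- data.frame(Week = c(1L, 2L, 3L, 40L, 50L, 51L, 52L))
data$Week2 <- factor(data$Week, levels = week_levels, ordered = TRUE)

# Keep only the recent weeks and sort them by recency rather than by number.
recent <- data[data$Week %in% recent_weeks, ]
recent <- recent[order(recent$Week2), ]
recent$Week
```

With week 28 as the latest week, week_order(28) reproduces c(29:52, 1:28) from the question. The same week_levels vector can be passed as levels = week_levels inside mutate().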
