R: create a sequence of numbers that increments with every change in another variable

Posted by ufj5ltwl on 2023-03-20 in Other


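What is an efficient way to create a sequence of numbers that increments every time a grouping variable changes? As a toy example, using the data frame below, I would like a new variable "Value" with the values c(1,1,1,2,2,3,3,4). Note that even though 48 appears again later, "Value" still increases, because I only care about changes from one row to the next.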

df <- read.table(textConnection(
  'Group 
  48 
  48
  48
  56
  56
  48
  48
  14'), header = TRUE)

One way to do it:

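df$Value <- 1
for (i in 2:nrow(df)) {
  if (df$Group[i] == df$Group[i - 1]) {
    df$Value[i] <- df$Value[i - 1]      # same group as the previous row: keep its number
  } else {
    df$Value[i] <- df$Value[i - 1] + 1  # group changed: start the next number
  }
}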

But this is very slow, and my actual dataset has several million observations.
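For comparison, here is a sketch of the same loop rewritten to work on a plain vector instead of subassigning into the data frame row by row; repeated data-frame subassignment is a likely (though not necessarily the only) source of the slowness. The helper name run_id_loop is hypothetical, and the sketch assumes Group contains no NA values. It still loops in R, so the vectorized answers below should scale better on millions of rows.

```r
# Hypothetical helper: same logic as the loop above, but on a plain vector
run_id_loop <- function(g) {
  n <- length(g)
  value <- integer(n)  # preallocate the result once
  if (n == 0) return(value)
  value[1] <- 1L
  for (i in seq_len(n)[-1]) {
    # add 1 whenever this row's group differs from the previous row's (TRUE counts as 1)
    value[i] <- value[i - 1] + (g[i] != g[i - 1])
  }
  value
}

df <- data.frame(Group = c(48, 48, 48, 56, 56, 48, 48, 14))
df$Value <- run_id_loop(df$Group)
df$Value
# [1] 1 1 1 2 2 3 3 4
```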

**Note:** I had trouble choosing a title for this question; feel free to change it.

Source: https://stackoverflow.com/questions/56740756/create-a-sequence-of-numbers-that-increments-for-every-change-in-another-variabl

4 answers


Answer 1 (5lhxktic)

We can also repurpose rle() for this:

r <- rle(df$Group)
r$values <- seq_along(r$lengths)
inverse.rle(r)
# [1] 1 1 1 2 2 3 3 4
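To see why this works, it helps to look at what rle() returns: the length of each run and the value of each run. inverse.rle() just repeats each value by its run length, so replacing the values with 1, 2, ..., k and expanding yields the run numbers. A minimal sketch of the same thing using rep() directly (the variable names are illustrative):

```r
g <- c(48L, 48L, 48L, 56L, 56L, 48L, 48L, 14L)

r <- rle(g)
r$lengths  # [1] 3 2 2 1
r$values   # [1] 48 56 48 14

# Repeat run number k exactly r$lengths[k] times
run_id <- rep(seq_along(r$lengths), times = r$lengths)
run_id
# [1] 1 1 1 2 2 3 3 4

# Same result as the inverse.rle() trick
r2 <- r
r2$values <- seq_along(r2$lengths)
identical(run_id, inverse.rle(r2))  # TRUE
```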

Data:

df <- structure(list(Group = c(48L, 48L, 48L, 56L, 56L, 48L, 48L, 14L
)), class = "data.frame", row.names = c(NA, -8L))

Posted 2023-03-20

Answer 2 (pcww981p)

Inspired by this post: https://stackoverflow.com/a/44512144/3772141
Just do:

library(dplyr)

df %>%
  mutate(Value = cumsum(Group != lag(Group) | row_number() == 1))

Result:

# Group Value
#    48     1
#    48     1
#    48     1
#    56     2
#    56     2
#    48     3
#    48     3
#    14     4

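How it works:
1. Compare each row's Group with the previous row's Group. If it changed, the comparison is TRUE, which marks the start of a new value: Group != lag(Group)
2. lag() returns NA as its first element, but the first row should always count as TRUE: | row_number() == 1 (in R, NA | TRUE evaluates to TRUE).
3. TRUE and FALSE are treated as 1 and 0, so cumsum() increments Value every time the inner expression is TRUE, that is, every time Group changes.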
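The same idea works in base R without dplyr. A minimal sketch, assuming Group is an atomic vector; the helper name run_id_cumsum is hypothetical. The shifted vector g[-length(g)] plays the role of lag(), and prepending TRUE handles the first row the way row_number() == 1 does. One caveat: an NA in Group makes the comparison NA, and cumsum() then returns NA for that row and every row after it.

```r
# Hypothetical helper: base R version of cumsum(Group != lag(Group) | row_number() == 1)
run_id_cumsum <- function(g) {
  if (length(g) == 0) return(integer(0))
  # TRUE for the first row and wherever g differs from the row above
  changed <- c(TRUE, g[-1] != g[-length(g)])
  cumsum(changed)  # cumsum() of a logical vector returns integers
}

run_id_cumsum(c(48, 48, 48, 56, 56, 48, 48, 14))
# [1] 1 1 1 2 2 3 3 4
```

Because it only uses vectorized comparisons and one cumsum(), it avoids the per-row overhead of the loop in the question.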

Posted 2023-03-20

Answer 3 (pvabu6sv)

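How about this? Note that it numbers each distinct Group value rather than each run, so the second run of 48 (rows 6 and 7) gets 1 again instead of 3, which differs from the c(1,1,1,2,2,3,3,4) the question asks for.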

library(tidyverse)
df = data.frame(Group = c(48, 48, 48, 56, 56, 48, 48, 14))

# Get unique values in group
unique_vals = unique(df$Group)

# create a sequence from 1 up until the length of the unique values vector
sequential_nums = 1:length(unique_vals)

# Create a new column looking up the current value in the unique_vals list
# and replacing it with the correct sequential number
df %>% 
  mutate(Value = sequential_nums[match(Group, unique_vals)])

# Group      Value 
# 1    48         1
# 2    48         1
# 3    48         1
# 4    56         2
# 5    56         2
# 6    48         1
# 7    48         1
# 8    14         3
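For contrast, a small base R sketch computing both numberings on the same vector: one id per distinct value, which is what the lookup above produces, and one id per run of consecutive equal values, which is what the question asks for. The variable names are illustrative.

```r
g <- c(48, 48, 48, 56, 56, 48, 48, 14)

# One number per distinct value (equivalent to the lookup above)
distinct_id <- match(g, unique(g))
distinct_id
# [1] 1 1 1 2 2 1 1 3

# One number per run of consecutive equal values
run_id <- cumsum(c(TRUE, g[-1] != g[-length(g)]))
run_id
# [1] 1 1 1 2 2 3 3 4
```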

Posted 2023-03-20

Answer 4 (zzzyeukh)

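If you're using the tidyverse, dplyr 1.1.0 added a function, consecutive_id(), that does exactly what you need! The tidyverse team suggests it for Zoom call transcripts, where consecutive lines from the same speaker should be grouped into a single thought: https://www.tidyverse.org/blog/2023/02/dplyr-1-1-0-vctrs/#consecutive_id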

library(dplyr)
df <- read.table(textConnection(
  'Group 
  48 
  48
  48
  56
  56
  48
  48
  14'), header = TRUE)

df |> mutate(value = consecutive_id(Group))
#>   Group value
#> 1    48     1
#> 2    48     1
#> 3    48     1
#> 4    56     2
#> 5    56     2
#> 6    48     3
#> 7    48     3
#> 8    14     4
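A run id like this is typically used to collapse consecutive rows that belong together, as in the transcript example. A minimal base R sketch of that collapsing step, assuming the ids are computed with the cumsum() comparison trick (for this vector they match the consecutive_id() output above); the column names run and n are illustrative.

```r
g <- c(48, 48, 48, 56, 56, 48, 48, 14)
id <- cumsum(c(TRUE, g[-1] != g[-length(g)]))  # 1 1 1 2 2 3 3 4

# One row per run: the group value at the start of the run and the run length
runs <- data.frame(
  run   = unique(id),
  Group = g[!duplicated(id)],  # first row of each run
  n     = tabulate(id)         # number of rows carrying each id
)
runs
#   run Group n
# 1   1    48 3
# 2   2    56 2
# 3   3    48 2
# 4   4    14 1
```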

Posted 2023-03-20
