How can I take repeated rows in R and reshape the data frame so that each unique identifier gets its values in separate columns?

Posted by ogq8wdun on 2023-04-03 in Other

I have a very large data frame. In simplified form, it looks like this...

df <- data.frame(matrix(nrow = 20, ncol = 2))
df[1:10,1] <- c("HeaderStart","LevelName","Experiment","SessionTime",
              "Subject","Session","ImgPath","RandomSeed",
              "DisplayRefreshRate","Level")

df[11:20,1] <- c("HeaderStart","LevelName","Experiment","SessionTime",
               "Subject","Session","ImgPath","RandomSeed",
               "DisplayRefreshRate","Level")

df[1:10,2] <- seq(1,10,1)
df[11:20,2] <- seq(1,10,1)

           

                   X1 X2
1         HeaderStart  1
2           LevelName  2
3          Experiment  3
4         SessionTime  4
5             Subject  5
6             Session  6
7             ImgPath  7
8          RandomSeed  8
9  DisplayRefreshRate  9
10              Level 10
11        HeaderStart  1
12          LevelName  2
13         Experiment  3
14        SessionTime  4
15            Subject  5
16            Session  6
17            ImgPath  7
18         RandomSeed  8
19 DisplayRefreshRate  9
20              Level 10

I want to turn it into this...

df <- data.frame(matrix(nrow = 10, ncol = 3))

df[1:10,1] <- c("HeaderStart","LevelName","Experiment","SessionTime",
                "Subject","Session","ImgPath","RandomSeed",
                "DisplayRefreshRate","Level")

df[1:10,2] <- seq(1,10,1)
df[1:10,3] <- seq(1,10,1)

                   X1 X2 X3
1         HeaderStart  1  1
2           LevelName  2  2
3          Experiment  3  3
4         SessionTime  4  4
5             Subject  5  5
6             Session  6  6
7             ImgPath  7  7
8          RandomSeed  8  8
9  DisplayRefreshRate  9  9
10              Level 10 10

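Essentially, I want one column holding the unique names, plus new columns holding the value (number or string) associated with each repetition of that name.
Edit: note that this is a simplified format. I can't simply take 10 rows, move them into a new column, and drop the leftover rows. The order of the names also changes all the time.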

Answer 1 (ruarlubt)

We can use dcast combined with rowid:

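library(data.table)  # rowid()
library(reshape2)    # dcast() on a plain data.frame
library(dplyr)       # rename_with(), arrange()

# One column per occurrence of each X1 name; dcast() sorts rows by X1
dcast(df, X1 ~ rowid(X1), value.var = "X2") |>
  rename_with(~ paste0("X", seq_along(.))) |>  # rename columns to X1, X2, X3
  arrange(X2)                                   # re-sort rows by the X2 values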

                   X1 X2 X3
1         HeaderStart  1  1
2           LevelName  2  2
3          Experiment  3  3
4         SessionTime  4  4
5             Subject  5  5
6             Session  6  6
7             ImgPath  7  7
8          RandomSeed  8  8
9  DisplayRefreshRate  9  9
10              Level 10 10
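
For readers without data.table and reshape2, the same idea can be sketched in base R. This is an illustrative sketch, not part of the answer above: it rebuilds the question's two-column df with rep(), uses ave() to emulate the per-name occurrence counter that rowid(X1) produces, and fills a names-by-occurrence matrix with matrix indexing. Unlike dcast(), it keeps the names in order of first appearance, so no re-sorting is needed.

```r
# Rebuild the question's example: the same 10 names repeated twice
df <- data.frame(
  X1 = rep(c("HeaderStart", "LevelName", "Experiment", "SessionTime",
             "Subject", "Session", "ImgPath", "RandomSeed",
             "DisplayRefreshRate", "Level"), times = 2),
  X2 = rep(1:10, times = 2)
)

# Running count of each name (1 on first occurrence, 2 on second, ...),
# i.e. what data.table::rowid(X1) would return
occ <- ave(seq_along(df$X1), df$X1, FUN = seq_along)

# One output row per name, in order of first appearance
ids <- unique(df$X1)

# Place each value at (row of its name, its occurrence number)
wide <- matrix(NA, nrow = length(ids), ncol = max(occ))
wide[cbind(match(df$X1, ids), occ)] <- df$X2
colnames(wide) <- paste0("X", seq_len(ncol(wide)) + 1)

out <- data.frame(X1 = ids, wide)
out
```

A name missing from some block simply leaves an NA cell, but like rowid(X1) the counter still assumes that the k-th occurrence of every name belongs to the k-th block.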

Answer 2 (d8tt03nd)

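Here is one option: create a sequence column with rowid() (numbering the repeats of each X1/X2 pair) and reshape to wide format with pivot_wider: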

library(dplyr)
library(tidyr)
library(data.table)  # rowid()

df %>%
  # occurrence number of each (X1, X2) pair, shifted so the new columns are X2, X3
  mutate(rn = rowid(X1, X2) + 1) %>%
  pivot_wider(names_from = rn, values_from = X2, names_prefix = "X")
Output:
# A tibble: 10 × 3
   X1                    X2    X3
   <chr>              <dbl> <dbl>
 1 HeaderStart            1     1
 2 LevelName              2     2
 3 Experiment             3     3
 4 SessionTime            4     4
 5 Subject                5     5
 6 Session                6     6
 7 ImgPath                7     7
 8 RandomSeed             8     8
 9 DisplayRefreshRate     9     9
10 Level                 10    10
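
A caveat worth checking before running this on the real data: rowid(X1, X2) counts repeats of each (name, value) pair, not of each name. In the sample data every value repeats exactly, so the counts come out as 1 and 2. If the second block carries different values (another subject, session, and so on), every pair is a first occurrence, all rows get the same rn, and pivot_wider() can no longer spread them into separate columns. The base R sketch below uses hypothetical two-block data and ave() as a stand-in for rowid() to show the difference; numbering by X1 alone (rowid(X1)) keeps the blocks apart.

```r
# Hypothetical data: same names in both blocks, but different values
df2 <- data.frame(
  X1 = rep(c("Subject", "Session", "Level"), times = 2),
  X2 = c("s01", "1", "3", "s02", "2", "5")
)

# Counter per (name, value) pair, like rowid(X1, X2): all 1s here
by_pair <- ave(seq_len(nrow(df2)), df2$X1, df2$X2, FUN = seq_along)

# Counter per name only, like rowid(X1): 1 for block one, 2 for block two
by_name <- ave(seq_len(nrow(df2)), df2$X1, FUN = seq_along)

by_pair
by_name
```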

Or with data.table:

library(data.table)
# setDT() converts df to a data.table in place (by reference)
dcast(setDT(df), X1 ~ paste0("X", rowid(X1, X2) + 1), value.var = "X2")[order(X2)]
Output:
                    X1 X2 X3
 1:        HeaderStart  1  1
 2:          LevelName  2  2
 3:         Experiment  3  3
 4:        SessionTime  4  4
 5:            Subject  5  5
 6:            Session  6  6
 7:            ImgPath  7  7
 8:         RandomSeed  8  8
 9: DisplayRefreshRate  9  9
10:              Level 10 10
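
Both dcast() answers sort the result by X1 and then restore the original row order with arrange(X2) or [order(X2)]. That only works because X2 happens to run from 1 to 10 in input order; since the question notes that the order of the names varies, it may be safer to sort by each name's first position in the input instead. A minimal base R sketch of that re-ordering step, with a hypothetical small input and a stand-in for the dcast() result:

```r
# Hypothetical input names, in the order they first appear
original_names <- c("HeaderStart", "LevelName", "Experiment")

# Stand-in for a dcast() result: rows sorted alphabetically by X1
wide <- data.frame(X1 = sort(original_names), X2 = c(7, 9, 4))

# Reorder by each name's first position in the input, ignoring X2
wide <- wide[order(match(wide$X1, unique(original_names))), ]
wide
```

Here ordering by X2 would put LevelName first, while the match() approach returns HeaderStart, LevelName, Experiment as in the input.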

Answer 3 (yvt65v4c)

Using your example data, a base R solution could be:

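# Unique names; table() returns them sorted alphabetically
classes <- table(df[, 1])
# For each name, collect its X2 values; t() gives one row per name
results <- t(sapply(names(classes), function(x) {
    df[which(df[, 1] == x), 2]
}))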
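
A variant of the same base R idea that keeps the names in order of first appearance (table() sorts them alphabetically) is to split() the values by a factor whose levels come from unique(). Like the sapply() version, it assumes every name occurs the same number of times. A sketch with a shortened, hypothetical df:

```r
df <- data.frame(
  X1 = rep(c("HeaderStart", "LevelName", "Level"), times = 2),
  X2 = c(1, 2, 10, 1, 2, 10)
)

# Factor levels in first-appearance order, so split() keeps that order
groups <- split(df$X2, factor(df$X1, levels = unique(df$X1)))

# One row per name, one column per occurrence
results <- do.call(rbind, groups)
results
```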

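However, this will fail if the same set of unique class names is not repeated every time (and in that case it is hard to see any solution that would reliably return a matrix-like result).
If each set of class names comes from a separate input document, a robust solution is much easier to come up with.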
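
When the blocks can be told apart, the block number itself can serve as the column index, which also tolerates names that are missing from a block or appear in a different order. A sketch, assuming (as in the sample data, though this is not confirmed for the real file) that every block starts with a HeaderStart row; the small df below is hypothetical:

```r
# Hypothetical data: block 2 reorders names and adds "Level"
df <- data.frame(
  X1 = c("HeaderStart", "Subject", "Session",
         "HeaderStart", "Session", "Subject", "Level"),
  X2 = c("1", "s01", "1", "2", "2", "s02", "7")
)

# Assumption: every block begins with a "HeaderStart" row
block <- cumsum(df$X1 == "HeaderStart")

# All names seen in any block, in first-appearance order
ids <- unique(df$X1)

# One column per block; names absent from a block stay NA
wide <- matrix(NA_character_, nrow = length(ids), ncol = max(block))
wide[cbind(match(df$X1, ids), block)] <- df$X2
colnames(wide) <- paste0("X", seq_len(ncol(wide)) + 1)

out <- data.frame(X1 = ids, wide)
out
```

If a name occurs twice within one block, the later value overwrites the earlier one, so that case would need its own handling.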

相关问题