R: Combining a character vector and a data frame into another data frame

fdx2calv  posted on 2023-11-14  in  Other

I have a single-row data frame that looks like this:

          Donor  Treatment Timepoint
MK434_016   WT5 ST002_50uM       6hr

And a character vector that looks like this:

[1] "AAACAAGCAAACAAGAATTCGGTT-1" "AAACAAGCAAACAATCATTCGGTT-1" "AAACAAGCAAACCTGAATTCGGTT-1" "AAACAAGCAAACTTGGATTCGGTT-1"
[5] "AAACAAGCAAAGACCCATTCGGTT-1" "AAACAAGCAAAGGTAAATTCGGTT-1"


I want to combine the two to create a data frame like the one below, with the strings as row names:

                           Donor  Treatment Timepoint
AAACAAGCAAACAAGAATTCGGTT-1   WT5 ST002_50uM       6hr
AAACAAGCAAACAATCATTCGGTT-1   WT5 ST002_50uM       6hr
AAACAAGCAAACCTGAATTCGGTT-1   WT5 ST002_50uM       6hr
etc...


I have tried combining them with rbind() or paste() in several different ways, but I can't figure out how to get the full data frame I'm looking for.


Answer #1 (yacmzcpb)

I would start by joining them without using row names, since some tools honor row names, some ignore them, and some actively remove them.

df2 <- cbind(df1[rep(1, length(strings)),], data.frame(barcode = strings))
df2
#             Donor  Treatment Timepoint                    barcode
# MK434_016     WT5 ST002_50uM       6hr AAACAAGCAAACAAGAATTCGGTT-1
# MK434_016.1   WT5 ST002_50uM       6hr AAACAAGCAAACAATCATTCGGTT-1
# MK434_016.2   WT5 ST002_50uM       6hr AAACAAGCAAACCTGAATTCGGTT-1
# MK434_016.3   WT5 ST002_50uM       6hr AAACAAGCAAACTTGGATTCGGTT-1
# MK434_016.4   WT5 ST002_50uM       6hr AAACAAGCAAAGACCCATTCGGTT-1
# MK434_016.5   WT5 ST002_50uM       6hr AAACAAGCAAAGGTAAATTCGGTT-1
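
As one illustration of a tool that removes row names, here is a minimal sketch using tibble::as_tibble(), which drops them unless told to keep them in a column. The two short barcodes are made up for the example and are not from the question's data.

```r
library(tibble)

# Toy data frame with barcode-like row names (hypothetical values)
df <- data.frame(Donor = c("WT5", "WT5"), row.names = c("AAAC-1", "AAAG-1"))

# as_tibble() drops row names by default ...
tb <- as_tibble(df)
has_rownames(tb)

# ... unless they are explicitly moved into a named column
tb2 <- as_tibble(df, rownames = "barcode")
tb2$barcode
```

Keeping the barcodes in a regular column, as in df2 above, avoids this kind of silent loss.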

From here, if you really want to move the barcode information out of the column and use it as row names, that is simple:

rownames(df2) <- df2$barcode
df2$barcode <- NULL
df2
#                            Donor  Treatment Timepoint
# AAACAAGCAAACAAGAATTCGGTT-1   WT5 ST002_50uM       6hr
# AAACAAGCAAACAATCATTCGGTT-1   WT5 ST002_50uM       6hr
# AAACAAGCAAACCTGAATTCGGTT-1   WT5 ST002_50uM       6hr
# AAACAAGCAAACTTGGATTCGGTT-1   WT5 ST002_50uM       6hr
# AAACAAGCAAAGACCCATTCGGTT-1   WT5 ST002_50uM       6hr
# AAACAAGCAAAGGTAAATTCGGTT-1   WT5 ST002_50uM       6hr
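
If the intermediate barcode column is not needed, the row names can also be assigned directly. This is a minimal sketch, assuming the barcodes are unique (data frame row names must be); it uses the first three strings from the data below, and df3 is just an illustrative name.

```r
# Example data copied from the "Data" section of this answer (first three barcodes)
df1 <- structure(list(Donor = "WT5", Treatment = "ST002_50uM", Timepoint = "6hr"),
                 class = "data.frame", row.names = "MK434_016")
strings <- c("AAACAAGCAAACAAGAATTCGGTT-1", "AAACAAGCAAACAATCATTCGGTT-1",
             "AAACAAGCAAACCTGAATTCGGTT-1")

# Row names must be unique, so check the barcodes before using them as such
stopifnot(anyDuplicated(strings) == 0)

# Repeat the single row once per barcode, then overwrite the row names directly
df3 <- df1[rep(1, length(strings)), ]
rownames(df3) <- strings
df3
```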


A quick dplyr version:

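library(dplyr)
df1[rep(1, length(strings)),] %>%          # repeat the single row once per barcode
  `rownames<-`(NULL) %>%                   # clear row names; column_to_rownames() requires none
  mutate(barcode = strings) %>%            # add the barcodes as a regular column
  tibble::column_to_rownames("barcode")    # move that column into the row names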
#                            Donor  Treatment Timepoint
# AAACAAGCAAACAAGAATTCGGTT-1   WT5 ST002_50uM       6hr
# AAACAAGCAAACAATCATTCGGTT-1   WT5 ST002_50uM       6hr
# AAACAAGCAAACCTGAATTCGGTT-1   WT5 ST002_50uM       6hr
# AAACAAGCAAACTTGGATTCGGTT-1   WT5 ST002_50uM       6hr
# AAACAAGCAAAGACCCATTCGGTT-1   WT5 ST002_50uM       6hr
# AAACAAGCAAAGGTAAATTCGGTT-1   WT5 ST002_50uM       6hr


Data

df1 <- structure(list(Donor = "WT5", Treatment = "ST002_50uM", Timepoint = "6hr"), class = "data.frame", row.names = "MK434_016")
strings <- c("AAACAAGCAAACAAGAATTCGGTT-1", "AAACAAGCAAACAATCATTCGGTT-1", "AAACAAGCAAACCTGAATTCGGTT-1", "AAACAAGCAAACTTGGATTCGGTT-1", "AAACAAGCAAAGACCCATTCGGTT-1", "AAACAAGCAAAGGTAAATTCGGTT-1")


Answer #2 (jdg4fx2g)

Using the data from @r2evans's answer:

library(dplyr)

df1 %>% 
  reframe(barcode = strings, across(everything()))
#>                      barcode Donor  Treatment Timepoint
#> 1 AAACAAGCAAACAAGAATTCGGTT-1   WT5 ST002_50uM       6hr
#> 2 AAACAAGCAAACAATCATTCGGTT-1   WT5 ST002_50uM       6hr
#> 3 AAACAAGCAAACCTGAATTCGGTT-1   WT5 ST002_50uM       6hr
#> 4 AAACAAGCAAACTTGGATTCGGTT-1   WT5 ST002_50uM       6hr
#> 5 AAACAAGCAAAGACCCATTCGGTT-1   WT5 ST002_50uM       6hr
#> 6 AAACAAGCAAAGGTAAATTCGGTT-1   WT5 ST002_50uM       6hr
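
For comparison, a similar one-row-per-barcode result can be sketched in base R as a cross join. merge() returns the Cartesian product of its inputs when by has length zero. The name res is illustrative, and only the first three barcodes are used here.

```r
# Example data copied from @r2evans's answer (first three barcodes)
df1 <- structure(list(Donor = "WT5", Treatment = "ST002_50uM", Timepoint = "6hr"),
                 class = "data.frame", row.names = "MK434_016")
strings <- c("AAACAAGCAAACAAGAATTCGGTT-1", "AAACAAGCAAACAATCATTCGGTT-1",
             "AAACAAGCAAACCTGAATTCGGTT-1")

# by = NULL requests a Cartesian product: every row of df1 paired with every barcode
res <- merge(df1, data.frame(barcode = strings), by = NULL)
res
```

As with reframe(), the barcodes end up in an ordinary column rather than in the row names.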

Created on 2023-10-27 with reprex v2.0.2
