Transpose a data frame using spread(), cast() or melt() without using t()

esyap4oy posted on 2023-04-03 in Other

I need to transpose a data frame without using t(), because I don't want to convert it to a matrix. So I'm using the solution from here:

library(data.table)

mydata <- data.table(col0=c("row1","row2","row3"),
                     col1=c(11,21,31),
                     col2=c(12,22,32),
                     col3=c(13,23,33))

mydata
# col0 col1 col2 col3
# row1   11   12   13
# row2   21   22   23
# row3   31   32   33

dcast(melt(mydata, id.vars = "col0"), variable ~ col0)
#    variable row1 row2 row3
# 1:     col1   11   21   31
# 2:     col2   12   22   32
# 3:     col3   13   23   33

I applied the same logic to the data I'm working with:

x <- merge(as.data.frame(table(mtcars$mpg)), as.data.frame(round(prop.table(table(mtcars$mpg)),2)), by="Var1", all.x=TRUE)
data.table::dcast(data.table::melt(x, id.vars = "Var1"), variable ~ Var1)

It works! But it gives me warnings and a "future error":

Warning message in data.table::melt(x, id.vars = "Var1"):
“The melt generic in data.table has been passed a data.frame and will attempt to
redirect to the relevant reshape2 method; please note that reshape2 is
deprecated, and this redirection is now deprecated as well. To
continue using melt methods from reshape2 while both libraries are
attached, e.g. melt.list, you can prepend the namespace like
reshape2::melt(x). In the next version, this warning will become an
error.”

Warning message in data.table::dcast(data.table::melt(x, id.vars = "Var1"), variable ~ :
“The dcast generic in data.table has been passed a data.frame and will attempt to
redirect to the reshape2::dcast; please note that reshape2 is deprecated, and this
redirection is now deprecated as well. Please do this redirection
yourself like reshape2::dcast(data.table::melt(x, id.vars = "Var1")).
In the next version, this warning will become an error.”

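I've also been trying to transpose the data frame with the tidyr::spread() solution from here, but it seems much more complicated than the data.table solution when there is more than one value column, as in this case. I'm more used to dplyr and the tidyverse, but the data.table solution is simpler, so just ignore that.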
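For reference, a minimal sketch of the spread() route with a single value column: first stack the measure columns with tidyr::gather(), then spread them back out using col0 as the new column names. It assumes a plain data.frame version of mydata; gather() and spread() have been superseded by pivot_longer()/pivot_wider() since tidyr 1.0.0, but they are still exported.

```r
library(tidyr)

mydata <- data.frame(col0 = c("row1", "row2", "row3"),
                     col1 = c(11, 21, 31),
                     col2 = c(12, 22, 32),
                     col3 = c(13, 23, 33),
                     stringsAsFactors = FALSE)

# gather(): stack col1:col3 into key/value pairs, keeping col0 as the id column
long <- gather(mydata, key = "variable", value = "value", -col0)

# spread(): turn the old row labels (col0) into the new column names
wide <- spread(long, key = col0, value = value)
wide
```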

Additional information:

> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS/LAPACK: /usr/lib/libopenblasp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C             
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] GGally_1.4.0       forcats_0.4.0      stringr_1.4.0      dplyr_0.8.3       
 [5] purrr_0.3.2        readr_1.3.1        tidyr_1.0.0        tibble_2.1.3      
 [9] ggplot2_3.2.1.9000 tidyverse_1.2.1    bigrquery_1.2.0    httr_1.4.1        

loaded via a namespace (and not attached):
 [1] bit64_0.9-7          jsonlite_1.6         splines_3.6.0       
 [4] modelr_0.1.4         Formula_1.2-3        assertthat_0.2.1    
 [7] getPass_0.2-2        latticeExtra_0.6-28  cellranger_1.1.0    
[10] pillar_1.4.2         backports_1.1.5      lattice_0.20-38     
[13] glue_1.3.1           uuid_0.1-2           digest_0.6.21       
[16] checkmate_1.9.4      RColorBrewer_1.1-2   rvest_0.3.4         
[19] colorspace_1.4-1     htmltools_0.4.0      Matrix_1.2-17       
[22] plyr_1.8.4           psych_1.8.12         pkgconfig_2.0.3     
[25] broom_0.5.2          haven_2.1.1          scales_1.0.0        
[28] htmlTable_1.13.2     generics_0.0.2       withr_2.1.2         
[31] repr_1.0.1.9000      skimr_1.0.7          nnet_7.3-12         
[34] cli_1.1.0            mnormt_1.5-5         survival_2.44-1.1   
[37] magrittr_1.5         crayon_1.3.4         readxl_1.3.1        
[40] evaluate_0.14        fs_1.3.1             nlme_3.1-141        
[43] xml2_1.2.2           foreign_0.8-72       data.table_1.12.4   
[46] tools_3.6.0          hms_0.5.1            gargle_0.4.0        
[49] lifecycle_0.1.0      munsell_0.5.0        cluster_2.1.0       
[52] compiler_3.6.0       rlang_0.4.0          grid_3.6.0          
[55] pbdZMQ_0.3-3         IRkernel_1.0.2.9000  rstudioapi_0.10     
[58] htmlwidgets_1.5.1    base64enc_0.1-3      gtable_0.3.0        
[61] DBI_1.0.0            reshape_0.8.8        reshape2_1.4.3      
[64] R6_2.4.0             gridExtra_2.3        lubridate_1.7.4     
[67] knitr_1.25           bit_1.1-14           zeallot_0.1.0       
[70] Hmisc_4.2-0          stringi_1.4.3        parallel_3.6.0      
[73] IRdisplay_0.7.0.9000 Rcpp_1.0.2           vctrs_0.2.0         
[76] rpart_4.1-15         acepack_1.4.1        xfun_0.10           
[79] tidyselect_0.2.5

Answer #1 (ccrfmcuu)

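"I need to transpose a data frame without using t(), because I want to avoid converting it to a matrix."
If your only requirement is to avoid coercing the data frame to a matrix, you can use data.table::transpose, which requires data.table >= 1.12.4: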

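data.table::transpose(
  mydata,
  keep.names = 'variable',        # store the old column names in a column called "variable"
  make.names = names(mydata)[1])  # use the values of the first column (col0) as the new column names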

#    variable row1 row2 row3
# 1:     col1   11   21   31
# 2:     col2   12   22   32
# 3:     col3   13   23   33
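A quick way to confirm that nothing was coerced to a matrix along the way is to check the class of the result. A minimal check, assuming a data.table version recent enough for the arguments above; passing make.names a column name instead of names(mydata)[1] should be equivalent:

```r
library(data.table)  # the answer above notes this needs data.table >= 1.12.4

mydata <- data.table(col0 = c("row1", "row2", "row3"),
                     col1 = c(11, 21, 31),
                     col2 = c(12, 22, 32),
                     col3 = c(13, 23, 33))

# col0 supplies the new column names and is dropped from the body of the result
res <- transpose(mydata, keep.names = "variable", make.names = "col0")

class(res)      # a data.table (which inherits from data.frame)
is.matrix(res)  # FALSE
```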

Answer #2 (ua4mk5z4)

You need to make sure that you pass a data.table object to data.table::melt and data.table::dcast:

x<-merge(as.data.frame(table(mtcars$mpg)),
        as.data.frame(round(prop.table(table(mtcars$mpg)),2)), 
        by="Var1", all.x=TRUE)

data.table::dcast(data.table::melt(data.table::setDT(x), id.vars = "Var1"), 
                  variable ~ Var1)

As you can see, converting x with data.table::setDT resolves the "future error". The only warning left is:

#> Warning in melt.data.table(data.table::setDT(x), id.vars = "Var1"):
#> 'measure.vars' [Freq.x, Freq.y] are not all of the same type. By order
#> of hierarchy, the molten data value column will be of type 'double'. All
#> measure variables not of type 'double' will be coerced too. Check DETAILS
#> in ?melt.data.table for more on coercion.

Output:

#>    variable 10.4 13.3 14.3 14.7   15 15.2 15.5 15.8 16.4 17.3 17.8 18.1
#> 1:   Freq.x 2.00 1.00 1.00 1.00 1.00 2.00 1.00 1.00 1.00 1.00 1.00 1.00
#> 2:   Freq.y 0.06 0.03 0.03 0.03 0.03 0.06 0.03 0.03 0.03 0.03 0.03 0.03
#>    18.7 19.2 19.7   21 21.4 21.5 22.8 24.4   26 27.3 30.4 32.4 33.9
#> 1: 1.00 2.00 1.00 2.00 2.00 1.00 2.00 1.00 1.00 1.00 2.00 1.00 1.00
#> 2: 0.03 0.06 0.03 0.06 0.06 0.03 0.06 0.03 0.03 0.03 0.06 0.03 0.03

***P.S.*** I couldn't reproduce the error with data.table_1.12.2; I had to update to data.table_1.12.6.
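The remaining melt() warning comes from the two measure columns having different types: the counts from table() are integer, while the rounded proportions are double. A sketch of one way to avoid it, assuming you are fine with storing the counts as doubles, is to convert that column before melting:

```r
library(data.table)

# Same frequency table as in the question
x <- merge(as.data.frame(table(mtcars$mpg)),
           as.data.frame(round(prop.table(table(mtcars$mpg)), 2)),
           by = "Var1", all.x = TRUE)
setDT(x)  # convert to data.table by reference so the data.table methods are used

# Freq.x holds integer counts and Freq.y holds doubles; making the counts double
# up front means melt() has nothing left to coerce, so it should not warn
x[, Freq.x := as.numeric(Freq.x)]

wide <- dcast(melt(x, id.vars = "Var1"), variable ~ Var1)
wide
```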

Answer #3 (nle07wnf)

The new tidyr 1.0 functions make this much easier:

library(tidyverse)
library(magrittr)
#> 
#> Attaching package: 'magrittr'
#> The following object is masked from 'package:purrr':
#> 
#>     set_names
#> The following object is masked from 'package:tidyr':
#> 
#>     extract

mydata <- tibble(col0=c("row1","row2","row3"),
                 col1=c(11,21,31),
                 col2=c(12,22,32),
                 col3=c(13,23,33))

# First collect all the values in the one column
(new_data <- mydata %>% pivot_longer(col1:col3))
#> # A tibble: 9 x 3
#>   col0  name  value
#>   <chr> <chr> <dbl>
#> 1 row1  col1     11
#> 2 row1  col2     12
#> 3 row1  col3     13
#> 4 row2  col1     21
#> 5 row2  col2     22
#> 6 row2  col3     23
#> 7 row3  col1     31
#> 8 row3  col2     32
#> 9 row3  col3     33

# Col0 is what we want the new column names to come from, so:
(new_data %<>% pivot_wider(names_from = col0))
#> # A tibble: 3 x 4
#>   name   row1  row2  row3
#>   <chr> <dbl> <dbl> <dbl>
#> 1 col1     11    21    31
#> 2 col2     12    22    32
#> 3 col3     13    23    33

mtcars用例中:

library(tidyverse)

(x <- 
    mtcars %>% 
    group_by(mpg) %>% 
    summarize(Freq.x = n(), 
              Freq.y = Freq.x/nrow(.)) %>% 
    pivot_longer(-mpg) %>% 
    pivot_wider(names_from = mpg))
#> # A tibble: 2 x 26
#>   name  `10.4` `13.3` `14.3` `14.7`   `15` `15.2` `15.5` `15.8` `16.4`
#>   <chr>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
#> 1 Freq~ 2      1      1      1      1      2      1      1      1     
#> 2 Freq~ 0.0625 0.0312 0.0312 0.0312 0.0312 0.0625 0.0312 0.0312 0.0312
#> # ... with 16 more variables: `17.3` <dbl>, `17.8` <dbl>, `18.1` <dbl>,
#> #   `18.7` <dbl>, `19.2` <dbl>, `19.7` <dbl>, `21` <dbl>, `21.4` <dbl>,
#> #   `21.5` <dbl>, `22.8` <dbl>, `24.4` <dbl>, `26` <dbl>, `27.3` <dbl>,
#> #   `30.4` <dbl>, `32.4` <dbl>, `33.9` <dbl>

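I know basically nothing about data.table, but this gives me no red text (i.e. no warnings).
Now, if your values aren't all of the same type, this will still cause you problems, because at some point it still stacks all of the values into a single column. So I was going to suggest a nest() approach. But then I realised... if you want to transpose things and the rows aren't all of the same value type, you'll end up trying to put values of different types into one column, won't you? So some homogenizing conversion is unavoidable.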
Created on 2019-10-22 by the reprex package (v0.3.0)
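To illustrate that last point, a small sketch with one numeric and one character column (the mixed tibble is made up for the example). The values are converted to a common type (character here) before pivot_longer() stacks them; without the mutate_at() step, pivot_longer() should refuse to combine a double and a character column. mutate_at() is used instead of across() because the session above has dplyr 0.8.3.

```r
library(tibble)
library(dplyr)
library(tidyr)

mixed <- tibble(col0 = c("row1", "row2"),
                num  = c(1.5, 2.5),
                chr  = c("a", "b"))

transposed <- mixed %>%
  mutate_at(vars(-col0), as.character) %>%  # homogenize: everything becomes character
  pivot_longer(-col0) %>%                   # stack num and chr into one value column
  pivot_wider(names_from = col0)            # row labels become the new column names
transposed
```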
