R: Replace values based on the values of another column

igetnqfo  posted 11 months ago  in  Other

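I have these 3 columns. "Basiswert" contains some NA values. Where "Basiswert" is NA, I want to replace it with the Basiswert of a row that carries the same number in column "D" or "E". For example, row 2 has NA in Basiswert. That value must be replaced with USDCAD, the Basiswert of row 3, which shares the number 12442.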

data <- structure(list(D = c("12449", "12448", "12447", "12446", "12442", 
"12441", "12440", "12439", "12438", "12437"), E = c("0", "12442", 
"12442", "12430", "0", "12430", "12436", "12436", "12430", "12430"
), Basiswert = c("EURJPY", NA, "USDCAD", "EURAUD", "USDCAD", 
"EURAUD", NA, "GBPJPY", NA, "EURAUD")), class = "data.frame", row.names = c(NA, 
-10L))

Edit:

library(dplyr)

data %>% mutate(Basiswert = ifelse(is.na(Basiswert) == FALSE & 
D == E, Basiswert, NA))


does not work.

mhd8tkvw #1

Another base R solution:

data$Basiswert2 <- ave(data$Basiswert, data$E, FUN = function(z) replace(z, is.na(z), max(z, na.rm = TRUE)))
data
#        D     E Basiswert Basiswert2
# 1  12449     0    EURJPY     EURJPY
# 2  12448 12442      <NA>     USDCAD
# 3  12447 12442    USDCAD     USDCAD
# 4  12446 12430    EURAUD     EURAUD
# 5  12442     0    USDCAD     USDCAD
# 6  12441 12430    EURAUD     EURAUD
# 7  12440 12436      <NA>     GBPJPY
# 8  12439 12436    GBPJPY     GBPJPY
# 9  12438 12430      <NA>     EURAUD
# 10 12437 12430    EURAUD     EURAUD

Since you say you want to group by both D and E, I suggest chaining the two steps:

data |>
  transform(Basiswert = ave(Basiswert, D, FUN = function(z) ifelse(is.na(z), na.omit(z)[1], z))) |>
  transform(Basiswert = ave(Basiswert, E, FUN = function(z) ifelse(is.na(z), na.omit(z)[1], z)))
#        D     E Basiswert
# 1  12449     0    EURJPY
# 2  12448 12442    USDCAD
# 3  12447 12442    USDCAD
# 4  12446 12430    EURAUD
# 5  12442     0    USDCAD
# 6  12441 12430    EURAUD
# 7  12440 12436    GBPJPY
# 8  12439 12436    GBPJPY
# 9  12438 12430    EURAUD
# 10 12437 12430    EURAUD


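ifelse(is.na(z), .., z) and replace(z, is.na(z), ..) are effectively the same here. The only reason I shy away from ifelse is data that is not int/num/chr: if you apply this logic to Date or timestamp columns, ifelse should not be used, because it drops the class.
Using max(..) to pick the replacement value is a bit of a hack for strings; it emits warnings in some cases (it did for me here), whereas na.omit(z)[1] will *always* return the first non-NA value, or NA itself if there is no non-NA value.
Note that this is a form of imputation, which is a good search term when looking for similar methods. Here we impute the missing values with the first non-NA value in the group, which is one approach; for strings, I think other popular choices include (a) random, (b) most frequent, and (c) least frequent, the last perhaps for balancing when multiple values per group are expected. Numeric imputation brings many more options, including mean, mode, median, min/max, and so on.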
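The three notes above can be checked with a short base R sketch. The helper names first_non_na and impute_most_frequent are made up for illustration and do not appear in the thread; the most-frequent helper is only meant for character vectors and breaks ties alphabetically.

```r
# Note 1: ifelse() drops the Date class, replace() keeps it.
d <- as.Date(c("2024-01-01", NA, "2024-01-03"))
via_ifelse  <- ifelse(is.na(d), d[1], d)
via_replace <- replace(d, is.na(d), d[1])
class(via_ifelse)   # "numeric"
class(via_replace)  # "Date"

# Note 2: na.omit(z)[1] is the first non-NA value, or NA for an all-NA vector.
# (hypothetical helper name)
first_non_na <- function(z) na.omit(z)[1]
first_non_na(c(NA, "USDCAD", "EURAUD"))  # "USDCAD"
first_non_na(c(NA_character_, NA))       # NA

# Note 3: most-frequent imputation for a character vector.
# (hypothetical helper name)
impute_most_frequent <- function(z) {
  counts <- table(z)                  # table() excludes NA by default
  if (length(counts) == 0) return(z)  # all NA: nothing to impute from
  replace(z, is.na(z), names(counts)[which.max(counts)])
}
imputed <- impute_most_frequent(c("EURAUD", NA, "USDCAD", "EURAUD"))
imputed  # "EURAUD" "EURAUD" "USDCAD" "EURAUD"
```

Either helper could be passed as FUN to ave() in place of the anonymous functions used above.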

2ekbmq32 #2

The question is not entirely clear to me. See whether this works.

library(dplyr)

data %>%
  group_by(E) %>%
  mutate(Basiswert2 = ifelse(is.na(Basiswert), max(Basiswert, na.rm = TRUE), Basiswert))


pxiryf3j #3

Base R solution (in the sample data only the values of E repeat, so we can ignore D):

# Split-apply-combine by column E values: 
# res_df => data.frame
res_df <- data.frame(
  do.call(
    rbind,
    # For each E value: 
    lapply(
      with(df, split(df, E)),
      function(x){
        # If the first value in the vector is na:
        if(is.na(x$Basiswert[1])){
          # Resolve the first non na value: ir => vector length 1
          ir <- x$Basiswert[min(which(!(is.na(x$Basiswert))))]
          # Fill the first na value with first non-na value: 
          x$Basiswert[1] <- ir
        # Otherwise: 
        }else{
          # Do nothing: 
          invisible()
        } 
        # Fill the values down: x$Basiswert => vector
        x$Basiswert <- na.omit(x$Basiswert)[cumsum(!(is.na(x$Basiswert)))]
        # Return the data.frame: data.frame => env
        x
      }
    )
  ),
  row.names = NULL
)

Tidyverse solution, in case D also needs to be considered:

library(dplyr)
library(tidyr)
df %>%
  group_by(E) %>%
  fill(Basiswert, .direction = "downup") %>% 
  group_by(D) %>% 
  fill(Basiswert, .direction = "downup")


Input data:

df <- structure(
  list(
    D = c("12449", "12448", "12447", "12446", "12442", 
    "12441", "12440", "12439", "12438", "12437"), 
    E = c("0", "12442", 
    "12442", "12430", "0", "12430", "12436", "12436", "12430", "12430"
    ), 
    Basiswert = c("EURJPY", NA, "USDCAD", "EURAUD", "USDCAD", "EURAUD", NA, "GBPJPY", NA, "EURAUD")
  ), 
  class = "data.frame", 
  row.names = c(NA, -10L)
)

4dbbbstv #4

Using the same logic as @Hann Shaw, we can also use replace (the .by argument requires dplyr 1.1.0 or later):

library(dplyr)

df %>%
  mutate(Basiswert2 = replace(Basiswert, is.na(Basiswert), max(Basiswert, na.rm = TRUE)), .by=E)

     D     E Basiswert Basiswert2
1  12449     0    EURJPY     EURJPY
2  12448 12442      <NA>     USDCAD
3  12447 12442    USDCAD     USDCAD
4  12446 12430    EURAUD     EURAUD
5  12442     0    USDCAD     USDCAD
6  12441 12430    EURAUD     EURAUD
7  12440 12436      <NA>     GBPJPY
8  12439 12436    GBPJPY     GBPJPY
9  12438 12430      <NA>     EURAUD
10 12437 12430    EURAUD     EURAUD


bxfogqkk #5

structure(list(D = c(12327, 12328, 12329, 12330, 12331, 12333, 
12334, 12335, 12336, 12337, 12338, 12339, 12340, 12343, 12345, 
12348, 12349, 12350, 12351, 12352), E = c(12310, 12310, 12326, 
12326, 12315, 12326, 0, 12324, 12324, 12334, 12334, 0, 12339, 
0, 0, 12345, 12345, 0, 12343, 12343), Basiswert = c("AUDCAD", 
"AUDCAD", "USDJPY", "USDJPY", "USDCAD", "USDJPY", "USDCHF", "USDCHF", 
"USDCHF", "USDCHF", "USDCHF", "USDCAD", NA, "USDCAD", "CADJPY", 
"CADJPY", "CADJPY", "USDCHF", "USDCAD", "USDCAD"), Einstieg = c(NA, 
0.89262, NA, 139.192, NA, NA, 0.9052, NA, 0.90834, NA, 0.90816, 
NA, NA, 1.362, 103.188, NA, 102.886, 0.9051, NA, 1.36504), Profit = c(33, 
NA, 34, NA, 68, 68, NA, 33, NA, 33, NA, NA, NA, NA, NA, 34, NA, 
NA, 33, NA), SL = c(NA, NA, NA, NA, NA, NA, 0.91134, NA, NA, 
NA, NA, NA, NA, 1.3684, 102.545, NA, NA, 0.91138, NA, NA), TP = c(NA, 
NA, NA, NA, NA, NA, 0.89325, NA, NA, NA, NA, NA, NA, 1.3504, 
104.35, NA, NA, 0.8933, NA, NA), Trader = c(NA, NA, NA, NA, NA, 
NA, "Trade by Jason\" ", NA, NA, NA, NA, NA, NA, "Trade by Jason\" ", 
"Trade by Jason\" ", NA, NA, "Trade by Jason\" ", NA, NA)), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L), groups = structure(list(
    E = c(0, 12310, 12315, 12324, 12326, 12334, 12339, 12343, 
    12345), .rows = structure(list(c(7L, 12L, 14L, 15L, 18L), 
        1:2, 5L, 8:9, c(3L, 4L, 6L), 10:11, 13L, 19:20, 16:17), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -9L), .drop = TRUE))

Many thanks for your efforts and solutions. Unfortunately, they do not work on the whole data set.
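One possible reading of why grouping by E alone falls short on this data: in row 13 the missing Basiswert has E = 12339 and no other row shares that E, but row 12 has D = 12339 and Basiswert USDCAD. The following base R sketch therefore looks up E in D first and only then fills within groups of E. It is an assumption about the intended rule, not something the poster confirmed; fill_basiswert and the small orders data frame are made up for illustration, and a plain data.frame is assumed.

```r
# Hypothetical helper, not from the thread. Assumes columns D, E and Basiswert.
fill_basiswert <- function(df) {
  # Step 1: for each row, find the row whose D equals this row's E (NA if none)
  # and copy that row's Basiswert into the missing entries.
  parent <- match(df$E, df$D)
  need_fill <- is.na(df$Basiswert)
  df$Basiswert[need_fill] <- df$Basiswert[parent[need_fill]]

  # Step 2: whatever is still missing takes the first non-NA value among rows
  # sharing the same E. Rows with E equal to 0 are grouped like any other
  # value, as in the answers above.
  df$Basiswert <- ave(
    df$Basiswert, df$E,
    FUN = function(z) replace(z, is.na(z), na.omit(z)[1])
  )
  df
}

# Made-up rows modelled on the data above.
orders <- data.frame(
  D = c(12338, 12339, 12340, 12350, 12351),
  E = c(12334, 0, 12339, 12345, 12345),
  Basiswert = c("USDCHF", "USDCAD", NA, "CADJPY", NA),
  stringsAsFactors = FALSE
)
result <- fill_basiswert(orders)
result$Basiswert  # "USDCHF" "USDCAD" "USDCAD" "CADJPY" "CADJPY"
```

Row 3 is filled through the E-to-D lookup and row 5 through the shared E value, so both cases from the thread are covered by the same function.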
