R: Replace values in multiple columns

niknxzdl, posted 11 months ago in Other

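I have a data table "data" with 25 columns. Some of them (about 15) contain numeric values that were imported as character, and I want to replace certain characters in them: "," with ".", "<" with "", ">" with "", and so on (there may be 10 or more such combinations), because some values look like "<0,17" or "> 1,5".
Since the column names change (the same task applies to different data tables), I would like to solve it along these lines (the code below is not correct, it only shows what I am trying to do).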

replace <- list ("," = ".", "<" = "", ">" = "")
affectedColumns = c("name1", "name2", "name3" ... "name 14", "name 15").

mydata %>%
  mutate(affectedColumns, replace)

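Another issue is that some of the columns are numeric and some are character. Does it make sense to first convert all values in "affectedColumns" to character (as.character), then do the replacements, and then convert everything back to numeric (as.numeric)?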

In the end, I want the values to use "." as the decimal separator, without any "<", ">" or spaces.

Is there a way to do this? Thanks!

nnsrf1az #1

Here is a base R way.

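# Apply every entry of `replace` to each affected column, then convert to numeric
mydata[affectedColumns] <- lapply(mydata[affectedColumns], \(x) {
  # gsub() replaces all occurrences; fixed = TRUE treats each name as a literal string
  for (nm in names(replace)) x <- gsub(nm, replace[[nm]], x, fixed = TRUE)
  as.numeric(x)
})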
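
To check the approach on a small case, here is a minimal sketch in base R. The toy data and the helper name `clean_column` are hypothetical and not part of the question; the extra `" " = ""` entry is an assumption added so that spaces are dropped as well. Converting with `as.character()` first means numeric and character columns can go through the same steps.

```r
# Hypothetical toy data: two character columns and one numeric column
mydata <- data.frame(
  name1 = c("1,5", "> 1,5", "<0,17"),
  name2 = c("2,25", "3", "< 4,5"),
  name3 = c(1.5, 1, 0.5),
  stringsAsFactors = FALSE
)

# Old value = name, new value = element (the " " entry is an added assumption)
replace <- list("," = ".", "<" = "", ">" = "", " " = "")
affectedColumns <- c("name1", "name2", "name3")

# Hypothetical helper: character -> literal replacements -> numeric
clean_column <- function(x, replacements) {
  x <- as.character(x)
  for (nm in names(replacements)) {
    x <- gsub(nm, replacements[[nm]], x, fixed = TRUE)
  }
  as.numeric(x)
}

mydata[affectedColumns] <- lapply(mydata[affectedColumns], clean_column,
                                  replacements = replace)
str(mydata)
```

Because the column names live in `affectedColumns`, the same call can be reused on another data table by changing only that vector.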


jgzswidk #2

You can use parse_number from the readr package to convert to numeric and drop the greater-than/less-than signs at the same time.

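library(dplyr)  # mutate(), across(), all_of()
library(readr)  # parse_number(), locale()

df <- data.frame("name1" = c("1,5", "> 1,5", "<1,6"), 
                 "name2" = c("1,5", "1,5", "1,5"), 
                 "name3" = c("1,0", "1", "1"),
                 "name4" = c(1.5, 1, 0.5)
                 )

affectedColumns <- c("name1", "name2", "name3")

# Read "," as the decimal mark; parse_number() skips the leading "<", ">" and spaces
new_df <- mutate(df, across(all_of(affectedColumns), .fns = ~parse_number(.x, locale = locale(decimal_mark = ","))))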
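
As a quick check of what `parse_number` does with such values, the following sketch applies it to a plain character vector. The vector `raw` is a hypothetical example; the call itself mirrors the one used above.

```r
library(readr)

# Hypothetical input using "," as the decimal mark, with "<", ">" and a space
raw <- c("1,5", "> 1,5", "<1,6")

# Non-numeric characters around the number are dropped during parsing
parsed <- parse_number(raw, locale = locale(decimal_mark = ","))
parsed
```

One thing to keep in mind: with `decimal_mark = ","` the locale treats "." as the grouping mark, so this is probably only appropriate for columns that consistently use "," for decimals.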


jfewjypa #3

Here is a dplyr solution:

library(dplyr)
mydata %>%
  # Step 1: remove < and > (plus a space that may follow them):
  mutate(across(everything(),
                ~ sub("(>|<)\\s?", "", .))) %>%
  # Step 2: replace dot by comma:
  mutate(across(everything(),
                ~ sub("\\.", ",", .)))
#>   col1   col2
#> 1  1,2 12,701
#> 2    3  55,77
#> 3    5   5000


Edit

Here is a solution using setNames and stringr:
First define the set of new and old values (make sure to escape regex metacharacters such as .):

replacements <- setNames(c("", "", ","),     # new values
                         c("<", ">", "\\.")) # old values


Or, more economically:

replacements <- setNames(c("", ","),      # new values
                         c("<|>", "\\.")) # old values


Now use str_replace_all to apply these changes in one go:

library(stringr)
mydata %>%
  mutate(across(c(col1:col2), 
                ~ str_replace_all(., replacements)))


Toy data:

mydata <- data.frame(
  col1 = c("1.2", "3", "<5"),
  col2 = c(">12.701", "55,77", "< 5000")
)
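
The question asks for the opposite direction ("." as the decimal separator, numeric result), so here is a hedged sketch that reuses the same named-vector idea with the mapping reversed. The name `to_numeric_text` and the pattern `[<>\\s]` are assumptions for illustration, not part of the answer above.

```r
library(dplyr)
library(stringr)

# Toy data from above, repeated so the sketch is self-contained
mydata <- data.frame(
  col1 = c("1.2", "3", "<5"),
  col2 = c(">12.701", "55,77", "< 5000")
)

# Names are the old values (regex patterns), elements are the new values
to_numeric_text <- c("[<>\\s]" = "", "," = ".")

cleaned <- mydata %>%
  mutate(across(col1:col2,
                ~ as.numeric(str_replace_all(.x, to_numeric_text))))
cleaned
```

Wrapping the replacement in `as.numeric()` inside `across()` keeps cleaning and conversion in one step, so no intermediate character columns are left behind.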

8wtpewkr #4

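Consider a combination of the mutate, across and case_when functions from the dplyr package. You can find them here: https://dplyr.tidyverse.org/reference/across.html and here: https://dplyr.tidyverse.org/reference/case_when.html. Alternatively, please provide a minimal reproducible example.
Best, M.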

mcdcgff0 #5

structure(list(D = c(12327, 12328, 12329, 12330, 12331, 12333, 
12334, 12335, 12336, 12337, 12338, 12339, 12340, 12343, 12345, 
12348, 12349, 12350, 12351, 12352), E = c(12310, 12310, 12326, 
12326, 12315, 12326, 0, 12324, 12324, 12334, 12334, 0, 12339, 
0, 0, 12345, 12345, 0, 12343, 12343), Basiswert = c("AUDCAD", 
"AUDCAD", "USDJPY", "USDJPY", "USDCAD", "USDJPY", "USDCHF", "USDCHF", 
"USDCHF", "USDCHF", "USDCHF", "USDCAD", NA, "USDCAD", "CADJPY", 
"CADJPY", "CADJPY", "USDCHF", "USDCAD", "USDCAD"), Einstieg = c(NA, 
0.89262, NA, 139.192, NA, NA, 0.9052, NA, 0.90834, NA, 0.90816, 
NA, NA, 1.362, 103.188, NA, 102.886, 0.9051, NA, 1.36504), Profit = c(33, 
NA, 34, NA, 68, 68, NA, 33, NA, 33, NA, NA, NA, NA, NA, 34, NA, 
NA, 33, NA), SL = c(NA, NA, NA, NA, NA, NA, 0.91134, NA, NA, 
NA, NA, NA, NA, 1.3684, 102.545, NA, NA, 0.91138, NA, NA), TP = c(NA, 
NA, NA, NA, NA, NA, 0.89325, NA, NA, NA, NA, NA, NA, 1.3504, 
104.35, NA, NA, 0.8933, NA, NA), Trader = c(NA, NA, NA, NA, NA, 
NA, "Trade by Jason\" ", NA, NA, NA, NA, NA, NA, "Trade by Jason\" ", 
"Trade by Jason\" ", NA, NA, "Trade by Jason\" ", NA, NA)), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L), groups = structure(list(
    E = c(0, 12310, 12315, 12324, 12326, 12334, 12339, 12343, 
    12345), .rows = structure(list(c(7L, 12L, 14L, 15L, 18L), 
        1:2, 5L, 8:9, c(3L, 4L, 6L), 10:11, 13L, 19:20, 16:17), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -9L), .drop = TRUE))

Thank you very much for your efforts and solutions. However, I could not get it to work on the whole dataset. Please see the example above.
