Remove a part of a row name in a list of dataframes

Posted by dluptydi on 2022-12-25 in Other


I have two lists of dataframes. One list of dataframes is structured as follows:

data1 

Label                            Pred   n
1 Mito-0001_Series007_blue.tif   Pear  10
2 Mito-0001_Series007_blue.tif Orange 223
3 Mito-0001_Series007_blue.tif  Apple 890
4 Mito-0001_Series007_blue.tif  Peach  34

and repeats with a different number, e.g.

Label                            Pred   n
1 Mito-0002_Series007_blue.tif   Pear  90
2 Mito-0002_Series007_blue.tif Orange 127
3 Mito-0002_Series007_blue.tif  Apple  76
4 Mito-0002_Series007_blue.tif  Peach 344

The second list of dataframes is structured as follows:

data2

Slice                                       Area
Mask of Mask-0001Series007_blue-1.tif.      789.21

and so on.

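Question

I want to
1. Make the row names match by:
a) removing "Mito-" from data1
b) removing "Mask of Mask-" from data2
c) removing the "-1" at the end of data2
Keep in mind that this is a list of dataframes.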

So far:

I have used information from the post
How can I remove certain part of row names in data frame
which suggests using

data2$Slice <- sub("Mask of Mask-", "", data2$Slice)

This obviously does not work for a list of dataframes; it returns an empty character vector

character(0)
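
A small sketch of why this happens, assuming data2 really is a plain list of dataframes. The object name `data2_list`, the element names `a` and `b`, and the second row's values are invented for illustration. `$Slice` on the list itself is NULL, so `sub()` has nothing to work on, whereas `lapply()` reaches the column inside each dataframe.

```r
# Hypothetical list of dataframes shaped like data2 in the question
data2_list <- list(
  a = data.frame(Slice = "Mask of Mask-0001Series007_blue-1.tif.",
                 Area = 789.21, stringsAsFactors = FALSE),
  b = data.frame(Slice = "Mask of Mask-0002Series007_blue-1.tif.",
                 Area = 512.40, stringsAsFactors = FALSE)
)

# The list has no element named "Slice", so `$Slice` is NULL,
# and sub() on NULL returns character(0)
res_null <- sub("Mask of Mask-", "", data2_list$Slice)

# Applying sub() to the column inside each dataframe instead
data2_fixed <- lapply(data2_list, function(d) {
  d$Slice <- sub("Mask of Mask-", "", d$Slice, fixed = TRUE)
  d
})

# Collect the cleaned values to inspect them
slices <- vapply(data2_fixed, function(d) d$Slice, character(1))
print(slices)
```

Only the prefix is removed here; the trailing "-1.tif." would need a second substitution or a broader pattern.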

Thanks in advance, I am always amazed at how great people are at answering questions on this site :)

Source: https://stackoverflow.com/questions/74885396/remove-a-part-of-a-row-name-in-a-list-of-dataframes

2 Answers


8ehkhllq1#

First, we can define a function f that applies gsub with a regular expression that fits all cases.

f <- \(x) gsub('.*(\\d{4}_?Series\\d{3}_blue).*(\\.tif)?\\.?', '\\1\\2', x)

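Explanation:
.* any single character, repeated
\\d{4} four digits
_? an underscore, if present
Series literally
(...) capture group (they are numbered internally)
\\. a period (needs to be escaped, otherwise it means "any character")
\\1 capture group 1
\\2 capture group 2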
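
To see what the first capture group picks up, the pattern can be run through `regexec()` and `regmatches()` from base R. This is only an illustrative sketch; the variable names `pattern`, `m`, and `group1` are not part of the answer.

```r
# Same pattern as in f, applied to one Label and one Slice value
pattern <- '.*(\\d{4}_?Series\\d{3}_blue).*(\\.tif)?\\.?'
x <- c("Mito-0001_Series007_blue.tif",
       "Mask of Mask-0001Series007_blue-1.tif.")

# regexec() reports the full match plus one entry per capture group
m <- regmatches(x, regexec(pattern, x))

# Element 1 is the whole match, element 2 is capture group 1
group1 <- vapply(m, function(g) g[2], character(1))
print(group1)
```

In the output shown in this answer the ".tif" extension is absent from the result, which is consistent with the greedy `.*` in front of the second group consuming it.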

## test it
(x <- c(names(data1), data1[[1]]$Label, data2$Slice))
# [1] "Mito-0001_Series007_blue"               "Mito-0002_Series007_blue"              
# [3] "Mito-0001_Series007_blue.tif"           "Mito-0001_Series007_blue.tif"          
# [5] "Mito-0001_Series007_blue.tif"           "Mito-0001_Series007_blue.tif"          
# [7] "Mask of Mask-0001Series007_blue-1.tif."

f(x)
# [1] "0001_Series007_blue" "0002_Series007_blue" "0001_Series007_blue" "0001_Series007_blue"
# [5] "0001_Series007_blue" "0001_Series007_blue" "0001Series007_blue"

Seems to work, so we can apply it.

names(data1) <- f(names(data1))
data1 <- lapply(data1, \(x) {x$Label <- f(x$Label); x})
data2$Slice <- f(data2$Slice)

data1
# $`0001_Series007_blue`
#                 Label   Pred   n
# 1 0001_Series007_blue   Pear  10
# 2 0001_Series007_blue Orange 223
# 3 0001_Series007_blue  Apple 890
# 4 0001_Series007_blue  Peach  34
# 
# $`0002_Series007_blue`
#                 Label   Pred   n
# 1 0002_Series007_blue   Pear  90
# 2 0002_Series007_blue Orange 127
# 3 0002_Series007_blue  Apple  76
# 4 0002_Series007_blue  Peach 344

data2
#                Slice   Area
# 1 0001Series007_blue 789.21
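
Note that the cleaned values still differ by one underscore ("0001_Series007_blue" in data1 versus "0001Series007_blue" in data2). If the two are meant to be matched against each other, one possible extra step, which is not part of the original answer, is to normalise that underscore. The helper name `norm_key` is hypothetical.

```r
# Hypothetical helper: drop the underscore in front of "Series", if any
norm_key <- function(x) sub("_Series", "Series", x, fixed = TRUE)

label <- "0001_Series007_blue"  # cleaned Label value from data1
slice <- "0001Series007_blue"   # cleaned Slice value from data2

# After normalisation both keys are the same string
keys_match <- norm_key(label) == norm_key(slice)
print(keys_match)
```

Whether the underscore should be removed or added is a design choice; removing it only requires touching data1.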

Data:

data1 <- list(`Mito-0001_Series007_blue` = structure(list(Label = c("Mito-0001_Series007_blue.tif", 
"Mito-0001_Series007_blue.tif", "Mito-0001_Series007_blue.tif", 
"Mito-0001_Series007_blue.tif"), Pred = c("Pear", "Orange", "Apple", 
"Peach"), n = c(10L, 223L, 890L, 34L)), class = "data.frame", row.names = c("1", 
"2", "3", "4")), `Mito-0002_Series007_blue` = structure(list(
    Label = c("Mito-0002_Series007_blue.tif", "Mito-0002_Series007_blue.tif", 
    "Mito-0002_Series007_blue.tif", "Mito-0002_Series007_blue.tif"
    ), Pred = c("Pear", "Orange", "Apple", "Peach"), n = c(90L, 
    127L, 76L, 344L)), class = "data.frame", row.names = c("1", 
"2", "3", "4")))

data2 <- structure(list(Slice = "Mask of Mask-0001Series007_blue-1.tif.", 
    Area = 789.21), class = "data.frame", row.names = c(NA, -1L
))


erhoui1w2#

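Using the given information

The answer given by @jay.sf was very helpful. However, it only worked for data1, not for data2. To make sure it also works for data2, I added a few extra lines of code: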

# Old code
f <- function(x) gsub('.*(\\d{4}_?Series\\d{3}_blue).*(\\.tif)?\\.?', '\\1\\2', x)

#I added the [[1]] after data2 as well
(x <- c(names(data1), data1[[1]]$Label, data2[[1]]$Slice))
f(x)

names(data1) <- f(names(data1))
data1 <- lapply(data1, function(x) {x$Label <- f(x$Label); x})

# This line of code was causing problems, so I removed it
# data2$Slice <- f(data2$Slice)

#And added the following to apply it to data 2

names(data2) <- f(names(data2))
data2 <- lapply(data2, function(x) {x$Slice <- f(x$Slice); x})
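
A self-contained sketch of this variant, assuming data2 is a named list of dataframes. The list names and the second element's values are made up here, since the question only shows a single row.

```r
f <- function(x) gsub('.*(\\d{4}_?Series\\d{3}_blue).*(\\.tif)?\\.?', '\\1\\2', x)

# Hypothetical list of dataframes; names and the second Area are invented
data2 <- list(
  `Mask of Mask-0001Series007_blue-1` = data.frame(
    Slice = "Mask of Mask-0001Series007_blue-1.tif.",
    Area = 789.21, stringsAsFactors = FALSE),
  `Mask of Mask-0002Series007_blue-1` = data.frame(
    Slice = "Mask of Mask-0002Series007_blue-1.tif.",
    Area = 512.40, stringsAsFactors = FALSE)
)

# Clean the list names, then the Slice column inside each dataframe
names(data2) <- f(names(data2))
data2 <- lapply(data2, function(d) {
  d$Slice <- f(d$Slice)
  d
})

str(data2)
```

If the list has no names, `names(data2)` is NULL and the first assignment can simply be skipped.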
