R: How to check whether row values match the corresponding column names

Posted by 6yoyoihd on 2023-04-03 in Other


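I have a data frame that was created by importing several .csv files and then merging them together.
Each file I read in has its column headers on row 8, with some descriptive text in the first 7 rows.
This is why the duplicate rows appeared: I could not use the values of row 8 from the first data frame and then discard the first 8 rows of the remaining data frames (or maybe I can, I am sure it is possible).
Ultimately, what I would like to do is this: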

- Read first .csv into data frame.
- Take values of row 8 to be column names
- Delete the first 8 rows.
- Read all other .csv files in, remove the first 8 rows from each one, and merge them all into the same data frame.
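
The steps above can be sketched in base R roughly as follows. This is only a sketch under stated assumptions: every file has exactly 7 descriptive lines before the header, all files share the same columns, and the helper `read_export()`, the demo directory, and the file names are hypothetical stand-ins for the real exports. `skip = 7` makes `read.csv()` treat line 8 as the header, so no header rows end up in the data.

```r
# Hypothetical helper: skip the 7 descriptive lines so that line 8
# of the file is read as the column header.
read_export <- function(path) {
  read.csv(path, skip = 7, header = TRUE, stringsAsFactors = FALSE)
}

# Build two small demo files (stand-ins for the real .csv exports).
demo_dir <- tempfile("exports")
dir.create(demo_dir)
preamble <- paste("descriptive line", 1:7)
writeLines(c(preamble, "Name,Age,MonthBorn", "Bob,23,September", "Steve,45,June"),
           file.path(demo_dir, "a.csv"))
writeLines(c(preamble, "Name,Age,MonthBorn", "Sue,74,January", "Tracy,31,February"),
           file.path(demo_dir, "b.csv"))

# Read every file the same way and stack the results row-wise.
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
merged <- do.call(rbind, lapply(files, read_export))
merged
```

Because the header line is consumed while reading, columns such as `Age` keep their numeric type instead of being read as text.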

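I am now facing a problem where some rows contain the same values as their corresponding column names.
For example, the merged data frame now looks something like this: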

--------------------------
| Name | Age | MonthBorn |
--------------------------
| Bob  | 23  | September |
| Steve| 45  | June      |
| Name | Age | MonthBorn | # Should be removed
| Sue  | 74  | January   |
| Name | Age | MonthBorn | # Should be removed
| Tracy| 31  | February  |
--------------------------

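The trouble is that the merged data frame is almost 340,000 rows deep, so I cannot check everything by hand. I also have a rough idea of where each of these rows might appear, but I cannot be certain, because there may be variation.
How can I check whether the value of a row/cell matches the corresponding column name, or how can I set up the import process outlined above (the bullet points)?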

Source: https://stackoverflow.com/questions/45941932/how-to-check-if-row-value-matches-corresponding-column-value

3 Answers


8wtpewkr #1

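We can use functions from dplyr and tidyr to paste the contents of all columns together into one string per row, then filter out the rows whose string equals the pasted column names. dt2 is the final output.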

# Create example data
dt <- read.table(text = "Name Age MonthBorn
Bob 23 September
Steve 45 June 
Bob 23 September
Name Age MonthBorn
Sue 74 January
Name Age MonthBorn
Tracy 31 February",
                 header = TRUE, stringsAsFactors = FALSE)

# Load package
library(dplyr)
library(tidyr)

# Process the data
dt2 <- dt %>%
  unite(ColName, everything(), sep = ", ", remove = FALSE) %>%
  filter(ColName != toString(colnames(dt))) %>%
  select(-ColName)

dt2
   Name Age MonthBorn
1   Bob  23 September
2 Steve  45      June
3   Bob  23 September
4   Sue  74   January
5 Tracy  31  February
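
For comparison, the same filter can be sketched in base R without pasting columns together. This is an alternative to the approach above, not part of the original answer, and `is_header` and `dt_clean` are illustrative names. A row is dropped only when every column holds its own column name. The final `type.convert()` step is an optional extra: the stray header rows force every column to be read as text, so it re-infers the column types once they are gone.

```r
# Same example data as above, built directly as character columns.
dt <- data.frame(
  Name = c("Bob", "Steve", "Bob", "Name", "Sue", "Name", "Tracy"),
  Age = c("23", "45", "23", "Age", "74", "Age", "31"),
  MonthBorn = c("September", "June", "September", "MonthBorn",
                "January", "MonthBorn", "February"),
  stringsAsFactors = FALSE
)

# For each column, flag the rows whose value equals that column's name,
# then combine the flags so a row counts only if every column matches.
is_header <- Reduce(`&`, Map(function(col, nm) col %in% nm, dt, names(dt)))

# Drop the flagged rows and renumber the remaining ones.
dt_clean <- dt[!is_header, ]
rownames(dt_clean) <- NULL

# Optional: re-infer column types now that the header rows are gone.
dt_clean <- type.convert(dt_clean, as.is = TRUE)
str(dt_clean)
```

Using `%in%` rather than `==` means a missing value is treated as "not a header" instead of producing `NA` in the filter.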

2023-04-03

pu3pd22g #2

Your data

df <- structure(list(Name_ = c("Bob", "Steve", "Bob", "Name", "Sue", 
"Name", "Tracy"), `_Age_` = c("23", "45", "23", "Age", "74", 
"Age", "31"), `_MonthBorn` = c("September", "June", "September", 
"MonthBorn", "January", "MonthBorn", "February")), .Names = c("Name_", 
"_Age_", "_MonthBorn"), row.names = c(NA, -7L), class = c("data.table", 
"data.frame"))

Solution

library(stringr)
# Drop rows in which every value is found inside its own column name
df[!sapply(seq_len(nrow(df)), function(i) all(mapply(function(nm, val) str_detect(nm, val), colnames(df), df[i, ]))), ]

Output

   Name_ _Age_ _MonthBorn
1:   Bob    23  September
2: Steve    45       June
3:   Bob    23  September
4:   Sue    74    January
5: Tracy    31   February

2023-04-03

ie3xauqp #3

If your data frame looks roughly like this:

Df <- data.frame(Name, Age, MonthBorn)

Then you can use an ifelse statement to test whether "MonthBorn" appears in a row.

Df$MonthBornTest <- ifelse(Df$MonthBorn == "MonthBorn", "True", "False")

Then you should be able to do the following to drop the rows that contain "True", which effectively removes the rows you no longer need.

Df <- Df[!(Df$MonthBornTest == "True"), ]
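
The helper column is not strictly needed. A minimal sketch of the same idea with direct logical subsetting is shown here. The three-row `Df` is made-up demo data, and the check still relies on the assumption that only repeated header rows ever contain the text "MonthBorn" in that column.

```r
# Made-up demo data with one stray header row in the middle.
Df <- data.frame(
  Name = c("Bob", "Name", "Sue"),
  Age = c("23", "Age", "74"),
  MonthBorn = c("September", "MonthBorn", "January"),
  stringsAsFactors = FALSE
)

# Keep every row whose MonthBorn value is not the literal column name.
# %in% returns FALSE (not NA) for missing values, so rows with NA are kept.
Df <- Df[!(Df$MonthBorn %in% "MonthBorn"), ]
Df
```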

2023-04-03