csv: Error in nchar(Tony.raw$neighborhood_overview): 'nchar()' requires a character vector

polhcujo  posted on 2023-09-28  in  Other

I am getting the following error message:

Error in nchar(Tony.raw$neighborhood_overview) : 
  'nchar()' requires a character vector

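I don't understand why nchar cannot read the neighborhood_overview column. For an assignment I was given a CSV file of survey responses containing social statistics about Denver neighborhoods. I need to compute the character lengths of certain columns and then plot them to show some of the perspectives available in the data. I will try the same code on other columns and see what I get.
Link to the .csv data: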
https://drive.google.com/open?id=1mGsy52nZtRNpAFEWiWaJHB2nsm2hnvsU

#Load up the .CSV data and explore in RStudio
Tony.raw <- read.csv("denver_listings.csv", stringsAsFactors = FALSE)
View(Tony.raw)

# Clean up the data frame and view our handiwork.
Tony.raw <- Tony.raw[, c("description", "neighborhood_overview")]
View(Tony.raw)

# Check data to see if there are missing values.
length(which(!complete.cases(Tony.raw)))

#Convert our class label into a factor.
Tony.raw$neighborhood_overview <- 
as.factor(which(complete.cases(Tony.raw$neighborhood_overview)))

# The first step, as always, is to explore the data.
# First, let's take a look at the distribution of the class labels (i.e., ham vs. spam).
prop.table(table(Tony.raw$neighborhood_overview))

# Next up, let's get a feel for the distribution of text lengths of the SMS
# messages by adding a new feature for the length of each message.
Tony.raw$TextLength <- nchar(Tony.raw$neighborhood_overview)
summary(Tony.raw$TextLength)

#Visualize distribution with ggplot2, adding segmentation for ham/spam
library(ggplot2)

ggplot(Tony.raw, aes(x=TextLength, fill = neighborhood_overview)) +
  theme_bw() +
  geom_histogram(binwidth = 5) +
  labs(y = "Text Count", x = "Length of Text",
       title = "Distribution of Text Lengths with class Labels")

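By setting Tony.raw$TextLength to the nchar of Tony.raw$neighborhood_overview, I should be able to count the characters and plot them with ggplot2. But it says nchar requires a character vector. Is it because the description data is not character, or because the column labels are not character? I don't know.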
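One way to narrow this down is to inspect the column's class before and after each step. The sketch below is only an illustration: `demo` is a small invented data frame, not the Denver data, and it mimics the conversion line from the code above to show what ends up in the column.

```r
# Hypothetical stand-in for the real CSV; the values are invented.
demo <- data.frame(
  neighborhood_overview = c("Near the park", "Walkable, lots of cafes"),
  stringsAsFactors = FALSE
)

# Right after loading, the column is plain text.
class_before <- class(demo$neighborhood_overview)

# Same conversion as in the question: the column is replaced by a factor
# built from row positions, not from the text itself.
demo$neighborhood_overview <-
  as.factor(which(complete.cases(demo$neighborhood_overview)))

class_after  <- class(demo$neighborhood_overview)
values_after <- as.character(demo$neighborhood_overview)

print(class_before)
print(class_after)
print(values_after)
```

Under these assumptions the column starts out as character and ends up as a factor whose values are row numbers, so the original text is no longer available in that column.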

mpbci0fu1#

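In the fourth block of your code you converted Tony.raw$neighborhood_overview to a factor. You need
nchar(levels(Tony.raw$neighborhood_overview)[Tony.raw$neighborhood_overview])
instead of nchar(Tony.raw$neighborhood_overview) to get the nchar of the factor's labels.
When you write nchar(Tony.raw$neighborhood_overview), nchar is called on a factor, which is stored as integer codes from 1 to the number of levels rather than as strings, and nchar rejects factor input with this error.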
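To see the behaviour in isolation, here is a minimal sketch on a small made-up data frame (`toy` and its values are assumptions for illustration, not taken from denver_listings.csv). It shows nchar() working on a character column, failing on a factor, and working again once the factor is turned back into strings.

```r
# Hypothetical data; the strings are invented for the example.
toy <- data.frame(
  neighborhood_overview = c("Quiet street", "Close to downtown", ""),
  stringsAsFactors = FALSE
)

# On a character column nchar() works directly.
len_chr <- nchar(toy$neighborhood_overview)

# After conversion to a factor the same call raises an error;
# tryCatch() captures the message instead of stopping the script.
f   <- as.factor(toy$neighborhood_overview)
err <- tryCatch(nchar(f), error = function(e) conditionMessage(e))

# Converting back to character, or indexing the levels by the factor,
# recovers the strings so their lengths can be counted.
len_fac <- nchar(as.character(f))
len_lev <- nchar(levels(f)[f])

print(err)
print(len_chr)
```

Note that in the question's code the factor is built from which(complete.cases(...)), so it holds row positions rather than the overview text. If the goal is the length of the text itself, the simplest route is probably to skip that conversion and call nchar() on the original character column.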
