I am getting the following error message:
Error in nchar(Tony.raw$neighborhood_overview) :
'nchar()' requires a character vector
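I don't know why nchar can't read the neighborhood_overview column. I have an assignment with a provided CSV file containing social statistics about Denver neighborhoods, collected from a survey. I need to compute the character length of certain data columns and then plot them to show some of the perspectives available in the data. I will try the same code on different data columns and see what I get.
Link to the .csv data: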
https://drive.google.com/open?id=1mGsy52nZtRNpAFEWiWaJHB2nsm2hnvsU
#Load up the .CSV data and explore in RStudio
Tony.raw <- read.csv("denver_listings.csv", stringsAsFactors = FALSE)
View(Tony.raw)
# Clean up the data frame and view our handiwork.
Tony.raw <- Tony.raw[, c("description", "neighborhood_overview")]
View(Tony.raw)
# Check data to see if there are missing values.
length(which(!complete.cases(Tony.raw)))
#Convert our class label into a factor.
Tony.raw$neighborhood_overview <-
as.factor(which(complete.cases(Tony.raw$neighborhood_overview)))
# The first step, as always, is to explore the data.
# First, let's take a look at the distribution of the class labels (i.e., ham vs. spam).
prop.table(table(Tony.raw$neighborhood_overview))
# Next up, let's get a feel for the distribution of text lengths of the SMS
# messages by adding a new feature for the length of each message.
Tony.raw$TextLength <- nchar(Tony.raw$neighborhood_overview)
summary(Tony.raw$TextLength)
#Visualize distribution with ggplot2, adding segmentation for ham/spam
library(ggplot2)
ggplot(Tony.raw, aes(x = TextLength, fill = neighborhood_overview)) +
  theme_bw() +
  geom_histogram(binwidth = 5) +
  labs(y = "Text Count", x = "Length of Text",
       title = "Distribution of Text Lengths with class Labels")
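By setting Tony.raw$TextLength to the nchar of Tony.raw$neighborhood_overview, I should be able to count the characters and plot them on a chart with ggplot2. But it says nchar requires a character vector. Is it because the description data is not character, or because the column labels are not character? I don't know.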
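One way to narrow down that question is to inspect the column's class before and after the conversion step. The sketch below is only illustrative: `toy` and its three strings are made-up stand-ins, not the Denver data, and it assumes the column was read in as character, as `stringsAsFactors = FALSE` would give.

```r
# Hypothetical stand-in for the real data frame (made-up strings).
toy <- data.frame(
  neighborhood_overview = c("Quiet street", "Near downtown", "Parks"),
  stringsAsFactors = FALSE
)

# Class of the column as loaded.
class_before <- class(toy$neighborhood_overview)

# Same conversion as in the question's code.
toy$neighborhood_overview <-
  as.factor(which(complete.cases(toy$neighborhood_overview)))

# Class of the column after the conversion.
class_after <- class(toy$neighborhood_overview)

print(class_before)
print(class_after)
```

If the two classes differ, the conversion step rather than the CSV itself is what changed the column's type.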
1 Answer
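In the fourth block of your code, you converted Tony.raw$neighborhood_overview to a factor. You need
nchar(levels(Tony.raw$neighborhood_overview)[Tony.raw$neighborhood_overview])
instead of
nchar(Tony.raw$neighborhood_overview)
to get the nchar of the factor's labels. When you write nchar(Tony.raw$neighborhood_overview), nchar is called on the factor itself, which is stored internally as integer codes from 1 to the number of levels, and it throws an error because it received a factor rather than character strings.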
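The behaviour described above can be reproduced on a tiny example. This is a minimal sketch, assuming a made-up character vector `overview` in place of the real column; none of these variable names come from the original post. It also shows that, with the conversion used in the question, the factor's labels are row indices produced by which(), so their lengths are not the lengths of the original text. If text length is the goal, leaving the column as character and calling nchar on it directly is likely the simpler route.

```r
# Made-up text values standing in for neighborhood_overview.
overview <- c("Quiet street", "Near downtown", "Parks")

# On a plain character vector, nchar() works directly.
text_lengths <- nchar(overview)

# The question's conversion: the factor is built from row indices, not from the text.
index_factor <- as.factor(which(complete.cases(overview)))

# nchar() refuses factors; capture the message instead of halting the script.
err <- tryCatch(nchar(index_factor), error = function(e) conditionMessage(e))

# Indexing the levels by the factor yields character labels, so nchar() accepts them,
# but here the labels are just the index strings "1", "2", "3".
label_lengths <- nchar(levels(index_factor)[index_factor])

print(text_lengths)
print(err)
print(label_lengths)
```

as.character(index_factor) gives the same labels as the levels-indexing expression and may read more clearly.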