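I followed the word count program from this link. The link is titled "rhadoop-wordcountusingrmr".
The job runs and produces output, but the output is not in a readable format. I want the output to contain key-value pairs. How can I do that, and what should I change in the code? Please help.
This is the output: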
hadoop@hadoop-vm:~/apache/hadoop-1.2.1$ bin/hadoop fs -cat /user/hadoop/out10/part*
SEQ/org.apache.hadoop.typedbytes.TypedBytesWritable/org.apache.hadoop.typedbytes.TypedBytesWritable�9d×5x��&�\
hadoop@hadoop-vm:~/apache/hadoop-1.2.1$
This is the code:
Sys.setenv("HADOOP_CMD"="/usr/local/hadoop/bin/hadoop")
Sys.setenv("HADOOP_STREAMING"="/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.4.0.jar")
# load libraries
library(rmr2)
library(rhdfs)
# initialize the rhdfs package
hdfs.init()
map <- function(k, lines) {
  words.list <- strsplit(lines, '\\s')
  words <- unlist(words.list)
  return( keyval(words, 1) )
}
reduce <- function(word, counts) {
  keyval(word, sum(counts))
}
wordcount <- function (input, output=NULL) {
  mapreduce(input=input, output=output, input.format="text", map=map, reduce=reduce)
}
## read text files from folder example/wordcount/data
hdfs.root <- 'example/wordcount'
hdfs.data <- file.path(hdfs.root, 'data')
## save result in folder example/wordcount/out
hdfs.out <- file.path(hdfs.root, 'out')
## Submit job
out <- wordcount(hdfs.data, hdfs.out)
## Fetch results from HDFS
results <- from.dfs(out)
results.df <- as.data.frame(results, stringsAsFactors=FALSE)
colnames(results.df) <- c('word', 'count')
head(results.df)
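For context on the unreadable output: the SEQ/...TypedBytesWritable header suggests the job wrote rmr2's default "native" format, which is a binary sequence file, so `hadoop fs -cat` prints raw bytes. As far as I know, `from.dfs(out)` in the code above already decodes that format into key-value pairs inside R. If plain text files on HDFS are the goal, one possible change is to pass an `output.format` to `mapreduce`. The following is only a minimal sketch, assuming rmr2's `make.output.format("csv", sep = "\t")` behaves as described in the rmr2 documentation; the tab separator and the `example/wordcount/out_text` path are my own choices, not part of the original program.

```r
library(rmr2)

# Same map and reduce functions as in the question.
map <- function(k, lines) {
  words <- unlist(strsplit(lines, '\\s'))
  keyval(words, 1)
}
reduce <- function(word, counts) {
  keyval(word, sum(counts))
}

# Assumption: a tab-separated text format instead of the default
# binary "native" format, so each output line holds a word and its count.
text.out <- make.output.format("csv", sep = "\t")

wordcount <- function(input, output = NULL) {
  mapreduce(input = input, output = output,
            input.format = "text",
            output.format = text.out,
            map = map, reduce = reduce)
}

# Hypothetical output folder; the input folder is the one from the question.
out <- wordcount('example/wordcount/data', 'example/wordcount/out_text')

# Reading text output back into R needs a matching input format,
# because from.dfs assumes the native format by default.
results <- from.dfs(out, format = make.input.format("csv", sep = "\t"))
```

With a text output format, `bin/hadoop fs -cat` on the part files should show readable lines rather than binary data. If the results are only ever read back into R, keeping the native format and using `from.dfs(out)` as in the original code may be the simpler option.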