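I want to write a text file to HDFS. The path the file must be written to is generated dynamically. If the file path (including the file name) is new, the file should be created and the text written to it. If the file path (including the file name) already exists, the string must be appended to the existing file.
I used the following code. File creation works fine, but the text cannot be appended to an existing file.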
// Hadoop and java.io imports used below. JValue, compact and render are assumed to come
// from json4s and Time from Spark Streaming; generateFilePath is a helper defined elsewhere.
import java.io.{BufferedWriter, OutputStreamWriter}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

def writeJson(uri: String, Json: JValue, time: Time): Unit = {
  val path = new Path(generateFilePath(Json, time))
  val conf = new Configuration()
  conf.set("fs.defaultFS", uri)
  conf.set("dfs.replication", "1")
  conf.set("dfs.support.append", "true")
  conf.set("dfs.client.block.write.replace-datanode-on-failure.enable", "false")
  // One JSON document per line
  val Message = compact(render(Json)) + "\n"
  try {
    val fileSystem = FileSystem.get(conf)
    if (fileSystem.exists(path)) {
      println("File exists.")
      // Existing file: open it for append
      val outputStream = fileSystem.append(path)
      val bufferedWriter = new BufferedWriter(new OutputStreamWriter(outputStream))
      bufferedWriter.write(Message)
      bufferedWriter.close()
      println("Appended to file in path : " + path)
    } else {
      println("File does not exist.")
      // New file: create it (overwrite = true)
      val outputStream = fileSystem.create(path, true)
      val bufferedWriter = new BufferedWriter(new OutputStreamWriter(outputStream))
      bufferedWriter.write(Message)
      bufferedWriter.close()
      println("Created file in path : " + path)
    }
  } catch {
    case e: Exception =>
      e.printStackTrace()
  }
}
Hadoop version: 2.7.0
Whenever an append has to be performed, the following error is generated:
org.apache.hadoop.ipc.RemoteException(java.lang.ArrayIndexOutOfBoundsException)
1 answer
6jjcrrmo #1
I can see 3 possibilities:
1. Probably the easiest is to use the hdfs program that sits on the Hadoop cluster, see: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/hdfscommands.html . Or even the WebHDFS REST functionality: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/webhdfs.html
2. If you don't want to use the hdfs commands, you could use the hadoop-hdfs library: http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-hdfs/2.7.1
3. If you want a clean Scala solution, use Spark: http://spark.apache.org/docs/latest/programming-guide.html or https://databricks.gitbooks.io/databricks-spark-reference-applications/content/logs_analyzer/chapter3/save_the_rdd_to_files.html
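For the second possibility, a minimal sketch of "append if the file exists, otherwise create it" with the Hadoop FileSystem API could look like the block below. It assumes the Hadoop client libraries (2.7.x) are on the classpath and that append is enabled on the cluster. The names HdfsAppendSketch and appendOrCreate are made up for illustration and are not part of Hadoop. It writes bytes straight to the output stream and closes the stream in a finally block, so the stream is released even when the write fails.

```scala
import java.nio.charset.StandardCharsets
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper object; the name is illustrative only.
object HdfsAppendSketch {

  /** Appends `text` to `path` if the file exists, otherwise creates the file first. */
  def appendOrCreate(fs: FileSystem, path: Path, text: String): Unit = {
    val out =
      if (fs.exists(path)) fs.append(path)   // existing file: open for append
      else fs.create(path, false)            // new file: create, never overwrite
    try {
      out.write(text.getBytes(StandardCharsets.UTF_8))
    } finally {
      out.close()                            // close flushes the data to the file system
    }
  }
}
```

Against a cluster, the file system would typically be obtained with FileSystem.get(new java.net.URI(uri), conf) and then passed in, for example HdfsAppendSketch.appendOrCreate(fs, new Path("/tmp/out.json"), "{}\n"), where the path is only a placeholder. Note that the exists-then-append check is not atomic, so two writers racing on the same new path would still need coordination.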