Generate files directly in HDFS

Posted by xa9qqrwz on 2021-05-29 in Hadoop


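Is there a way to generate files directly on HDFS? I want to avoid generating a local file and then copying it to HDFS with the hdfs command line, e.g. hdfs dfs -put - "file_name.csv".
Or is there a Python library for this?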

hadoop hdfs python

Source: https://stackoverflow.com/questions/34722459/generate-files-directly-in-hdfs

4 answers


u1ehiz5o #1

Is it normal for hdfscli's write method to be this slow? Is there any way to speed up writing with hdfscli?

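import csv
from tqdm import tqdm

# `client` is an hdfs (hdfscli) client and `conf` is the poster's own config object
with client.write(conf.hdfs_location + '/' + conf.filename, encoding='utf-8', buffersize=10000000) as f:
    writer = csv.writer(f, delimiter=conf.separator)
    for _ in tqdm(range(10000000000)):  # tqdm needs an iterable, not a bare int
        row = [column.get_value() for column in conf.columns]
        writer.writerow(row)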
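One thing that may be worth trying, on the assumption that the overhead of many small write() calls on the hdfscli stream is part of the cost, is to format rows into an in-memory buffer and hand them to the HDFS stream in large chunks. This will not help if generating each row (column.get_value() here) is the real bottleneck, and with 10^10 rows the total volume alone will take a long time. The helper name write_rows_batched and the batch_size default are hypothetical choices for this sketch.

```python
import csv
import io
from typing import Iterable, Sequence, TextIO


def write_rows_batched(f: TextIO, rows: Iterable[Sequence[object]],
                       batch_size: int = 100_000, delimiter: str = ",") -> int:
    """Hypothetical helper: format rows in memory, flush to f in large chunks.

    `f` can be the text stream yielded by hdfscli's client.write(..., encoding=...)
    or any other writable text stream. Returns the number of rows written.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter)
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
        if count % batch_size == 0:
            f.write(buf.getvalue())  # one large write instead of batch_size small ones
            buf.seek(0)
            buf.truncate(0)
    remainder = buf.getvalue()
    if remainder:
        f.write(remainder)  # flush the last, partially filled batch
    return count
```

Inside the poster's with block this would replace the loop, e.g. write_rows_batched(f, rows, delimiter=conf.separator), where rows is a generator producing one list per row.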

Thanks a lot.

2021-05-30

2w3rbyxf #2

Two ways to write a local file to HDFS with Python:
One is to use the hdfs Python package.
Code snippet:

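from hdfs import InsecureClient

# WebHDFS endpoint of the NameNode (port 50070 on Hadoop 2.x; Hadoop 3.x defaults to 9870)
hdfsclient = InsecureClient('http://localhost:50070', user='madhuc')
hdfspath = "/user/madhuc/hdfswritedata/"
localpath = "/home/madhuc/sample.csv"
hdfsclient.upload(hdfspath, localpath)  # uploading into a directory keeps the local file name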

Resulting file: '/user/madhuc/hdfswritedata/sample.csv'
The other is to use the subprocess package with a pipe.
Code snippet:

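from subprocess import PIPE, Popen

# put file into hdfs (localpath and hdfspath as defined in the snippet above)
put = Popen(["hadoop", "fs", "-put", localpath, hdfspath], stdin=PIPE, bufsize=-1)
put.communicate()
if put.returncode == 0:  # only report success if hadoop fs -put succeeded
    print("File Saved Successfully")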

2021-05-30

thigvfpy #3

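hdfs dfs -put does not require you to create the file locally. There is also no need to create a zero-byte file on HDFS (touchz) and then append to it (appendToFile). You can write the file directly on HDFS like this: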

hadoop fs -put - /user/myuser/testfile

Press Enter. At the prompt, type the text you want in the file. When you are done, press Ctrl+D.
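Because -put - reads the file contents from standard input, the subprocess approach from answer 2 can be combined with it so that Python generates the data and streams it straight into HDFS, with no local file at all. This is a minimal sketch, assuming the hadoop command is on the PATH; the helper name put_rows_via_stdin is hypothetical, and the command is a parameter so any stdin-reading command can be used.

```python
import csv
import io
import subprocess
from typing import Iterable, List, Sequence


def put_rows_via_stdin(rows: Iterable[Sequence[object]], cmd: List[str]) -> int:
    """Hypothetical helper: stream CSV rows into cmd's stdin, return its exit code.

    For HDFS, cmd would be e.g. ["hadoop", "fs", "-put", "-", "/user/myuser/testfile"],
    where "-" makes -put read the file contents from stdin.
    """
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
    # Wrap the binary pipe so csv.writer can write text into it.
    with io.TextIOWrapper(proc.stdin, encoding="utf-8", newline="") as text_in:
        writer = csv.writer(text_in)
        for row in rows:
            writer.writerow(row)
    # Leaving the with block closed stdin (EOF), so the command can finish.
    return proc.wait()
```

For example, put_rows_via_stdin(([i, i * i] for i in range(1000)), ["hadoop", "fs", "-put", "-", "/user/myuser/testfile"]) should create the file on HDFS and return 0 on success. Rows are produced lazily, so memory use stays flat regardless of file size.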

2021-05-30

fwzugrvs #4

Have you tried hdfscli?
Quoting the passage on reading and writing files:


# `client` is an hdfs (hdfscli) client, e.g. hdfs.InsecureClient

# Loading a file in memory.
with client.read('features') as reader:
  features = reader.read()

# Directly deserializing a JSON object.
with client.read('model.json', encoding='utf-8') as reader:
  from json import load
  model = load(reader)
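The quoted passage only covers reading, but for the original question the same hdfscli client can also write. Called without data, client.write() returns a context manager, and with encoding set it accepts text, so generated rows can go straight to HDFS without a local file. A minimal sketch, assuming client is an hdfs.InsecureClient like the one in answer 2; the helper name write_csv_to_hdfs and the choice of overwrite=True (so reruns replace the file) are assumptions of this example.

```python
import csv
from typing import Iterable, Sequence


def write_csv_to_hdfs(client, hdfs_path: str, rows: Iterable[Sequence[object]]) -> None:
    """Hypothetical helper: stream rows as CSV into a file on HDFS.

    `client` is assumed to be an hdfscli client (e.g. hdfs.InsecureClient);
    client.write(...) without `data` yields a writable stream, and
    `encoding` makes that stream accept str instead of bytes.
    """
    with client.write(hdfs_path, encoding="utf-8", overwrite=True) as writer:
        csv.writer(writer).writerows(rows)
```

Usage would look like write_csv_to_hdfs(InsecureClient('http://localhost:50070', user='madhuc'), '/user/madhuc/hdfswritedata/generated.csv', rows), where the target path is only an illustrative name.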

2021-05-29
