"Different node should owns different parts of all Train data. This simple script did not do this job, so you should prepare it at last. " I saw this in cluster training wiki. So, could paddle read data from hdfs and distribute data to each node automatically?
1 Answer
Distributing data to the cluster is not supported in PaddlePaddle yet. You can read data directly from an HDFS file path with PyDataProvider2.
PaddlePaddle does not handle fetching the data file remotely; it just passes the file path into a Python function. It is the user's job to open the file (or SQL connection string, or HDFS path) and read the samples from it one by one.
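As a rough sketch of what this looks like, a PyDataProvider2 provider can fetch the bytes itself, for example by shelling out to `hadoop fs -cat`. The HDFS path, input dimensions, and line format below are hypothetical placeholders, not part of the original answer:

```python
import subprocess

from paddle.trainer.PyDataProvider2 import provider, dense_vector, integer_value


@provider(input_types=[dense_vector(784), integer_value(10)])
def process(settings, file_name):
    # file_name is whatever path string is listed in the train/test file list,
    # e.g. "hdfs:///user/demo/train/part-00000" (hypothetical).
    # PaddlePaddle only passes the string through; fetching the data is up to us,
    # here by streaming it through `hadoop fs -cat`.
    proc = subprocess.Popen(['hadoop', 'fs', '-cat', file_name],
                            stdout=subprocess.PIPE)
    for line in proc.stdout:
        # Assumed line format: "label feat_1 feat_2 ... feat_784" (hypothetical).
        parts = line.split()
        label = int(parts[0])
        features = [float(x) for x in parts[1:]]
        yield features, label
    proc.stdout.close()
    proc.wait()
```

The same pattern applies to any other remote source: open the connection inside the provider function and yield one sample at a time.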
Contributions of a script that distributes data to the cluster are welcome. Or we may add it soon if this feature turns out to be necessary.