How to get absolute paths of end directories?

Posted by kyxcudwk on 2021-05-27 in Spark


I have the following directory structure in HDFS:

/data/current/population/{p_1,p_2}
/data/current/sport
/data/current/weather/{w_1,w_2,w_3}
/data/current/industry

The folders population, sport, weather and industry each correspond to a different dataset. The end folders, for example p_1 and p_2, hold different data sources (where available).

I am writing PySpark code that works with the folders A_1, A_2, B, C_1, C_2, C_3 and D (the end folders). Given a path such as /data/current/ as input to the code, how can I extract the absolute paths of only the end folders?

The command hdfs dfs -ls -R /data/current gives the following output:

/data/current
/data/current/population
/data/current/population/p_1
/data/current/population/p_2
/data/current/sport
/data/current/weather
/data/current/weather/w_1
/data/current/weather/w_2
/data/current/weather/w_3
/data/current/industry

But I want to end up with only the absolute paths of the end folders. My output should look like this:

/data/current/population/p_1
/data/current/population/p_2
/data/current/sport
/data/current/weather/w_1
/data/current/weather/w_2
/data/current/weather/w_3
/data/current/industry

Thanks in advance.
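For reference, the desired output follows a simple rule: keep every listed directory that is not the parent of another listed directory. The sketch below is one possible way to apply that rule to the text printed by hdfs dfs -ls -R. The function names are made up for illustration, and the parser assumes the usual listing layout (permissions column first, path last) and paths without spaces.

```python
def directories_from_ls(ls_output):
    """Extract directory paths from `hdfs dfs -ls -R` text output.

    Assumption: each entry has at least 8 whitespace-separated columns,
    directories start with 'd' in the permissions column, and the path
    is the last column (so paths containing spaces are not supported).
    """
    dirs = []
    for line in ls_output.splitlines():
        fields = line.split()
        if len(fields) >= 8 and fields[0].startswith("d"):
            dirs.append(fields[-1])
    return dirs


def end_directories(dir_paths):
    """Return the directories that are not the parent of another listed directory."""
    normalized = [p.rstrip("/") for p in dir_paths]
    # Every path's parent is, by definition, not an end directory.
    parents = {p.rsplit("/", 1)[0] for p in normalized}
    return [p for p in normalized if p not in parents]


if __name__ == "__main__":
    # Directory list copied from the question.
    listed = [
        "/data/current",
        "/data/current/population",
        "/data/current/population/p_1",
        "/data/current/population/p_2",
        "/data/current/sport",
        "/data/current/weather",
        "/data/current/weather/w_1",
        "/data/current/weather/w_2",
        "/data/current/weather/w_3",
        "/data/current/industry",
    ]
    for path in end_directories(listed):
        print(path)
```

In a real script the listing text could be captured with subprocess and passed through directories_from_ls first; that step is left out here because it depends on the local Hadoop setup.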

hdfs apache-spark pyspark python-2.7 bash

Source: https://stackoverflow.com/questions/40223184/how-to-get-absolute-paths-of-end-directories

1 answer


bnlyeluc1#

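Why not write some code using an HDFS client such as snakebite?
Below is a Scala function that does this. It takes the root folder path and returns a list of all end paths. You can do the same in Python using snakebite.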

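import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import scala.collection.mutable.ListBuffer

// Assumes the default Hadoop configuration points at the target HDFS
// cluster (for example when run from spark-shell).
val fs: FileSystem = FileSystem.get(new Configuration())

// Collects every end path under `path`: all files plus all empty directories.
def traverse(path: Path, col: ListBuffer[String]): ListBuffer[String] = {
  for (stat <- fs.listStatus(path)) {
    if (stat.isFile) {
      col += stat.getPath.toString
    } else if (fs.listStatus(stat.getPath).isEmpty) {
      // An empty directory is itself an end path.
      col += stat.getPath.toString
    } else {
      // Recurse into non-empty directories, appending to the same buffer
      // so that nested empty directories are not lost.
      traverse(stat.getPath, col)
    }
  }
  col
}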
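A rough Python counterpart, as a sketch only: it returns the end folders themselves (directories with no subdirectories) rather than files, since that is what the question asks for. It assumes a client object whose ls(paths) yields dicts with 'path' and 'file_type' keys, where 'd' marks a directory, which is how snakebite's Client.ls is expected to behave. The function name leaf_directories and the host and port in the usage comment are placeholders.

```python
def leaf_directories(client, root):
    """Recursively collect directories under `root` that have no subdirectories.

    Assumption: client.ls([path]) yields one dict per child of `path`,
    with the full child path under 'path' and 'd' under 'file_type'
    for directories.
    """
    subdirs = [
        entry["path"]
        for entry in client.ls([root])
        if entry["file_type"] == "d"
    ]
    if not subdirs:
        # No subdirectories: `root` is itself an end folder.
        return [root]
    leaves = []
    for subdir in subdirs:
        leaves.extend(leaf_directories(client, subdir))
    return leaves


# Possible usage with snakebite (host and port are placeholders):
#
#   from snakebite.client import Client
#   client = Client("namenode-host", 8020)
#   for path in leaf_directories(client, "/data/current"):
#       print(path)
```

This issues one ls call per directory, which keeps the logic close to the Scala traversal. For large trees a single recursive listing filtered afterwards may be cheaper.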

Answered 2021-05-27
