What is the correct way to determine whether a folder exists on an ADLS Gen2 account?

Posted by vawmfj5a on 2021-05-27 in Hadoop

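I'm working in a Scala and Spark environment and want to read Parquet files. Before reading, I want to check whether the file exists. I wrote the following code in a Jupyter notebook, but it doesn't work: no DataFrame is displayed, because the function testDirExist returns false.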

import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path

val hadoopfs: FileSystem = FileSystem.get(spark.sparkContext.hadoopConfiguration)

def testDirExist(path: String): Boolean = {
  val p = new Path(path)
  hadoopfs.exists(p) && hadoopfs.getFileStatus(p).isDirectory
}
val pt = "abfss://container@account.dfs.core.windows.net/blah/blah/blah"

val exists = testDirExist(pt)
if (exists) {
  val dataframe = spark.read.parquet(pt)
  dataframe.show()
}

However, the following code works and displays the DataFrame:

val k = spark.read.parquet("abfss://container@account.dfs.core.windows.net/blah/blah/blah")
k.show()

Could someone help me check whether the file exists?
Thanks.

Answer #1 (cetgtptt)

You just need to set the default file system to your storage account:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path
import java.io.PrintWriter

val conf = new Configuration()
// Make the ADLS Gen2 container the default file system
conf.set("fs.defaultFS", "abfss://<container_name>@<account_name>.dfs.core.windows.net")
// OAuth (client credentials) settings; these keys are scoped to the storage account host name
conf.set("fs.azure.account.auth.type.<account_name>.dfs.core.windows.net", "OAuth")
conf.set("fs.azure.account.oauth.provider.type.<account_name>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
conf.set("fs.azure.account.oauth2.client.id.<account_name>.dfs.core.windows.net", "<client_id>")
conf.set("fs.azure.account.oauth2.client.secret.<account_name>.dfs.core.windows.net", "<secret>")
conf.set("fs.azure.account.oauth2.client.endpoint.<account_name>.dfs.core.windows.net", "https://login.microsoftonline.com/<tenant_id>/oauth2/token")

val fs = FileSystem.get(conf)

// Write a small test file to the root of the container
val ostream = fs.create(new Path("/abfss_test.out"))
val pwriter = new PrintWriter(ostream)
try {
  pwriter.write("Azure Datalake Gen2 test")
  pwriter.write("\n")
} finally {
  pwriter.close()
}

// Check that the file we've just created exists
println(fs.exists(new Path("/abfss_test.out")))
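
As an alternative to changing fs.defaultFS, the file system can be resolved from the path itself. FileSystem.get(conf) returns the file system named by fs.defaultFS, which may not be the abfss:// account the path points to, and that is a plausible reason the check in the question fails. The sketch below is a variant of the question's testDirExist; the helper name dirExists is hypothetical, and it assumes the Hadoop configuration passed in already carries whatever credentials the storage account needs.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical helper: looks up the FileSystem that matches the path's own
// scheme and authority (abfss://, hdfs://, file://, ...) instead of relying
// on fs.defaultFS.
def dirExists(path: String, conf: Configuration): Boolean = {
  val p = new Path(path)
  val fs = p.getFileSystem(conf)
  // True only when the path exists and is a directory
  fs.exists(p) && fs.getFileStatus(p).isDirectory
}
```

In a Spark notebook this could be called as dirExists(pt, spark.sparkContext.hadoopConfiguration), so the check uses the same configuration that spark.read.parquet relies on.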
