pyspark: Spark ADLS, read from one container and write to another using different SPNs

Posted by zqdjd7g9 on 2023-10-15 in Spark


In PySpark, I access ADLS Gen2 with an Azure service principal (SPN). I set the SPN credentials through the Spark conf:

spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net", "<application-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net", service_credential)
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net", "https://login.microsoftonline.com/<directory-id>/oauth2/token")

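The SPNs I have are fine-grained and grant access at the container level rather than at the storage-account level. This means that for containers abcd and wxyz (in the same storage account) I have two different SPNs.
When configuring the Spark conf, however, I can only set SPN credentials at the storage-account level. How can I set SPN credentials at the container level? My goal is to add the SPN credentials for both containers to the Spark conf somehow and then perform reads and writes across the containers.
My use case is to read data from one container and write it to another container in the same storage account. Because the SPNs differ and the Spark conf can only hold one of them at a time, I have not been able to do this.
Any help would be greatly appreciated.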
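To see why a single session-level setting cannot cover both containers, note that every key in the conf snippet above is suffixed only with the account host (`<storage-account>.dfs.core.windows.net`); the container name never appears in it. The small helper below is hypothetical (`oauth_client_id_key` is not part of any library, and `mystorage` is a made-up account name): it derives the client-id key from an `abfss://` URI and shows that both containers resolve to the same key, so a second `spark.conf.set` simply overwrites the first.

```python
from urllib.parse import urlparse


def oauth_client_id_key(abfss_uri: str) -> str:
    """Hypothetical helper: the ABFS client-id conf key that applies to a URI.

    The key is built from the account host only, which is why it cannot
    tell two containers in the same storage account apart.
    """
    parsed = urlparse(abfss_uri)
    if parsed.scheme != "abfss":
        raise ValueError(f"expected an abfss:// URI, got {abfss_uri!r}")
    # netloc has the form "<container>@<account>.dfs.core.windows.net"
    container, sep, account_host = parsed.netloc.partition("@")
    if not sep or not container:
        raise ValueError(f"missing container in {abfss_uri!r}")
    # The container is parsed but deliberately not used in the key.
    return f"fs.azure.account.oauth2.client.id.{account_host}"


key_abcd = oauth_client_id_key("abfss://abcd@mystorage.dfs.core.windows.net/in/data")
key_wxyz = oauth_client_id_key("abfss://wxyz@mystorage.dfs.core.windows.net/out/data")
print(key_abcd)              # fs.azure.account.oauth2.client.id.mystorage.dfs.core.windows.net
print(key_abcd == key_wxyz)  # True
```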

pyspark

Source: https://stackoverflow.com/questions/77257324/spark-adls-read-from-one-container-and-write-to-another-using-different-spns

1 answer


ojsjcaue1#

Instead of setting the credentials globally with spark.conf.set, you can pass them per query as DataFrameReader options (Scala example):

// Per-query ABFS OAuth settings for one service principal (SPN)
def credsFor(tenantId: String, clientId: String, clientSecret: String): Map[String, String] = Map(
  "fs.azure.account.auth.type.<STORAGE_ACCOUNT>.dfs.core.windows.net" -> "OAuth",
  "fs.azure.account.oauth.provider.type.<STORAGE_ACCOUNT>.dfs.core.windows.net" -> "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id.<STORAGE_ACCOUNT>.dfs.core.windows.net" -> clientId,
  "fs.azure.account.oauth2.client.secret.<STORAGE_ACCOUNT>.dfs.core.windows.net" -> clientSecret,
  "fs.azure.account.oauth2.client.endpoint.<STORAGE_ACCOUNT>.dfs.core.windows.net" -> s"https://login.microsoftonline.com/$tenantId/oauth2/token"
)

// Container 1, authenticated with the first SPN (paths need the abfss:// scheme)
val df1 = spark.read
  .format("delta")
  .options(credsFor("<TENANT>", "<APP1>", "<SECRET1>"))
  .load("abfss://<CONTAINER1>@<STORAGE_ACCOUNT>.dfs.core.windows.net/<PATH1>")

// Container 2, authenticated with the second SPN
val df2 = spark.read
  .format("delta")
  .options(credsFor("<TENANT>", "<APP2>", "<SECRET2>"))
  .load("abfss://<CONTAINER2>@<STORAGE_ACCOUNT>.dfs.core.windows.net/<PATH2>")
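Since the question is about PySpark and also needs the write step, the same per-query approach might be sketched in Python as below. This is a sketch under assumptions: it relies on Delta Lake picking up `fs.`-prefixed Hadoop options from DataFrameReader/DataFrameWriter when the table is accessed by path (`load(path)` / `save(path)`), as Delta Lake's documentation describes, and it assumes the delta-spark package is available. The helper names `adls_oauth_options` and `copy_between_containers` are hypothetical; the containers `abcd` and `wxyz` come from the question.

```python
from typing import Dict


def adls_oauth_options(
    storage_account: str, tenant_id: str, client_id: str, client_secret: str
) -> Dict[str, str]:
    """Hypothetical helper: ABFS OAuth settings for one SPN, usable as per-query options."""
    suffix = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


def copy_between_containers(spark, src_uri, dst_uri, src_opts, dst_opts,
                            fmt="delta", mode="errorifexists"):
    """Hypothetical helper: read src_uri with one SPN, write to dst_uri with another."""
    # The reader options carry the source SPN; nothing is written to spark.conf.
    df = spark.read.format(fmt).options(**src_opts).load(src_uri)
    # The writer options carry the destination SPN. "errorifexists" is Spark's
    # default save mode; pass "append" or "overwrite" as needed.
    df.write.format(fmt).options(**dst_opts).mode(mode).save(dst_uri)


# Example usage (needs a live SparkSession, delta-spark and real credentials):
# src = adls_oauth_options("<storage-account>", "<directory-id>", "<app-id-abcd>", secret_abcd)
# dst = adls_oauth_options("<storage-account>", "<directory-id>", "<app-id-wxyz>", secret_wxyz)
# copy_between_containers(
#     spark,
#     "abfss://abcd@<storage-account>.dfs.core.windows.net/<src-path>",
#     "abfss://wxyz@<storage-account>.dfs.core.windows.net/<dst-path>",
#     src, dst, mode="append",
# )
```

Keeping the credentials on the reader and writer, rather than in the session conf, means each SPN is scoped to the single operation that needs it, so the two settings never overwrite each other.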

Answered 2023-10-15
