Databricks Auto Loader file notification mode in Azure Premium storage

rvpgvaaj  posted on 2022-12-30  in Other

I want to use Databricks Auto Loader to read a stream of files. The data volume is high, so I want to use file notification mode (latency was bad when I used directory listing mode). However, it seems I need Azure Queue Storage, which is unavailable in Azure Premium storage. When I ran the following code, I got the error message: UnknownHostException: .queue.core.windows.net

import com.databricks.sql.CloudFilesAzureResourceManager

val manager = CloudFilesAzureResourceManager
  .newManager
  .option("cloudFiles.connectionString", "XXX")
  .option("cloudFiles.resourceGroup", "XXX")
  .option("cloudFiles.subscriptionId", "XXX")
  .option("cloudFiles.tenantId", "XXX")
  .option("cloudFiles.clientId", "XXX")
  .option("cloudFiles.clientSecret", "XXX")
  .option("path", "abfss://XXX@ZZZ.dfs.core.windows.net/test") // required only for setUpNotificationServices
  .create()

// Set up a queue and a topic subscribed to the path provided in the manager.
manager.setUpNotificationServices("XXX")

https://learn.microsoft.com/en-us/azure/databricks/ingestion/auto-loader/file-notification-mode#permissions-azure
Is there a way to use file notification mode with Azure Premium storage?
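One detail in the error may be worth checking: the host `.queue.core.windows.net` has nothing before the first dot, where a storage account name would normally appear. The sketch below is only an illustration of how such a host name could arise. It assumes the standard `Key=Value;Key=Value` connection string layout, and the helper names `parse_connection_string` and `queue_host` as well as the sample values are hypothetical. It is not how Auto Loader itself builds the endpoint, and it does not settle whether Premium accounts offer queues.

```python
def parse_connection_string(conn_str: str) -> dict:
    """Split 'Key=Value;Key=Value' pairs; values may themselves contain '='."""
    pairs = {}
    for part in conn_str.split(";"):
        if not part.strip():
            continue
        # partition splits on the first '=' only, so base64 padding survives.
        key, _, value = part.partition("=")
        pairs[key.strip()] = value.strip()
    return pairs


def queue_host(conn_str: str) -> str:
    """Build the Queue Storage host name that the connection string implies."""
    settings = parse_connection_string(conn_str)
    account = settings.get("AccountName", "")  # empty if the key is absent
    suffix = settings.get("EndpointSuffix", "core.windows.net")
    return f"{account}.queue.{suffix}"


# Hypothetical values, for illustration only.
good = (
    "DefaultEndpointsProtocol=https;AccountName=myaccount;"
    "AccountKey=abc==;EndpointSuffix=core.windows.net"
)
no_account = (
    "BlobEndpoint=https://myaccount.blob.core.windows.net/;"
    "SharedAccessSignature=sv=2021-06-08&sig=abc"
)

print(queue_host(good))
print(queue_host(no_account))
```

If the second form matches the connection string in use, the missing `AccountName` key is one possible reason for the empty host prefix, separate from the Premium storage question.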

Answer 1 (g2ieeal7)

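Auto Loader scales to ingest millions of files. The cloudFiles.useNotifications option controls how new files are detected: file notification mode when it is true, directory listing mode when it is false.
Provide the permissions required to create the cloud resources. If you set cloudFiles.useNotifications to true, configure the cloudFiles options as follows: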

cloudFiles = {
    "cloudFiles.subscriptionId": "<subscription_Id>",
    "cloudFiles.connectionString": "<connectionString_Storage_account>",
    "cloudFiles.format": "csv",
    "cloudFiles.tenantId": "<tenantId>",
    "cloudFiles.clientId": "<client_ID>",
    "cloudFiles.clientSecret": "<Client_Secret>",
    "cloudFiles.resourceGroup": "<Resource_group_name>",
    "cloudFiles.useNotifications": "true"  # boolean option, passed as a string
}

For details on configuring Auto Loader with Databricks, see this link. It explains in detail how to read and write streaming data with Auto Loader.
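As a minimal sketch of how an options dictionary like the one in this answer might be passed to a streaming read: the helper names `build_notification_options` and `read_with_auto_loader`, the `source_path` and `schema_location` parameters, and all placeholder values are assumptions for illustration. The `spark` argument is expected to be the session that a Databricks notebook already provides, so the read function is only defined here, not called.

```python
def build_notification_options(subscription_id, connection_string, tenant_id,
                               client_id, client_secret, resource_group,
                               file_format="csv"):
    """Collect the cloudFiles options for file notification mode in one dict."""
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.useNotifications": "true",
        "cloudFiles.subscriptionId": subscription_id,
        "cloudFiles.connectionString": connection_string,
        "cloudFiles.tenantId": tenant_id,
        "cloudFiles.clientId": client_id,
        "cloudFiles.clientSecret": client_secret,
        "cloudFiles.resourceGroup": resource_group,
    }


def read_with_auto_loader(spark, options, source_path, schema_location):
    """Return a streaming DataFrame that reads source_path through Auto Loader.

    `spark` is an existing SparkSession (assumed to come from the notebook).
    """
    return (
        spark.readStream.format("cloudFiles")
        .options(**options)
        # Location where Auto Loader keeps the inferred schema.
        .option("cloudFiles.schemaLocation", schema_location)
        .load(source_path)
    )


# Placeholder values only; real secrets would normally come from a secret scope.
options = build_notification_options(
    subscription_id="<subscription_Id>",
    connection_string="<connectionString_Storage_account>",
    tenant_id="<tenantId>",
    client_id="<client_ID>",
    client_secret="<Client_Secret>",
    resource_group="<Resource_group_name>",
)
print(sorted(options))
```

In a notebook, a call such as `read_with_auto_loader(spark, options, "<source_path>", "<schema_location>")` would then produce the streaming DataFrame to write out.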
