I have a requirement to copy files (zip format) from HDFS (Hadoop file system) to ADLS Gen2 Blob storage through an ADF pipeline. The files in HDFS are laid out as follows:
HDFS Source:
hdfs/data/users/synova/raw/partition1/customer/full/2023-04-01/customer.zip
hdfs/data/users/synova/raw/partition1/customer/full/2023-04-02/customer.zip
hdfs/data/users/synova/raw/partition1/customer/full/2023-04-03/customer.zip
hdfs/data/users/synova/raw/partition2/parts/full/2023-04-01/parts.zip
hdfs/data/users/synova/raw/partition2/parts/full/2023-04-02/parts.zip
hdfs/data/users/synova/raw/partition2/parts/full/2023-04-03/parts.zip
hdfs/data/users/synova/raw/partition3/modules/full/2023-04-01/modules.zip
hdfs/data/users/synova/raw/partition3/modules/full/2023-04-02/modules.zip
hdfs/data/users/synova/raw/partition3/modules/full/2023-04-03/modules.zip
hdfs/data/users/synova/raw/partition4/events/full/2023-04-01/events.zip
hdfs/data/users/synova/raw/partition4/events/full/2023-04-02/events.zip
hdfs/data/users/synova/raw/partition4/events/full/2023-04-03/events.zip
ADLS Target should be:
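adls/consolidated/synova/raw/partition1/customer/2023-04-01/customer.zip
adls/consolidated/synova/raw/partition1/customer/2023-04-02/customer.zip
adls/consolidated/synova/raw/partition1/customer/2023-04-03/customer.zip
adls/consolidated/synova/raw/partition2/parts/2023-04-01/parts.zip
adls/consolidated/synova/raw/partition2/parts/2023-04-02/parts.zip
adls/consolidated/synova/raw/partition2/parts/2023-04-03/parts.zip
adls/consolidated/synova/raw/partition3/modules/2023-04-01/modules.zip
adls/consolidated/synova/raw/partition3/modules/2023-04-02/modules.zip
adls/consolidated/synova/raw/partition3/modules/2023-04-03/modules.zip
adls/consolidated/synova/raw/partition4/events/2023-04-01/events.zip
adls/consolidated/synova/raw/partition4/events/2023-04-02/events.zip
adls/consolidated/synova/raw/partition4/events/2023-04-03/events.zip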
I need to create a generic ADF pipeline to copy these files.
Thanks, Rakesh
1 Answer
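Your requirement can be met with a Get Metadata activity (to list the files) and a Copy activity. However, that approach needs a lot of child pipelines for the subfolders, because Get Metadata does not return the paths of nested subfolders.
To get the list of paths, you can try using code in Databricks (Notebook activity) or in a function (for example an Azure Function), and then bring that list into ADF.
If you want to do everything in ADF only, go through the blog written by @Richard Swinbank on getting the file list recursively.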
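If you go the notebook or function route, the listing logic could look roughly like the sketch below. It is only an illustration, not the code from the blog: it walks the folders with a queue instead of recursion, and it assumes a hypothetical `list_children` callback that returns the direct children of a folder in the same `name`/`type` shape as the `childItems` output of Get Metadata. The in-memory `FAKE_TREE` stands in for the real HDFS listing call, which you would have to supply yourself.

```python
from collections import deque
from typing import Callable, Dict, List

# Hypothetical callback type: given a folder path, return its direct children
# as dicts shaped like Get Metadata's childItems ("name" plus "type").
ListChildren = Callable[[str], List[Dict[str, str]]]


def list_files_recursively(root: str, list_children: ListChildren) -> List[str]:
    """Return the full path of every file under root, using a queue of folders."""
    pending = deque([root])          # folders still to be listed
    files: List[str] = []
    while pending:
        folder = pending.popleft()
        for child in list_children(folder):
            path = f"{folder}/{child['name']}"
            if child["type"] == "Folder":
                pending.append(path)  # visit this subfolder later
            else:
                files.append(path)    # a file: keep its full path
    return files


# In-memory stand-in for part of the HDFS tree (illustration only).
FAKE_TREE = {
    "raw": [{"name": "partition1", "type": "Folder"}],
    "raw/partition1": [{"name": "customer", "type": "Folder"}],
    "raw/partition1/customer": [{"name": "full", "type": "Folder"}],
    "raw/partition1/customer/full": [
        {"name": "2023-04-01", "type": "Folder"},
        {"name": "2023-04-02", "type": "Folder"},
    ],
    "raw/partition1/customer/full/2023-04-01": [{"name": "customer.zip", "type": "File"}],
    "raw/partition1/customer/full/2023-04-02": [{"name": "customer.zip", "type": "File"}],
}


def fake_list_children(folder: str) -> List[Dict[str, str]]:
    """Stand-in for a real HDFS listing call; unknown folders are treated as empty."""
    return FAKE_TREE.get(folder, [])


if __name__ == "__main__":
    for file_path in list_files_recursively("raw", fake_list_children):
        print(file_path)
```

The returned list of full file paths is what you would hand back to ADF as the input of the ForEach activity.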
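After getting the list, pass it to a ForEach activity. Inside the ForEach, use `@item()` as the source file path of the Copy activity. For the sink file path, replace '/full' in `@item()` with an empty string, for example with `@replace(item(), '/full', '')`. This is the file list I used as a sample and passed to the ForEach (screenshot not preserved):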
My source dataset (screenshot not preserved):
Sink dataset (screenshot not preserved):
In the ForEach, give `@item()` as the source file path and store the replaced value in a variable. Then give this variable as the sink file path.
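To check the path mapping outside ADF before wiring it into the pipeline, here is a small Python sketch of the same replacement. The function name `to_sink_path` and the `source_root`/`sink_root` parameters are hypothetical. The root swap is an assumption based on the paths in the question; in the pipeline itself that part would normally come from the container and folder configured on the sink dataset.

```python
def to_sink_path(source_path: str,
                 source_root: str = "hdfs/data/users",
                 sink_root: str = "adls/consolidated") -> str:
    """Map an HDFS source path to the ADLS sink path expected in the question."""
    # Same effect as replacing '/full' with an empty string in the pipeline.
    without_full = source_path.replace("/full", "")
    # Assumption: the HDFS root is swapped for the ADLS root shown in the question.
    if without_full.startswith(source_root):
        return sink_root + without_full[len(source_root):]
    return without_full


if __name__ == "__main__":
    sample = "hdfs/data/users/synova/raw/partition2/parts/full/2023-04-02/parts.zip"
    print(to_sink_path(sample))
```

Note that a plain replace removes every occurrence of '/full', including inside a longer folder name that starts with "full". If that can happen in your data, replacing '/full/' with '/' is the safer variant.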
Result (screenshot not preserved):