使用多个列表扁平化JSON

k97glaaz  于 2023-05-30  发布在  其他
关注(0)|答案(1)|浏览(212)

我在S3中有一个json文件,看起来像这样。主列表中有列表,每个列表可以包含多个字典:

[
    [
        {"cmsId": "63db9878d1",
            "documentType": "video"}, 
         {"cmsId": "ddd1",
            "documentType": "article"}
],
[   {"cmsId": "63db",
            "documentType": "blog"}, 
         {"cmsId": "63db987",
            "documentType": "article"}
]]

如何使用SNOWFLAKE sql解析cmsIds和documenttypes(仅根级别项)?
我正在尝试这个,但它永远运行,没有输出:

select
              *,
               metadata$filename                                as file_name,
               to_timestamp_ntz(current_timestamp())            as load_date
        from
        @STG.S3/lean/incoming (FILE_FORMAT => STG.JSON_NO_OUTER_IVW     , pattern=>'.*20230526_ids.gz' ) t
      ,TABLE(FLATTEN(input => t.$1 , outer => false   ))

文件格式ddl:

{
   "TYPE":"JSON",
   "FILE_EXTENSION":null,
   "DATE_FORMAT":"AUTO",
   "TIME_FORMAT":"AUTO",
   "TIMESTAMP_FORMAT":"AUTO",
   "BINARY_FORMAT":"HEX",
   "TRIM_SPACE":false,
   "NULL_IF":[
      
   ],
   "COMPRESSION":"AUTO",
   "ENABLE_OCTAL":false,
   "ALLOW_DUPLICATE":false,
   "STRIP_OUTER_ARRAY":true,
   "STRIP_NULL_VALUES":true,
   "IGNORE_UTF8_ERRORS":false,
   "REPLACE_INVALID_CHARACTERS":false,
   "SKIP_BYTE_ORDER_MARK":true
}
mznpcxlj

mznpcxlj1#

如果已经加载了数据,则可以在子查询中逐阶段展开。

select
    l2.VALUE:"cmsId" :: string,
    l2.VALUE:"documentType"  :: string
from (
    select
        l1.VALUE
    from (
        select
            parse_json('[
                [
                    {"cmsId": "63db9878d1",
                        "documentType": "video"}, 
                    {"cmsId": "ddd1",
                        "documentType": "article"}
            ],
            [   {"cmsId": "63db",
                        "documentType": "blog"}, 
                    {"cmsId": "63db987",
                        "documentType": "article"}
            ]
        ]')
    ) as t,
    lateral flatten (t.$1) as l1
) as t,
lateral flatten (t.VALUE) as l2

编辑1如果没有Loaded,可以通过设置STRIP_OUTER_ARRAY = TRUE来tweek FILE_FORMAT对象。这将消除第一个数组,并在两行中加载数据,您可以稍后展平并从第二个数组中读取数据。

相关问题