部分QL/SQL:JSON-SUPER数组查询,用于将值提取到Redshift上的表中

iugsix8n  于 2023-02-17  发布在  其他
关注(0)|答案(1)|浏览(119)

I have a somewhat complicated SUPER array that I brought in to Redshift using a REST API. The 'API_table' currently looks like this: table example
One of the sample columns "values" reads as follows:

values

[{"value":[{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T17:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T17:45:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:15:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:45:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:15:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:45:00.000-05:00"},{"value":"6.8","qualifiers":["P"],"dateTime":"2023-01-30T20:00:00.000-05:00"},...
I've queried the "value" data using:

SELECT c.values[0].value[0].value as v
FROM API_table c;

However, this only returns the first value "6.9" in each row and not all the "value" items in the row. The same approach doesn't work for extracting the "dateTime" items as it produced NULL values:

SELECT c.values[0].value[0].dateTime as dt
FROM API_table c;

The above example only resembles one row of the table. My question is-- are there ways to query the data in every row of the table so that all the values ("value" & "dateTime") of every row can be extracted onto a new table?
The desired result is:
| v | dt |
| ------------ | ------------ |
| 6.9 | 2023-01-30T17:45:00.000-05:00 |
| 6.9 | 2023-01-30T18:00:00.000-05:00 |
| 6.9 | 2023-01-30T18:15:00.000-05:00 |
Many thanks.
I tried the following query but it only returned singular "value' results for each row.

SELECT c.values[0].value[0].value as v
FROM API_table c;

When applied to the "dateTime" items, it yielded NULL values:

SELECT c.values[0].value[0].dateTime as dt
FROM API_table c;

===================================================================
@BillWeiner thanks, I worked through both the CTE and test case examples and got the desired results (especially with CTE). The only issue that remains is knowing how to select the original table/column that contains the entire super array so that it can be inserted into test1 (or col1 in the CTE case).
The original table is structured as follows:
| sourceinfo | variable | values |
| ------------ | ------------ | ------------ |
| {"siteName":"YAN","siteCode":[{"value":"01"}] | {“variableCode":[{"value":"00600","network":"ID"} | [{“value":[{"value":"3.9","qualifiers":["P"],"dateTime":"2023-01-30T17:30:00.000-05:00"},{"value":"4.9","qualifiers":["P"],"dateTime":"2023-01-30T17:45:00.000-05:00"}] |
| {"siteName":"YAN","siteCode":[{"value":"01"}] | {“variableCode":[{"value":"00600","network":"ID"} | [{“value":[{"value":"5.9","qualifiers":["P"],"dateTime":"2023-01-30T18:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:15:00.000-05:00"}] |
There are super arrays in every row of column 'values' so the issue remains in selecting the column 'values' and extracting each of the multiple value ("6.9") and dateTime objects from each row.

wqsoz72f

wqsoz72f1#

是的。您需要将每个数组元素展开到它自己的行中。将数组展开到行中需要递归CTE(或类似的东西)。这可以基于super中的最大数组长度或使用某个固定的数字集来完成。这组数字需要与您的表交叉连接以提取每个数组元素。
我之前写过一个类似的答案--从Amazon Redshift中的jsons数组中提取基于特定键的值--看看这是否能让你摆脱困境。如果你需要帮助来适应你的情况,请告诉我。

根据评论,看起来需要一个更具体的例子。这个小测试用例应该可以帮助你理解让这个工作所需要的东西。
我已经重复了你的数据几次来创建多个行,并用2个内部数组填充外部数组。这希望能展示如何手动展开多个嵌套数组(紧凑的红移展开方法如下所示,但如果你不先了解概念,很难理解)。
首先设置测试数据:

SET enable_case_sensitive_identifier TO true;

create table test1 (jvalues varchar(2048));

insert into test1 values
('[{"value":[{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T17:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T17:45:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:15:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:45:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:15:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:45:00.000-05:00"},{"value":"6.8","qualifiers":["P"],"dateTime":"2023-01-30T20:00:00.000-05:00"}]}, {"value":[{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T17:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T17:45:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:15:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:45:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:15:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:45:00.000-05:00"},{"value":"6.8","qualifiers":["P"],"dateTime":"2023-01-30T20:00:00.000-05:00"}]}]'),
('[{"value":[{"value":"5.9","qualifiers":["P"],"dateTime":"2023-01-30T17:30:00.000-05:00"},{"value":"5.9","qualifiers":["P"],"dateTime":"2023-01-30T17:45:00.000-05:00"},{"value":"8.9","qualifiers":["P"],"dateTime":"2023-01-30T18:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:15:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:45:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:15:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:45:00.000-05:00"},{"value":"6.8","qualifiers":["P"],"dateTime":"2023-01-30T20:00:00.000-05:00"}]}, {"value":[{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T17:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T17:45:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:15:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:45:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:15:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:45:00.000-05:00"},{"value":"6.8","qualifiers":["P"],"dateTime":"2023-01-30T20:00:00.000-05:00"}]}]');

create table test2 as select json_parse(jvalues) as svalues from test1;

注意,我们必须打开区分大小写,会话才能正确选择"dateTime"。
然后手动展开阵列:

with recursive numbers(n) as
( select 0 as n
    union all
    select n + 1
    from numbers n
    where n.n < 20
), 
exp_top as
(   select row_number() over () as r, n as x, c.svalues[n].value
    from test2 c
    cross join numbers n
)
,
exp_bot as
(   select r, x, n as y, c.value[n]
    from exp_top c
    cross join numbers n
    where c.value is not null
)
select *, value.value as v, value."dateTime" as dt
from exp_bot
where value is not null;

本版本
1.创建数字0 - 19,
1.通过与这些数字交叉连接来扩展外部数组(每行2个元素),
1.通过相同的方法扩展内部数组,
1.产生预期结果
Redshift有一个内置的方法来展开超级数组,它在FROM子句中定义。您可以从以下语句中生成相同的结果:

with exp_top as (select inx1, s.value from test2 c, c.svalues s at inx1)
select inx1, inx2, c.value[inx2] as value, s.value, s."dateTime" from exp_top c, c.value s at inx2;

更紧凑。这段代码已经过测试,并且在Redshift中运行。如果你看到"dateTime"值为NULL,很可能你没有启用大小写敏感。

要在最终结果中也包含原始超级列:

with exp_top as (select c.svalues, inx1, s.value from test2 c, c.svalues s at inx1)
select svalues, inx1, inx2, c.value[inx2] as value, s.value, s."dateTime" from exp_top c, c.value s at inx2;

相关问题