如何在Snowflake SQL中自动将JSON列提取到新列中?

mf98qq94  于 12个月前  发布在  其他
关注(0)|答案(3)|浏览(120)

这是从另一个线程的例子,但本质上我想实现这一点:

样本数据

ID  Name  Value
1   TV1   {"URL": "www.url.com", "Icon": "some_icon"}
2   TV2   {"URL": "www.url.com", "Icon": "some_icon", "Facebook": "Facebook_URL"}
3   TV3   {"URL": "www.url.com", "Icon": "some_icon", "Twitter": "Twitter_URL"}
..........

字符串

预期产出

ID  Name  URL          Icon           Facebook      Twitter
1   TV1   www.url.com  some_icon          NULL         NULL
2   TV2   www.url.com  some_icon  Facebook_URL         NULL
3   TV3   www.url.com  some_icon          NULL  Twitter_URL


我对Snowflake完全陌生,所以我对如何轻松地做到这一点感到困惑(希望是自动的,在某些行可能比其他行有更多的json元素的情况下,手动分配会很繁琐)。
我找到了Snowflake的parse_json函数,但它只是在新列中提供了相同的json列,仍然是json格式。
TIA!

inkz8wg9

inkz8wg91#

您可以使用以下SELECT命令在表上创建视图:

SELECT ID, 
       Name,
       Value:URL::varchar as URL,
       Value:Icon::varchar as Icon,
       Value:Facebook::varchar as Facebook,
       Value:Twitter::varchar as Twitter
FROM tablename;

字符串
除非您将其他属性添加到视图中,否则将忽略这些属性。没有办法“自动”将它们包含到视图中,但您可以创建一个存储过程,根据表的完整变量内容中的所有属性动态生成视图。

7lrncoxx

7lrncoxx2#

您可以创建一个SP,根据VARIANT中的JSON数据自动为您构建CREATE VIEW。
下面是一些简单的例子:

-- prepare the table and data
create or replace table test (
  col1 int, col2 string, 
  data1 variant, data2 variant
);

insert into test select 1,2, parse_json(
   '{"URL": "test", "Icon": "test1", "Facebook": "http://www.facebook.com"}'
), parse_json(
   '{"k1": "test", "k2": "test1", "k3": "http://www.facebook.com"}'
);

insert into test select 3,4,parse_json(
   '{"URL": "test", "Icon": "test1", "Twitter": "http://www.twitter.com"}'
), parse_json(
   '{"k4": "v4", "k3": "http://www.ericlin.me"}'
);

-- create the SP, we need to know which table and 
-- column has the variant data
create or replace procedure create_view(
    table_name varchar
)
returns string
language javascript
as
$$
  var final_columns = [];
  
  // first, find out the columns
  var query = `SHOW COLUMNS IN TABLE ${TABLE_NAME}`;
  var stmt = snowflake.createStatement({sqlText: query});
  var result = stmt.execute();
  
  var variant_columns = [];
  
  while (result.next()) {
    var col_name = result.getColumnValue(3);
    var data_type = JSON.parse(result.getColumnValue(4));

    // just use it if it is not a VARIANT type
    // if it is variant type, we need to remember this column
    // and then run query against it later
    if (data_type["type"] != "VARIANT") {
      final_columns.push(col_name);
    } else {
      variant_columns.push(col_name);
    }
  }

  var columns = {};
  query = `SELECT ` + variant_columns.join(', ') + ` FROM ${TABLE_NAME}`;
  stmt = snowflake.createStatement({sqlText: query});
  result = stmt.execute();

  while (result.next()) {
      for(i=1; i<=variant_columns.length; i++) {
        var sub_result = result.getColumnValue(i);
        if(!sub_result) {
          continue;
        }

        var keys = Object.keys(sub_result);

        for(j=0; j<keys.length; j++) {
          columns[variant_columns[i-1] + ":" + keys[j]] = keys[j];
        }
      }
  }

  for(path in columns) {
    final_columns.push(path + "::STRING AS " + columns[path]);
  }

  var create_view_sql = "CREATE OR REPLACE VIEW " + 
    TABLE_NAME + "_VIEW\n" +   
    "AS SELECT " + "\n" +
    "  " + final_columns.join(",\n  ") + "\n" +
    "FROM " + TABLE_NAME + ";";
    
  snowflake.execute({sqlText: create_view_sql});
  return create_view_sql + "\n\nVIEW created successfully.";
$$;

字符串
执行SP将返回以下字符串:

call create_view('TEST');
+---------------------------------------+
| CREATE_VIEW                           |
|---------------------------------------|
| CREATE OR REPLACE VIEW TEST_VIEW      |
| AS SELECT                             |
|   COL1,                               |
|   COL2,                               |
|   DATA1:Facebook::STRING AS Facebook, |
|   DATA1:Icon::STRING AS Icon,         |
|   DATA1:URL::STRING AS URL,           |
|   DATA2:k1::STRING AS k1,             |
|   DATA2:k2::STRING AS k2,             |
|   DATA2:k3::STRING AS k3,             |
|   DATA1:Twitter::STRING AS Twitter,   |
|   DATA2:k4::STRING AS k4              |
| FROM TEST;                            |
|                                       |
| VIEW created successfully.            |
+---------------------------------------+


然后查询视图:

SELECT * FROM TEST_VIEW;
+------+------+-------------------------+-------+------+------+-------+-------------------------+------------------------+------+
| COL1 | COL2 | FACEBOOK                | ICON  | URL  | K1   | K2    | K3                      | TWITTER                | K4   |
|------+------+-------------------------+-------+------+------+-------+-------------------------+------------------------+------|
|    1 | 2    | http://www.facebook.com | test1 | test | test | test1 | http://www.facebook.com | NULL                   | NULL |
|    3 | 4    | NULL                    | test1 | test | NULL | NULL  | http://www.ericlin.me   | http://www.twitter.com | v4   |
+------+------+-------------------------+-------+------+------+-------+-------------------------+------------------------+------+


查询源表:

SELECT * FROM TEST;
+------+------+------------------------------------------+-----------------------------------+
| COL1 | COL2 | DATA1                                    | DATA2                             |
|------+------+------------------------------------------+-----------------------------------|
|    1 | 2    | {                                        | {                                 |
|      |      |   "Facebook": "http://www.facebook.com", |   "k1": "test",                   |
|      |      |   "Icon": "test1",                       |   "k2": "test1",                  |
|      |      |   "URL": "test"                          |   "k3": "http://www.facebook.com" |
|      |      | }                                        | }                                 |
|    3 | 4    | {                                        | {                                 |
|      |      |   "Icon": "test1",                       |   "k3": "http://www.ericlin.me",  |
|      |      |   "Twitter": "http://www.twitter.com",   |   "k4": "v4"                      |
|      |      |   "URL": "test"                          | }                                 |
|      |      | }                                        |                                   |
+------+------+------------------------------------------+-----------------------------------+


您可以优化此SP以检测嵌套数据并将其添加到列列表中。

ruarlubt

ruarlubt3#

我得到了一个Invalid argument types for function 'GET'...错误与类似的数据。所以我不得不使用 snowflake PARSE_JSON函数如下:

SELECT ID, 
       Name,
       PARSE_JSON(Value):URL::varchar as URL,
       PARSE_JSON(Value):Icon::varchar as Icon,
       PARSE_JSON(Value):Facebook::varchar as Facebook,
       PARSE_JSON(Value):Twitter::varchar as Twitter
FROM tablename;

字符串

相关问题