Google BigQuery: cast a string to an array of structs, then look up values in another SQL table

i2loujxw  posted on 2021-08-01  in  Java

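I currently have a string column in a table whose values represent a list of structs. I want to look up values in another table based on the values of the elements in each struct.
For example, the car info struct below is [spare,cartype,carcolour].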

╔═══════════════════════════╗
║          CarInfo          ║
╠═══════════════════════════╣
║ “[1,1,1]”                 ║
║ “[1,2,1] [1,1,2]”         ║
║ null                      ║
║ “[1,2,1] [1,1,2] [1,1,1]” ║
╚═══════════════════════════╝

I want to look these values up in this table:

╔═══════════╦═══════════════╦═════════════╦═════════════════╗
║ CarTypeId ║ CarTypeString ║ CarColourId ║ CarColourString ║
╠═══════════╬═══════════════╬═════════════╬═════════════════╣
║         1 ║ "Hyundai"     ║           1 ║ "Red"           ║
║         1 ║ "Hyundai"     ║           2 ║ "Blue"          ║
║         2 ║ "Toyota"      ║           1 ║ "Green"         ║
║         2 ║ "Toyota"      ║           2 ║ "Yellow"        ║
╚═══════════╩═══════════════╩═════════════╩═════════════════╝

and get the following result:

╔═════════════════════════════════════════════════════╗
║                       CarInfo                       ║
╠═════════════════════════════════════════════════════╣
║ “[1,Hyundai,Red]”                                   ║
║ “[1,Toyota,Green] [1,Hyundai,Blue]”                 ║
║ null                                                ║
║ “[1,Toyota,Green] [1,Hyundai,Blue] [1,Hyundai,Red]” ║
╚═════════════════════════════════════════════════════╝

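I have found that I can split the string into an array with SPLIT(CarInfo, ' '), but after that I am not sure how to convert the elements into structs, or how to do the "looping" left join that follows.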

js4nwp54


The following is for BigQuery Standard SQL:


#standardSQL
SELECT STRING_AGG('[' || spare || ',' || carTypeString || ',' || carColourString || ']', ' ') AS CarInfo
FROM `project.dataset.cars` t
LEFT JOIN UNNEST(SPLIT(CarInfo, ' ')) info,
UNNEST([STRUCT(
  SPLIT(TRIM(info, '[]'))[OFFSET(0)] AS spare, 
  CAST(SPLIT(TRIM(info, '[]'))[OFFSET(1)] AS INT64) AS carTypeId, 
  CAST(SPLIT(TRIM(info, '[]'))[OFFSET(2)] AS INT64) AS carColourId
)])
LEFT JOIN `project.dataset.lookup` l
USING(carTypeId, carColourId)
GROUP BY FORMAT('%t', t)
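
If the intermediate array of structs is useful on its own (the title asks for it), a minimal sketch along the same lines could build it with ARRAY(SELECT AS STRUCT ...). The alias `pos`, the output column name `cars`, and the use of WITH OFFSET to keep the elements in their original order are assumptions added for illustration, not part of the query above. The table name is the same placeholder.

```sql
#standardSQL
-- Sketch: parse CarInfo into ARRAY<STRUCT<spare STRING, carTypeId INT64, carColourId INT64>>.
-- `pos` and `cars` are illustrative names; `project.dataset.cars` is a placeholder.
SELECT
  CarInfo,
  ARRAY(
    SELECT AS STRUCT
      -- '[1,2,1]' -> '1,2,1' -> ['1','2','1'] (SPLIT defaults to ',' for strings)
      SPLIT(TRIM(info, '[]'))[OFFSET(0)] AS spare,
      CAST(SPLIT(TRIM(info, '[]'))[OFFSET(1)] AS INT64) AS carTypeId,
      CAST(SPLIT(TRIM(info, '[]'))[OFFSET(2)] AS INT64) AS carColourId
    FROM UNNEST(SPLIT(CarInfo, ' ')) AS info WITH OFFSET AS pos
    ORDER BY pos  -- keep the elements in their original order
  ) AS cars
FROM `project.dataset.cars`
```

A NULL CarInfo produces an empty array here, because UNNEST of a NULL array yields no rows.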

To apply the lookup query above to the sample data from your question, see the example below:


#standardSQL
WITH `project.dataset.cars` AS (
  SELECT '[1,1,1]' CarInfo UNION ALL
  SELECT '[1,2,1] [1,1,2]' UNION ALL
  SELECT NULL UNION ALL
  SELECT '[1,2,1] [1,1,2] [1,1,1]'
), `project.dataset.lookup` AS (
  SELECT 1 CarTypeId, 'Hyundai' CarTypeString, 1 CarColourId, 'Red' CarColourString UNION ALL
  SELECT 1, 'Hyundai', 2, 'Blue' UNION ALL
  SELECT 2, 'Toyota', 1, 'Green' UNION ALL
  SELECT 2, 'Toyota', 2, 'Yellow'
)
SELECT STRING_AGG('[' || spare || ',' || carTypeString || ',' || carColourString || ']', ' ') AS CarInfo
FROM `project.dataset.cars` t
LEFT JOIN UNNEST(SPLIT(CarInfo, ' ')) info,
UNNEST([STRUCT(
  SPLIT(TRIM(info, '[]'))[OFFSET(0)] AS spare, 
  CAST(SPLIT(TRIM(info, '[]'))[OFFSET(1)] AS INT64) AS carTypeId, 
  CAST(SPLIT(TRIM(info, '[]'))[OFFSET(2)] AS INT64) AS carColourId
)])
LEFT JOIN `project.dataset.lookup` l
USING(carTypeId, carColourId)
GROUP BY FORMAT('%t', t)

with output:

Row CarInfo  
1   [1,Hyundai,Red]  
2   [1,Toyota,Green] [1,Hyundai,Blue]    
3   null     
4   [1,Toyota,Green] [1,Hyundai,Blue] [1,Hyundai,Red]
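
One caveat: STRING_AGG without an ORDER BY clause does not guarantee the order of the aggregated elements, so the element order shown in the output is not guaranteed in general. A hedged variant that carries each element's position through the joins might look like the sketch below. The alias `pos` is an assumption added here, and the table names are the same placeholders.

```sql
#standardSQL
-- Sketch: same lookup, but elements are re-assembled in their original order.
-- `pos` is an illustrative alias for the element's position in the split array.
SELECT
  STRING_AGG(
    '[' || spare || ',' || CarTypeString || ',' || CarColourString || ']',
    ' ' ORDER BY pos
  ) AS CarInfo
FROM `project.dataset.cars` t
LEFT JOIN UNNEST(SPLIT(CarInfo, ' ')) info WITH OFFSET pos,
UNNEST([STRUCT(
  SPLIT(TRIM(info, '[]'))[OFFSET(0)] AS spare,
  CAST(SPLIT(TRIM(info, '[]'))[OFFSET(1)] AS INT64) AS carTypeId,
  CAST(SPLIT(TRIM(info, '[]'))[OFFSET(2)] AS INT64) AS carColourId
)])
LEFT JOIN `project.dataset.lookup` l
USING (carTypeId, carColourId)
GROUP BY FORMAT('%t', t)
```

Note that grouping by FORMAT('%t', t) treats rows with identical content as one group, so if the source table can contain duplicate CarInfo values, grouping on a unique row key (if one exists) is likely the safer choice.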
