sql获取组号

9avjhtql  于 2021-07-24  发布在  Java
关注(0)|答案(2)|浏览(399)

我有一个例子数据如下。

+---------+------------+--------+-----------+
    | User Id |   Sequence | Action | Object    |
    |---------|------------|--------|-----------|
    | 12345   |    1       | Eat    | Bread     |
    | 12345   |    2       | Eat    | Steak     |
    | 12345   |    3       | Eat    | Bread     |
    | 12345   |    4       | Drink  | Milk tea  |
    | 12345   |    5       | Drink  | Black tea |  
    | 12345   |    6       | Eat    | Cake      |
    | 12345   |    7       | Eat    | Candy     |
    | 12345   |    8       | Drink  | Black tea | 
    | 12345   |    9       | Drink  | Green tea | 
    | 12345   |    10      | Drink  | Water     |
    +---------+------------+--------+-----------+

现在我想在表中添加一个名为'groupid'的列,结果如下:

+---------+------------+--------+-----------+-----------+
    | User Id |   Sequence | Action | Object    | Group Id. |
    |---------|------------|--------|-----------|-----------|
    | 12345   |    1       | Eat    | Bread     |     1     |
    | 12345   |    2       | Eat    | Steak     |     1     |
    | 12345   |    3       | Eat    | Bread     |     1     |
    | 12345   |    4       | Drink  | Milk tea  |     2     |
    | 12345   |    5       | Drink  | Black tea |     2     |
    | 12345   |    6       | Eat    | Cake      |     3     |
    | 12345   |    7       | Eat    | Candy     |     3     |
    | 12345   |    8       | Drink  | Black tea |     4     |
    | 12345   |    9       | Drink  | Green tea |     4     |
    | 12345   |    10      | Drink  | Water     |     4     |
    +---------+------------+--------+-----------+-----------|

同一个动作应该被分成一组,但会被不同的顺序分开。如何实现sql(我使用googlebigquery)?
万分感谢!

kmb7vmvb

kmb7vmvb1#

下面是bigquery标准sql


# standardSQL

SELECT * EXCEPT(new_group),
  COUNTIF(new_group) OVER(PARTITION BY User_Id ORDER BY Sequence) Group_Id
FROM (
  SELECT *,
    Action != LAG(Action, 1, '') OVER(PARTITION BY User_Id ORDER BY Sequence) new_group
  FROM `project.dataset.table`
)
-- ORDER BY User_Id

如果要应用于问题的样本数据,则输出为

Row User_Id Sequence    Action  Object      Group_Id     
1   12345   1           Eat     Bread       1    
2   12345   2           Eat     Steak       1    
3   12345   3           Eat     Bread       1    
4   12345   4           Drink   Milk tea    2    
5   12345   5           Drink   Black tea   2    
6   12345   6           Eat     Cake        3    
7   12345   7           Eat     Candy       3    
8   12345   8           Drink   Black tea   4    
9   12345   9           Drink   Green tea   4    
10  12345   10          Drink   Water       4
xlpyo6sf

xlpyo6sf2#

这是一种缺口和孤岛问题。一个简单的方法是 lag() 要确定发生变化的位置,请执行以下操作:

select t.*,
       1 + sum( case when prev_action = action then 0 else 1 end ) over (order by sequence) as group_id
from (select t.*,
             lag(action) over (order by sequence) as prev_action
      from t
     ) t;

您还可以使用 countif() :

1 + countif( prev_action <> acction ) over (order by sequence) as group_id

相关问题