How to fill in rows based on available data

Posted by sqxo8psd on 2021-08-09 in Java

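I am using Snowflake SQL.
My table has two columns: hour and customer_id. Each customer has two rows, one for the time they entered the store and one for the time they left. From this data I want to build a table that contains a row for every hour the customer was in the store. For example, customer x enters the store at 1 PM and leaves at 5 PM, so there should be 5 rows (one per hour), as shown in the screenshot in the original post.
My attempt was: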

select
    hour
    ,first_value(customer_id) over (partition by customer_id order by hour rows between unbounded preceding and current row) as customer_id
FROM table

sql group-by Date select snowflake-cloud-data-platform

Source: https://stackoverflow.com/questions/61880643/how-to-fill-in-rows-based-on-available-data

2 Answers

xe55xuns1#

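In Snowflake, this is usually solved with a numbers table. You can generate such a derived table with the table (generator ...) syntax, then join it to an aggregate query that computes each customer's first and last hour, using an inequality condition so that only offsets inside that range are kept: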

select t.customer_id, dateadd(hour, n.rn, t.min_hour) final_hour
from (
    select t.customer_id, min(t.hour) min_hour, max(t.hour) max_hour 
    from mytable t
    group by t.customer_id
) t
inner join (
    select row_number() over(order by null) - 1 rn 
    from table (generator(rowcount => 24))
) n on dateadd(hour, n.rn, t.min_hour) <= t.max_hour
order by customer_id, final_hour

This handles visits of up to 24 hours per customer. If you need more, increase the rowcount argument of the generator.
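To check what the query should return without a Snowflake session, the same logic can be sketched in plain Python. This is only an illustration, not part of the answer: fill_hours, max_hours and the sample date are made-up names and values, and the rows are assumed to be (timestamp, customer_id) pairs.

```python
from datetime import datetime, timedelta


def fill_hours(rows, max_hours=24):
    """Return one (customer_id, hour) row per hour between each customer's
    first and last timestamp. `rows` holds (hour, customer_id) pairs."""
    # Equivalent of the aggregate subquery: min(hour) and max(hour) per customer.
    bounds = {}
    for hour, customer_id in rows:
        lo, hi = bounds.get(customer_id, (hour, hour))
        bounds[customer_id] = (min(lo, hour), max(hi, hour))

    result = []
    for customer_id in sorted(bounds):
        lo, hi = bounds[customer_id]
        # range(max_hours) plays the role of generator(rowcount => 24).
        for n in range(max_hours):
            final_hour = lo + timedelta(hours=n)
            # Same inequality as the join condition in the query.
            if final_hour <= hi:
                result.append((customer_id, final_hour))
    return result


# Hypothetical sample: customer x enters at 1 PM and leaves at 5 PM.
rows = [
    (datetime(2020, 5, 18, 13), "x"),
    (datetime(2020, 5, 18, 17), "x"),
]
filled = fill_hours(rows)
for customer_id, hour in filled:
    print(customer_id, hour)
```

As with the rowcount argument, a visit longer than max_hours is silently cut off, so the cap has to be at least as large as the longest expected visit.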

5lhxktic2#

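For the example case shown in the test data, where there is only one day of data, GMB's solution above works well.
Once you have many days of data (which may or may not include overlapping store visits; let's pretend you can't stay in the store overnight), the aggregate has to be computed per day as well as per customer.
This can be fixed with: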

select t.hour::date, t.customer_id, min(t.hour) min_hour, max(t.hour) max_hour 
from mytable t
group by 1,2

But multiple visits on the same day need data that labels each row as an entry or an exit, like:

with mytable as (
  select * from values 
    ('2019-04-01 09:00:00','x','in')
    ,('2019-04-01 15:00:00','x','out')
    ,('2019-04-02 12:00:00','x','in')
    ,('2019-04-02 14:00:00','x','out')
   v(hour, customer_id, state)
)

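Or the direction can be inferred from the row order within each day and customer (odd rows are entries, even rows are exits):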

with mytable as (
  select * from values ('2019-04-01 09:00:00','x','in'),('2019-04-01 15:00:00','x','out')
     ,('2019-04-02 12:00:00','x','in'),('2019-04-02 14:00:00','x','out')
   v(hour, customer_id, state)
)
select hour::date as day
    ,hour
    ,customer_id
    ,state
    ,BITAND(row_number() over(partition by day, customer_id order by hour), 1) = 1 AS in_dir
from mytable
order by 3,1,2;

which gives:

DAY           HOUR                   CUSTOMER_ID    STATE    IN_DIR
2019-04-01    2019-04-01 09:00:00    x              in       TRUE
2019-04-01    2019-04-01 15:00:00    x              out      FALSE
2019-04-02    2019-04-02 12:00:00    x              in       TRUE
2019-04-02    2019-04-02 14:00:00    x              out      FALSE
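The parity trick behind IN_DIR can be sketched in Python as a cross-check. This is a hedged illustration only; infer_direction here is a hypothetical helper, and it assumes every visit has exactly one entry row followed by one exit row on the same day.

```python
from collections import defaultdict
from datetime import datetime


def infer_direction(rows):
    """Tag each (timestamp, customer_id) row as an entry (True) or exit (False)
    from its position within the customer's day."""
    groups = defaultdict(list)
    for ts, customer_id in rows:
        groups[(customer_id, ts.date())].append(ts)

    tagged = []
    for customer_id, day in sorted(groups):
        ordered = sorted(groups[(customer_id, day)])
        # enumerate(..., start=1) mirrors row_number(); (n & 1) == 1 mirrors BITAND(n, 1) = 1.
        for position, ts in enumerate(ordered, start=1):
            tagged.append((day, ts, customer_id, (position & 1) == 1))
    return tagged


rows = [
    (datetime(2019, 4, 1, 9), "x"),
    (datetime(2019, 4, 1, 15), "x"),
    (datetime(2019, 4, 2, 12), "x"),
    (datetime(2019, 4, 2, 14), "x"),
]
tagged = infer_direction(rows)
for row in tagged:
    print(row)
```

A missing or duplicated row shifts the parity for the rest of that day, so inferring the direction is only safe when the data is known to be complete.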

Now LEAD and QUALIFY can be used to get the true ranges, which handles multiple visits per day:

select customer_id
    ,day
    ,hour
    ,lead(hour) over (partition by customer_id, day order by hour) as exit_time
from infer_direction
qualify in_dir = true

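This works by fetching the next timestamp for every row within each day and customer, and then (via QUALIFY) keeping only the "in" rows.
Then we can join to the hours of the day: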

select dateadd('hour', row_number() over(order by null) - 1, '00:00:00'::time) as hour
from table (generator(rowcount => 24))

So, weaving this all together:

with mytable as (
  select hour::timestamp as hour, customer_id, state 
  from values 
     ('2019-04-01 09:00:00','x','in')
     ,('2019-04-01 12:00:00','x','out')
     ,('2019-04-02 13:00:00','x','in')
     ,('2019-04-02 14:00:00','x','out')
     ,('2019-04-02 9:00:00','x','in')
     ,('2019-04-02 10:00:00','x','out')
   v(hour, customer_id, state)
), infer_direction AS (
  select hour::date as day
      ,hour::time as hour
      ,customer_id
      ,state
      ,BITAND(row_number() over(partition by day, customer_id order by hour), 1) = 1 AS in_dir
  from mytable
), visit_ranges as (
  select customer_id
      ,day
      ,hour
      ,lead(hour) over (partition by customer_id, day order by hour) as exit_time
  from infer_direction
  qualify in_dir = true
), time_of_day AS (
    select dateadd('hour', row_number() over(order by null) - 1, '00:00:00'::time) as hour
    from table (generator(rowcount => 24))
)
select t.customer_id
    ,t.day
    ,h.hour
from visit_ranges as t
join time_of_day h on h.hour between t.hour and t.exit_time
order by 1,2,3;

we get:

CUSTOMER_ID    DAY           HOUR
x              2019-04-01    09:00:00
x              2019-04-01    10:00:00
x              2019-04-01    11:00:00
x              2019-04-01    12:00:00
x              2019-04-02    09:00:00
x              2019-04-02    10:00:00
x              2019-04-02    13:00:00
x              2019-04-02    14:00:00
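For comparison, here is a minimal Python sketch of the same pipeline using the six sample rows from the query. It is an illustration under assumptions, not the answer's method: hours_in_store is a hypothetical name, entries and exits are paired purely by order within the day, and a trailing entry without an exit is dropped (in the SQL, LEAD returns NULL for that row, so the join matches nothing).

```python
from collections import defaultdict
from datetime import datetime, time


def hours_in_store(rows):
    """Expand (timestamp, customer_id) entry/exit rows into one
    (customer_id, day, hour) row per whole hour spent in the store."""
    groups = defaultdict(list)
    for ts, customer_id in rows:
        groups[(customer_id, ts.date())].append(ts)

    result = []
    for customer_id, day in sorted(groups):
        stamps = sorted(groups[(customer_id, day)])
        # Odd positions are entries, even positions are exits (the IN_DIR rule);
        # zip pairs each entry with the row that follows it, like LEAD does.
        for entry, exit_ in zip(stamps[0::2], stamps[1::2]):
            # The 24 whole hours stand in for the time_of_day CTE.
            for h in range(24):
                if entry.time() <= time(h) <= exit_.time():
                    result.append((customer_id, day, time(h)))
    return result


rows = [
    (datetime(2019, 4, 1, 9), "x"),
    (datetime(2019, 4, 1, 12), "x"),
    (datetime(2019, 4, 2, 13), "x"),
    (datetime(2019, 4, 2, 14), "x"),
    (datetime(2019, 4, 2, 9), "x"),
    (datetime(2019, 4, 2, 10), "x"),
]
expanded = hours_in_store(rows)
for customer_id, day, hour in expanded:
    print(customer_id, day, hour)
```

The sketch yields the same eight rows as the result above: hours 9 to 12 on 2019-04-01, then hours 9, 10, 13 and 14 on 2019-04-02.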
