sql—通过加载最新数据来删除重复值

wljmcqd8  于 2021-07-26  发布在  Java
关注(0)|答案(3)|浏览(316)

我处理的是这样的企业数据。

| load_number | id        | time     | slot_time | region | network |
|-------------|-----------|----------|-----------|--------|---------|
| 1692        | 641131146 | 00:20:00 | 00:20:00  | FX-4   | SBOB    |
| 1692        | 641131146 | 00:20:00 | 00:20:30  | FX-4   | SBOB    |
| 1442        | 570732257 | 00:20:00 | 00:20:00  | FX-4   | SBOB    |
| 1442        | 570732257 | 00:20:00 | 00:20:30  | FX-4   | SBOB    |
| 1692        | 641131147 | 00:55:00 | 00:55:00  | FX-4   | SBOB    |
| 1692        | 641131147 | 00:55:00 | 00:55:30  | FX-4   | SBOB    |
| 1442        | 570732258 | 00:55:00 | 00:55:00  | FX-4   | SBOB    |
| 1442        | 570732258 | 00:55:00 | 00:55:30  | FX-4   | SBOB    |

问题是,该公司的数据做法不好,更改/重用了id,但只更新了 load_number 现场。
如何构造sql查询以提取最新加载的数据,如下所示:

| load_number | id        | time     | slot_time | region | network |
|-------------|-----------|----------|-----------|--------|---------|
| 1692        | 641131146 | 00:20:00 | 00:20:00  | FX-4   | SBOB    |
| 1692        | 641131146 | 00:20:00 | 00:20:30  | FX-4   | SBOB    |
| 1692        | 641131147 | 00:55:00 | 00:55:00  | FX-4   | SBOB    |
| 1692        | 641131147 | 00:55:00 | 00:55:30  | FX-4   | SBOB    |

基本上每个领域都会匹配除了 id 以及 load_number . 因此,假设除这两个字段外,每个字段都匹配,我可以通过取较高的行来删除“重复项”吗 load_number .
我在想某种下降的方法 rank()load_number ,非常感谢您的帮助!

b4lqfgs4

b4lqfgs41#

试试这个

with max_load_numbers_by_id AS (
  SELECT et.id, MAX(et.load_number) AS max_load_number
  FROM enterprise_table et
  GROUP BY et.id 
)

SELECT et.*
FROM enterprise_table et
JOIN max_load_numbers_by_id mlnbi
  ON et.id = mlnbi.id
  AND et.max_load_number = mlnbi.load_number
ygya80vv

ygya80vv2#

你可以用 distinct on :

select distinct on (time, slot_time, region, network) t.*
from mytable t
order by time, slot_time, region, network, load_number desc

db小提琴演示:

load_number |        id | time     | slot_time | region | network
----------: | --------: | :------- | :-------- | :----- | :------
       1692 | 641131146 | 00:20:00 | 00:20:00  | FX-4   | SBOB   
       1692 | 641131146 | 00:20:00 | 00:20:30  | FX-4   | SBOB   
       1692 | 641131147 | 00:55:00 | 00:55:00  | FX-4   | SBOB   
       1692 | 641131147 | 00:55:00 | 00:55:30  | FX-4   | SBOB
krugob8w

krugob8w3#

你可以使用窗口功能 rank 或者 dense_rank 选择最近的装载编号。这是演示。

select
  load_number, 
  id, 
  time, 
  slot_time, 
  region, 
  network
from
(
  select
    *,
    dense_rank() over(order by load_number desc) as rn
  from myTable
) subq
where rn = 1;

输出:

| load_number | id        | time     | slot_time | region | network |
| ----------- | --------- | -------- | --------- | ------ | ------- |
| 1692        | 641131146 | 00:20:00 | 00:20:00  | FX-4   | SBOB    |
| 1692        | 641131146 | 00:20:00 | 00:20:30  | FX-4   | SBOB    |
| 1692        | 641131147 | 00:55:00 | 00:55:00  | FX-4   | SBOB    |
| 1692        | 641131147 | 00:55:00 | 00:55:30  | FX-4   | SBOB    |

相关问题