Db2: Why doesn't WIDTH_BUCKET always return buckets of the same size?

Posted by gijlo24d on 2022-11-23 in DB2

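As far as I understand, the width_bucket function assigns values to buckets in an equi-width histogram. I would therefore expect the bucket size to be consistent across partitions (except for the default bucket). However, I can't make sense of the behaviour when I change the upper bound (second example).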

--this works as expected
SELECT WIDTH_BUCKET(col,0,1000,10) AS bucket_no
, min(col) AS min_val
, max(col) AS max_max
, max(col) - min(col) AS width
FROM table
WHERE 1=1
GROUP BY WIDTH_BUCKET(col,0,1000,10)
ORDER BY 1

BUCKET_NO|MIN_VAL|MAX_MAX|WIDTH|
---------+-------+-------+-----+
        1|      1|     99|   98|
        2|    100|    199|   99|
        3|    200|    299|   99|
        4|    300|    399|   99|
        5|    400|    499|   99|
        6|    500|    599|   99|
        7|    600|    699|   99|
        8|    700|    799|   99|
        9|    800|    899|   99|
       10|    900|    999|   99|
       11|   1000|  55786|54786|

In the second example, the number of partitions is not preserved, and the partitions are not of equal size either.

--this one doesn't
SELECT WIDTH_BUCKET(col,0,100000,10) AS bucket_no
, min(col) AS min_val
, max(col) AS max_max
, max(col) - min(col) AS width
FROM table
WHERE 1=1
GROUP BY WIDTH_BUCKET(col,0,100000,10)
ORDER BY 1

BUCKET_NO|MIN_VAL|MAX_MAX|WIDTH|
---------+-------+-------+-----+
        1|      1|   9971| 9970|
        2|  10014|  18020| 8006|
        3|  20246|  24007| 3761|
        4|  30070|  30070|    0|
        6|  55786|  55786|    0|
dzhpxtsq #1

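According to the definition of the WIDTH_BUCKET scalar function, the implementation should look like the function below.
With non-contiguous input data such as yours, you cannot expect the values in each bucket to span the same range. The buckets themselves are equal: with bounds 0 and 100000 and 10 buckets, each bucket covers a range of 10000 (bucket 1 is [0, 10000), bucket 2 is [10000, 20000), and so on). Your width column is max(col) - min(col) over the rows that happen to fall into a bucket, not the width of the bucket, and a bucket that receives no rows (5 and 7 to 10 in your result) does not appear in the GROUP BY output at all.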

CREATE OR REPLACE FUNCTION WIDTH_BUCKET_MY (EXPRESSION INT, BOUND1 INT, BOUND2 INT, NUM_BUCKETS INT)
RETURNS INT
CONTAINS SQL
DETERMINISTIC 
NO EXTERNAL ACTION
RETURN
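  CASE 
    -- Equal bounds are invalid; this stands in for the built-in SQLSTATE 2201G error
    WHEN BOUND1 = BOUND2 THEN RAISE_ERROR ('70001', 'The same as SQLSTATE=2201G')::INT
    ELSE
      CASE
        -- Underflow bucket: the value lies before BOUND1 (ascending or descending range)
        WHEN (EXPRESSION <  BOUND1 AND BOUND1 < BOUND2) OR (EXPRESSION >  BOUND1 AND BOUND1 > BOUND2) THEN 0
        -- Overflow bucket: the value lies at or beyond BOUND2
        WHEN (EXPRESSION >= BOUND2 AND BOUND1 < BOUND2) OR (EXPRESSION <= BOUND2 AND BOUND1 > BOUND2) THEN NUM_BUCKETS + 1
        -- Otherwise the range is split into NUM_BUCKETS equal-width buckets, numbered from 1
        ELSE ABS ((EXPRESSION - BOUND1) * NUM_BUCKETS / (BOUND2 - BOUND1))::INT + 1
      END
  END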
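
To see the equal-width buckets directly, including the empty ones, you could join the data against a fixed list of bucket numbers. The query below is only a sketch under stated assumptions: the CTE t is a hypothetical stand-in for your table, filled with the MIN_VAL and MAX_MAX values from your second result, and the CTE b and the column names bucket_lower, bucket_upper and row_count are made up for illustration.

```sql
-- t: hypothetical sample data, copied from the question's second result set
-- b: one row per regular bucket (1 to 10) for WIDTH_BUCKET(col, 0, 100000, 10)
WITH t (col) AS (
  VALUES 1, 9971, 10014, 18020, 20246, 24007, 30070, 55786
),
b (bucket_no) AS (
  VALUES 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
)
SELECT b.bucket_no
     -- nominal bounds of the bucket: lower is inclusive, upper is exclusive
     , (b.bucket_no - 1) * 10000 AS bucket_lower
     , b.bucket_no * 10000       AS bucket_upper
     -- what the data actually contributes to the bucket
     , COUNT(t.col)              AS row_count
     , MIN(t.col)                AS min_val
     , MAX(t.col)                AS max_val
FROM b
LEFT JOIN t
  ON WIDTH_BUCKET(t.col, 0, 100000, 10) = b.bucket_no
GROUP BY b.bucket_no
ORDER BY b.bucket_no
```

With this sample data, every bucket should show the same nominal range of 10000, while buckets 5 and 7 to 10 come back with a row_count of 0 and NULL for min_val and max_val. That is why they are missing from the plain GROUP BY query.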
