How can I select random rows faster from an Oracle table with millions of rows?

Posted by pbwdgjma on 2023-03-29 in Oracle

Is there a faster way to select random rows from an Oracle table with millions of rows? I tried sample(x) and dbms_random.value, but both take a long time to run.

ltqd579y #1

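Using an appropriate sample(x) value is the fastest approach. The sampling is random by block and by row within each block, so if you only want one random row: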

select dbms_rowid.rowid_relative_fno(rowid) as fileno,
       dbms_rowid.rowid_block_number(rowid) as blockno,
       dbms_rowid.rowid_row_number(rowid) as offset
  from (select rowid from my_big_table sample (.01))  -- my_big_table: placeholder for your table
 where rownum = 1

I'm using a subpartitioned table, and I get pretty good randomness even when grabbing several rows:

select dbms_rowid.rowid_relative_fno(rowid) as fileno,
       dbms_rowid.rowid_block_number(rowid) as blockno,
       dbms_rowid.rowid_row_number(rowid) as offset
  from (select rowid from my_big_table sample (.01))  -- my_big_table: placeholder for your table
 where rownum <= 5

    FILENO    BLOCKNO     OFFSET
---------- ---------- ----------
       152    2454936         11
       152    2463140         32
       152    2335208          2
       152    2429207         23
       152    2746125         28

I suspect you should tune the SAMPLE clause so that the sample size suits the number of rows you're fetching.
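
To put a number on "appropriate sample size", here is a small Python model rather than Oracle code. It assumes SAMPLE (p) keeps each row independently with probability p/100, which is an approximation of Oracle's row sampling and not a documented guarantee. It then looks for the smallest power-of-ten percentage that returns at least k rows with high probability. The function names and the 0.999 target are illustrative only.

```python
import math


def prob_at_least(k: int, n_rows: int, percent: float) -> float:
    """P(at least k rows come back) if each of n_rows rows is kept with probability percent/100."""
    q = percent / 100.0
    # P(X < k) for X ~ Binomial(n_rows, q); k is small, so sum the terms directly.
    # exp((n - i) * log1p(-q)) computes (1 - q) ** (n - i) without losing precision for tiny q.
    p_fewer = sum(
        math.comb(n_rows, i) * q**i * math.exp((n_rows - i) * math.log1p(-q))
        for i in range(k)
    )
    return 1.0 - p_fewer


def smallest_percent(k: int, n_rows: int, target: float = 0.999):
    """Smallest SAMPLE percentage among 0.000001, 0.00001, ..., 10 that meets the target."""
    for e in range(-6, 2):
        percent = 10.0 ** e
        if prob_at_least(k, n_rows, percent) >= target:
            return percent
    return None  # no sample percentage is large enough; read the table instead
```

Under this model, a 10-million-row table needs about 0.0001 percent (sample (0.0001)) to return one row reliably, and about 0.001 percent to return five.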

uwopmtnx #2

Start with Adam's answer. If SAMPLE still isn't fast enough, even with the ROWNUM optimization, you can use block sampling:

....FROM [table] SAMPLE BLOCK (0.01)

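This applies sampling at the block level rather than to individual rows. As a result it can skip large parts of the table, so the sample percentage is only very approximate. It's not unusual for SAMPLE BLOCK with a low percentage to return zero rows.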
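
A rough Python model of why zero-row results happen. It assumes SAMPLE BLOCK (p) keeps each block independently with probability p/100 and that every block holds the same number of rows. Both are simplifications, not documented Oracle internals. Under these assumptions the expected row count matches row sampling. Because whole blocks are kept or dropped together, though, the chance of getting nothing back depends on the number of blocks rather than the number of rows.

```python
import math


def p_zero_rows(units: int, percent: float) -> float:
    """Chance that no unit (row or block) is kept when each is kept with probability percent/100."""
    q = percent / 100.0
    return math.exp(units * math.log1p(-q))  # (1 - q) ** units


def expected_rows(n_rows: int, percent: float) -> float:
    """Expected row count; the same for row and block sampling under this model."""
    return n_rows * percent / 100.0


# Hypothetical table: 10 million rows stored 100 rows per block.
N_ROWS, ROWS_PER_BLOCK = 10_000_000, 100
N_BLOCKS = N_ROWS // ROWS_PER_BLOCK

pct = 0.001
print("expected rows:", expected_rows(N_ROWS, pct))
print("P(no rows), SAMPLE      :", p_zero_rows(N_ROWS, pct))
print("P(no rows), SAMPLE BLOCK:", p_zero_rows(N_BLOCKS, pct))
```

At 0.001 percent, both variants expect about 100 rows. In this model, however, the block variant returns nothing roughly 37% of the time (about e^-1), while row sampling practically never does.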

mwkjh3gx #3

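Here is the question on AskTom:
http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:6075151195522
If you know how big your table is, use SAMPLE BLOCK as described above. If you don't, you can adapt the routine below to get the number of rows you want.
Copied from: http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:6075151195522#56174726207861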

create or replace function get_random_rowid
( table_name varchar2
) return urowid
as
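sql_v varchar2(4000);  -- 100 chars overflows once table_name is longer than about 60 chars
urowid_t dbms_sql.urowid_table;
cursor_v integer;
status_v integer;
rows_v integer;
begin
  -- Try SAMPLE BLOCK with 0.000001%, 0.00001%, ..., 10%, then fall back to a
  -- full scan on the last pass; stop as soon as any rowids have been collected.
  for exp_v in -6..2 loop
    exit when (urowid_t.count > 0);
    if (exp_v < 2) then
      -- format the percentage with '.' as the decimal separator regardless of NLS settings
      sql_v := 'select rowid from ' || table_name
      || ' sample block ('
      || to_char(power(10, exp_v), 'TM9', 'NLS_NUMERIC_CHARACTERS=''.,''') || ')';
    else
      sql_v := 'select rowid from ' || table_name;
    end if;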
    cursor_v := dbms_sql.open_cursor;
    dbms_sql.parse(cursor_v, sql_v, dbms_sql.native);
    dbms_sql.define_array(cursor_v, 1, urowid_t, 100, 0);
    status_v := dbms_sql.execute(cursor_v);
    loop
      rows_v := dbms_sql.fetch_rows(cursor_v);
      dbms_sql.column_value(cursor_v, 1, urowid_t);
      exit when rows_v != 100;
    end loop;
    dbms_sql.close_cursor(cursor_v);
  end loop;
  if (urowid_t.count > 0) then
    -- define_array was given lower bound 0, so the indexes run from 0 to count - 1
    return urowid_t(trunc(dbms_random.value(0, urowid_t.count)));
  end if;
  return null;
exception when others then
  if (dbms_sql.is_open(cursor_v)) then
    dbms_sql.close_cursor(cursor_v);
  end if;
  raise;
end;
/
show errors
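
The escalation strategy in this function can be modelled in a few lines of Python. This illustrates the control flow only. sample_block is a hypothetical stand-in that keeps each block with probability percent/100, and the "table" is an in-memory list of blocks of made-up rowids.

```python
import random


def sample_block(blocks, percent, rng):
    """Model of SAMPLE BLOCK (percent): keep each block with probability percent/100."""
    q = percent / 100.0
    return [rid for blk in blocks if rng.random() < q for rid in blk]


def get_random_rowid(blocks, rng):
    """Mirror of the PL/SQL loop: escalate 0.000001% ... 10%, then read everything."""
    collected = []
    for exp_v in range(-6, 3):  # same range as "for exp_v in -6..2"
        if collected:
            break
        if exp_v < 2:
            collected = sample_block(blocks, 10.0 ** exp_v, rng)
        else:
            collected = [rid for blk in blocks for rid in blk]  # full scan
    return rng.choice(collected) if collected else None


# Hypothetical table: 50 blocks of 10 rows, rowids written as "b<block>r<row>".
table = [[f"b{b}r{r}" for r in range(10)] for b in range(50)]
print(get_random_rowid(table, random.Random(42)))
```

On a small table the low percentages mostly come back empty, so the loop just runs a few extra queries before it samples enough. On a large table it stops early with a tiny sample.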

wgeznvg7 #4

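The following isn't an exact answer to the question, but it covers a common case. You select a row, use it for some purpose, and then update its status to 'USED' or 'DONE' so it isn't selected again.
Solution:
The query below works, but if your table is large you will certainly run into performance problems with it; I tried it myself.
SELECT * FROM (SELECT * FROM my_table ORDER BY dbms_random.value) WHERE rownum = 1
If you cap the inner query with rownum as shown below, you avoid the performance problem, and raising the cap lowers the chance of collisions. The catch is that you always pick from roughly the same 1000 rows. However, if you take a row from those 1000 and update its status to 'USED', each query for 'ACTIVE' rows will almost always return a different row.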

SELECT * FROM
( SELECT * FROM my_table
where rownum < 1000
  and status = 'ACTIVE'
  ORDER BY dbms_random.value  )
WHERE rownum = 1

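After selecting a row, update its status. If the update fails (for example, an UPDATE that also requires status = 'ACTIVE' affects no rows), another transaction has already taken that row. In that case, fetch a new row and try to update its status again. Incidentally, because the rownum cap is 1000, the probability that two different transactions get the same row is about 0.001.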
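
The claim-and-mark pattern can be simulated in Python to see how it behaves. The sketch assumes the ACTIVE rows are always scanned in the same order. The query has no ORDER BY before ROWNUM is applied, so this is an assumption, not a guarantee. It models rownum < 1000 as "the first 999 ACTIVE rows". claim_next is a hypothetical helper name.

```python
import random


def claim_next(statuses, rng, window=999):
    """Pick a random row among the first `window` ACTIVE rows and mark it USED."""
    candidates = []
    for i, status in enumerate(statuses):
        if status == "ACTIVE":
            candidates.append(i)
            if len(candidates) == window:  # rownum < 1000 stops after 999 rows
                break
    if not candidates:
        return None
    chosen = rng.choice(candidates)
    statuses[chosen] = "USED"  # the UPDATE ... SET status = 'USED' step
    return chosen


rows = ["ACTIVE"] * 1500
rng = random.Random(7)
picks = [claim_next(rows, rng) for _ in range(1500)]
```

Each pick comes from a window that slides forward as rows are marked USED. Two concurrent sessions choosing uniformly from the same 999-row window collide with probability 1/999, which is the "about 0.001" mentioned above.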

dw1jzc5e #5

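Some say sample(x) is the fastest way, but for me the query below is slightly faster than sample(x). It takes a fraction of a second (0.2 s in my case) regardless of table size. If it takes longer, try adding the hints --+ leading(e) use_nl(e t) rowid(t), which can help.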

-- Picks a random extent of MY_USER.MY_TABLE, then a random block inside it,
-- then a random row inside that block (matched by its rowid range).
-- MY_USER / MY_TABLE are placeholders; DBA_EXTENTS and DBA_OBJECTS must be readable.
SELECT *
  FROM My_User.My_Table
 WHERE ROWID = (SELECT MAX(t.ROWID) KEEP(DENSE_RANK FIRST ORDER BY dbms_random.value)
                  FROM (SELECT o.Data_Object_Id,
                               e.Relative_Fno,
                               e.Block_Id + TRUNC(Dbms_Random.Value(0, e.Blocks)) AS Block_Id
                          FROM Dba_Extents e
                          JOIN Dba_Objects o ON o.Owner = e.Owner AND o.Object_Type = e.Segment_Type AND o.Object_Name = e.Segment_Name
                         WHERE e.Segment_Name = 'MY_TABLE'
                           AND (e.Segment_Type, e.Owner, e.Extent_Id) =
                               (SELECT MAX(e.Segment_Type) AS Segment_Type,
                                       MAX(e.Owner)        AS Owner,
                                       MAX(e.Extent_Id) KEEP(DENSE_RANK FIRST ORDER BY Dbms_Random.Value) AS Extent_Id
                                  FROM Dba_Extents e
                                 WHERE e.Segment_Name = 'MY_TABLE'
                                   AND e.Owner = 'MY_USER'
                                   AND e.Segment_Type = 'TABLE')) e
                  JOIN My_User.My_Table t
                    ON t.Rowid BETWEEN Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 0)
                   AND Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 32767))
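
How evenly does this spread the picks? The query chooses an extent uniformly, then a block uniformly within it, then a row uniformly within that block. Each row's chance is therefore the product of three fractions. The Python sketch below models that logic with a made-up segment layout and computes the chances exactly. It suggests the result is not uniform when extents differ in size or blocks differ in row count, and that an empty block yields no row at all.

```python
from fractions import Fraction


def row_probabilities(extents):
    """Exact chance that each row is returned by: random extent -> random block -> random row.

    extents: list of extents, each a list of per-block row counts.
    Returns {(extent, block, row): probability}; empty blocks contribute no rows.
    """
    probs = {}
    for e, blocks in enumerate(extents):
        for b, n_rows in enumerate(blocks):
            for r in range(n_rows):
                probs[(e, b, r)] = (
                    Fraction(1, len(extents)) * Fraction(1, len(blocks)) * Fraction(1, n_rows)
                )
    return probs


# Hypothetical segment: extent 0 = two blocks of 4 rows each,
# extent 1 = four blocks where only the first holds a single row.
probs = row_probabilities([[4, 4], [1, 0, 0, 0]])
p_empty = 1 - sum(probs.values())  # chance the chosen block is empty, so no row comes back
```

With this layout a uniform pick would give each of the 9 rows 1/9. Instead, the lone row in extent 1 gets 1/8, each of the other rows gets 1/16, and 3/8 of the attempts land on an empty block.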

zhte4eai #6

A version that retries when no row is returned:

WITH gen AS ((SELECT --+ inline leading(e) use_nl(e t) rowid(t)
                     MAX(t.ROWID) KEEP(DENSE_RANK FIRST ORDER BY dbms_random.value) Row_Id
                FROM (SELECT o.Data_Object_Id,
                             e.Relative_Fno,
                             e.Block_Id + TRUNC(Dbms_Random.Value(0, e.Blocks)) AS Block_Id 
                        FROM Dba_Extents e
                        JOIN Dba_Objects o ON o.Owner = e.Owner AND o.Object_Type = e.Segment_Type AND o.Object_Name = e.Segment_Name
                       WHERE e.Segment_Name = 'MY_TABLE'
                         AND(e.Segment_Type, e.Owner, e.Extent_Id) =
                            (SELECT MAX(e.Segment_Type) AS Segment_Type,
                                    MAX(e.Owner)        AS Owner,
                                    MAX(e.Extent_Id) KEEP(DENSE_RANK FIRST ORDER BY Dbms_Random.Value) AS Extent_Id
                               FROM Dba_Extents e
                              WHERE e.Segment_Name = 'MY_TABLE'
                                AND e.Owner = 'MY_USER'
                                AND e.Segment_Type = 'TABLE')) e
                JOIN MY_USER.MY_TABLE t ON t.ROWID BETWEEN Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 0)
                                                  AND Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 32767))),
  Retries(Cnt, Row_Id) AS (SELECT 1, gen.Row_Id
                             FROM Dual
                             LEFT JOIN gen ON 1=1
                            UNION ALL
                           SELECT Cnt + 1, gen.Row_Id
                             FROM Retries
                             LEFT JOIN gen ON 1=1
                            WHERE Retries.Row_Id IS NULL AND Retries.Cnt < 10)
SELECT *
  FROM MY_USER.MY_TABLE
 WHERE ROWID = (SELECT Row_Id
                  FROM Retries
                 WHERE Row_Id IS NOT NULL)
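
The recursive WITH clause is meant to re-run gen while the previous attempt returned NULL and fewer than 10 attempts have been made. In Python terms the control flow is the loop below. This is an illustrative model: try_once stands for one evaluation of gen. If a single attempt hits an empty block with probability p, all 10 attempts fail with probability p**10, assuming the attempts are independent. Retrying does not change how a row is chosen once a non-empty block is found.

```python
def pick_with_retries(try_once, max_tries=10):
    """Call try_once() until it returns a row or max_tries attempts are used up."""
    for attempt in range(1, max_tries + 1):
        row = try_once()
        if row is not None:
            return row, attempt
    return None, max_tries


def p_all_fail(p_empty, max_tries=10):
    """Chance that every attempt misses, if attempts are independent."""
    return p_empty ** max_tries
```

For example, if 3/8 of the attempts land on an empty block, ten attempts all fail with probability (3/8)**10, which is below 0.0001.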

apeeds0o #7

Could you use pseudo-random rows instead?

select * from (
  select * from ... where... order by ora_hash(rowid)
) where rownum<100
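
The catch with ORDER BY ora_hash(rowid) is that it is deterministic: the same rows come back every time until the rowids change. The Python sketch below imitates the idea with zlib.crc32 as a stand-in hash. That is an assumption; it is not Oracle's hash function. Oracle's ORA_HASH accepts optional max_bucket and seed_value arguments, e.g. ORA_HASH(rowid, 4294967295, :seed), so varying the seed per call is one way to get a different pseudo-random subset. Either way, the query still has to read and hash every qualifying row before it can return the first 99.

```python
import zlib


def pseudo_random_top(rowids, k, seed=0):
    """Order rows by a seeded hash of their rowid and keep the first k.

    zlib.crc32 is only a stand-in for ORA_HASH; the two produce different values.
    """
    return sorted(rowids, key=lambda rid: zlib.crc32(f"{seed}:{rid}".encode()))[:k]


rowids = [f"row{i}" for i in range(1000)]  # hypothetical rowids
first = pseudo_random_top(rowids, 99)
```

The same seed always returns the same 99 rows; a different seed gives a different, equally repeatable subset.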
