SQL Server query to select samples from A and B

Posted by pxq42qpu on 2023-01-12 in Other

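I have two populations, A and B. I first need to select 10 unique random samples from A. Then I need to select 10 unique random samples from B that are not among the samples selected from A. Uniqueness is based on ID alone. Although there are 10 unique IDs, the total number of rows can be larger.
I follow these steps:
1. First I get 10 distinct samples from A, which I use to get the corresponding rows.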

select * from A t1 inner join (select distinct id from A
tablesample(10 rows)) t2 on t1.id = t2.id

I stored this as A_records.

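2. I created a temporary view to store the pool available to B. This removes any IDs from the first sample so that they cannot reappear in B (although not required, I do this for my own sanity).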

create or replace view B_pool as (select distinct id from B where B.Id
not in (select distinct ID from A_records))

3. Now I select the samples from B.

select * from B t1 inner join (select distinct ID from B_pool
tablesample(10 rows)) t2 on t1.id = t2.id

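I feel this logic should work. However, I still seem to get duplicates in the overall sample (the sample from B contains IDs that are in the sample from A).
How can I avoid getting these duplicates?
Some sample data for populations A and B, and the expected results for A and B: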
Sample Data
Desired Results

sql-server

Source: https://stackoverflow.com/questions/75074749/query-to-select-samples-from-a-and-b

1 Answer

vh0rcniy1#

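In my view, both queries are fine; the rows they produce have IDs that are not contained in the other set.
A simple way to get different IDs is the modulo function. For example, use where mod(id,2) = 0 for one data set and where mod(id,2) = 1 for the other (in SQL Server, modulo is written with the % operator, e.g. id % 2 = 0). Of course, as long as the tables have enough rows, you can divide by any number so that the split looks more random than even IDs in one data set and odd IDs in the other, for example: where mod(id,123) = 45.
Complete queries: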

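-- SQL Server has no MOD() or LIMIT: use the % operator and TOP instead.
-- Rows of A for up to 10 distinct even IDs.
select *
from A
where id in (select distinct top (10) id
             from A
             where id % 2 = 0);

-- Rows of B for up to 10 distinct odd IDs, so no ID can match one from A.
select *
from B
where id in (select distinct top (10) id
             from B
             where id % 2 = 1);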
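
To check that two samples really share no IDs, the two ID lists can be compared with INTERSECT. The following is only a minimal sketch on made-up data: the table variables @A and @B and their contents are hypothetical and stand in for the tables from the question.

```sql
-- Hypothetical sample data; only the id column matters for the check.
declare @A table (id int, val varchar(10));
declare @B table (id int, val varchar(10));

insert into @A (id, val) values (1, 'a1'), (2, 'a2'), (2, 'a2b'), (3, 'a3'), (4, 'a4');
insert into @B (id, val) values (2, 'b2'), (3, 'b3'), (4, 'b4'), (5, 'b5'), (5, 'b5b');

-- Even IDs taken from A, odd IDs taken from B.
select id from @A where id % 2 = 0
intersect
select id from @B where id % 2 = 1;
-- An empty result means no ID appears in both samples.
```

The same INTERSECT check can be run against whatever two ID lists a sampling approach actually produced.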

If randomness is needed, you can add an ORDER BY clause to the subquery.
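
As a sketch of how random ordering might look in SQL Server, the IDs can be shuffled with ORDER BY NEWID() and each sample written to a temp table before it is reused. The temp table names #A_ids and #B_ids are hypothetical; the tables A and B with an id column are taken from the question. NEWID() cannot be used directly in ORDER BY together with SELECT DISTINCT, so the distinct IDs are produced in a derived table first. If A_records in the question was a view rather than a stored result, it would be re-evaluated on each reference and could draw a different sample each time, which is one possible explanation for the overlap; materializing the sample avoids that.

```sql
-- Sketch only: assumes tables A and B each have an id column, as in the question.
-- #A_ids and #B_ids are hypothetical temp table names.

-- 1. Pick 10 distinct random IDs from A and store them once.
select top (10) id
into #A_ids
from (select distinct id from A) as a_ids
order by newid();

-- 2. Pick 10 distinct random IDs from B that were not sampled from A.
--    NOT EXISTS is used instead of NOT IN so that a NULL id cannot empty the result.
select top (10) id
into #B_ids
from (select distinct b.id
      from B as b
      where not exists (select 1 from #A_ids as s where s.id = b.id)) as b_pool
order by newid();

-- 3. Return all rows belonging to the sampled IDs.
select a.* from A as a inner join #A_ids as s on s.id = a.id;
select b.* from B as b inner join #B_ids as s on s.id = b.id;

drop table #A_ids;
drop table #B_ids;
```

Sorting by NEWID() reads every distinct ID, so on very large tables this trades speed for a sample that is exactly 10 IDs and uniformly shuffled.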

2023-01-12
