SQL Server: Find the number of distinct values in a column of a table using database schema metadata

ulydmbyx  posted on 2023-08-02  in  Other

I'm looking for a SQL Server query to find the number of distinct values in a column of a table (cardinality of columns) using database schema metadata. I have only database metadata and cannot query the table.

I have this query, but it queries the table directly. I want to get the same information by querying database schema metadata instead.

DECLARE @TableName VARCHAR(MAX) = 'tablename';
DECLARE @SQL NVARCHAR(MAX) = '';

SELECT 
    @SQL += 'SELECT ''' + COLUMN_NAME + ''' AS ColumnName, 
                    COUNT(DISTINCT [' + COLUMN_NAME + ']) AS Cardinality 
             FROM [' + TABLE_SCHEMA + '].[' + @TableName + '] 
             UNION ALL '
FROM 
    INFORMATION_SCHEMA.COLUMNS
WHERE 
    TABLE_NAME = @TableName;

SET @SQL = LEFT(@SQL, LEN(@SQL) - 10) 
EXEC sp_executesql @SQL;

jdg4fx2g1#

You can use the statistics histogram, if it exists, for the column. You would need a single-column statistics object with just that column as the key.

-- Same placeholder table name as in the question
DECLARE @TableName sysname = 'tablename';

SELECT
  ColumnName = c.name,
  StatsName = s.name,
  h.distinct_range_rows
FROM sys.tables t
JOIN sys.columns c ON c.object_id = t.object_id
JOIN sys.stats s
  ON s.object_id = t.object_id
 AND c.column_id = (
    SELECT MIN(sc.column_id)
    FROM sys.stats_columns sc
    WHERE sc.object_id = t.object_id
      AND sc.stats_id = s.stats_id
    HAVING COUNT(*) = 1  -- single-column stats only
  )
CROSS APPLY (
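    SELECT
      -- distinct_range_rows excludes each step's upper bound (range_hi_key),
      -- so add one distinct value per histogram step
      distinct_range_rows = SUM(h.distinct_range_rows) + COUNT(*)
    FROM sys.dm_db_stats_histogram(t.object_id, s.stats_id) h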
) h
WHERE t.name = @TableName;

You need the right permissions for this to work. The docs say:

Requires that the user has select permissions on statistics columns or the user owns the table or the user is a member of the sysadmin fixed server role, the db_owner fixed database role, or the db_ddladmin fixed database role.
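
As a possible cross-check on the histogram sum, the same statistics object also stores a density vector, where density is 1 divided by the number of distinct values. The sketch below assumes a single-column statistics object already exists. 'dbo.tablename' and 'StatsName' are placeholders, not names from the question, so substitute the real schema-qualified table name and a statistics name taken from sys.stats.

```sql
-- Placeholders: replace 'dbo.tablename' and 'StatsName' with real names.
DBCC SHOW_STATISTICS ('dbo.tablename', 'StatsName') WITH DENSITY_VECTOR;

-- The result set has one row per key-column prefix and an "All density" column.
-- Estimated number of distinct values = 1 / [All density].
```

Like the histogram, this reads only the statistics object and not the table itself, so it is still an estimate.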

Note that table statistics are not perfectly reliable, especially if a full scan has not been done recently.
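
To judge how far to trust a given statistics object, it may help to look at when it was last updated and how much of the table was sampled. The following is a minimal sketch using sys.dm_db_stats_properties; 'tablename' is the same placeholder as in the question, and the SampledPct alias is just an illustrative name.

```sql
DECLARE @TableName sysname = 'tablename';  -- placeholder table name

SELECT
  StatsName  = s.name,
  sp.last_updated,          -- when the statistics were last refreshed
  sp.rows,                  -- table rows at the time of the last update
  sp.rows_sampled,          -- rows actually read to build the statistics
  SampledPct = 100.0 * sp.rows_sampled / NULLIF(sp.rows, 0),
  sp.modification_counter   -- changes to the leading column since the last update
FROM sys.tables t
JOIN sys.stats s ON s.object_id = t.object_id
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) sp
WHERE t.name = @TableName;
```

A SampledPct close to 100 together with a low modification_counter suggests the estimate came from a recent full scan. A small sample or a large modification count means the distinct-value estimate could be well off.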
