SQL Server Percentage of null values per column

2hh7jdfx  于 2023-03-22  发布在  其他
关注(0)|答案(4)|浏览(152)

I have hundreds of columns in a table and need to assess what the percentage of missing values (null) per column. Any easy way to do this?

epfja78i

epfja78i1#

Here is an option that uses a little Dynamic SQL

Example

Declare @SQL varchar(max) = 'YourTable'  -- Set Your Table/View Name Here

Select @SQL = 'Select ' +string_agg(concat(quotename(name),' = avg(case when ',QUOTENAME(name),' is null then 1.0 else 0 end) '),',') + ' From '+ @SQL
 From sys.dm_exec_describe_first_result_set('Select top 1 * From '+@SQL,null,null ) A

Exec(@SQL)

Let's assume your table looks like this:

The Generated SQL will look something like this

Select [ID]   = avg(case when [ID] is null then 1.0 else 0 end) 
      ,[Col1] = avg(case when [Col1] is null then 1.0 else 0 end) 
      ,[Col2] = avg(case when [Col2] is null then 1.0 else 0 end) 
      ,[Col3] = avg(case when [Col3] is null then 1.0 else 0 end)  
 From  YourTable

The Result would look like this

If by chance you wanted to UNPIVOT the results

Declare @SQL varchar(max) = 'YourTable'  -- Set Your Table/View Name Here

Select @SQL = '
Select ColName = [Key]
      ,PctNULL = format(try_convert(float,[Value]),''P2'') 
 From  OPENJSON( (Select ' + (Select string_agg(concat(quotename(name),' = avg(case when ',QUOTENAME(name),' is null then 1.0 else 0 end) '),char(13)+',') 
                               From  sys.dm_exec_describe_first_result_set('Select top 1 * From '+@SQL,null,null ) )  + ' 
                   From ' + @SQL + '  
                   For JSON Path,Without_Array_Wrapper 
                  ) 
                )'

Exec(@SQL)

The Results would look like this

rta7y2nd

rta7y2nd2#

"hundreds of columns" is not going to be a good idea. Hopefully this analysis is a step towards restructuring things.

The best way to do this will be to simply generate a query that aggregates all the columns by name.

One more option though is to use XML functionality ( demonstrated with master..spt_values below )

SELECT      col_name,
            FORMAT(AVG(null_flag ), 'P2') AS NullPct
FROM        master..spt_values t
CROSS APPLY (SELECT t.* FOR xml path('row'), elements xsinil, type ) ca(x)
CROSS APPLY (SELECT n.value('local-name(.)', 'sysname'), 
                    IIF(n.value('@xsi:nil', 'varchar(5)') = 'true', 1e, 0)
             FROM  x.nodes('row/*') n(n) ) ca2(col_name, null_flag)
GROUP BY    col_name
juud5qan

juud5qan3#

There is not an easy way to do this. You will need to write code that manually checks each item. This in turn can be not only tedious, but tricky because it some contexts there are character limits on the length of your SQL statements, and hundreds of columns could easily push you past that limit.

Typically, hundreds of similar columns is a sign you've made a serious mistake with the schema design, and should instead have an additional table which includes the key from the original table, a column for the sequential/ordinal value relative to the key, and a column for the value. Then add a new row for a repeated column every time you would have set a value in the old table.

If you are not able to change the broken schema, you might instead look at using sparse columns .

dsf9zpds

dsf9zpds4#

There isn't a great way to do this, however you can ask the engine for some help:

SELECT 'SELECT COUNT(*) AS TotalCount, ' + STRING_AGG(CAST(
'COUNT(' + QUOTENAME(c.name) +')  AS ['+c.name+'cnt], 1-(.0+COUNT(' + QUOTENAME(c.name) +'))/(COUNT(*)+.0)  AS ['+c.name+'pctNull] 
' AS NVARCHAR(MAX)),', ')+'  FROM '+QUOTENAME(s.name)+'.'+QUOTENAME(t.name)
  FROM sys.tables t
    INNER JOIN sys.columns c
      ON t.object_id = c.object_id
    INNER JOIN sys.schemas s
      ON t.schema_id = s.schema_id
 WHERE t.name = 'MyTable'
   AND s.name = 'MySchema'
   GROUP BY s.name, t.name

This does not give you your answer, but it does give you some TSQL which gives you a count of all the columns in the table, and a count of each NON NULL value for each column. Be aware that STRING_AGG is a 2019 function.

相关问题