I use SQL Server.
I've been handed some large tables with no constraints on them: no keys, nothing.
I know some of the columns have unique values. Is there a smart way, for a given table, to find the columns that have unique values?
Right now I do it manually for each column by checking whether there are as many DISTINCT values as there are rows in the table.
SELECT COUNT(DISTINCT col) FROM table
I could probably write a cursor to loop over all the columns, but I want to hear if someone knows a smarter way or a built-in function.
5 answers
v1l68za41#
Here's an approach that is basically similar to @JNK's but instead of printing the counts it returns a ready answer for every column that tells you whether a column consists of unique values only or not:
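A minimal sketch of such a script, not necessarily the exact one meant here. It assumes a placeholder table dbo.MyTable whose columns all have types that COUNT(DISTINCT ...) accepts (so no text, ntext, image or xml columns), and it builds the query with dynamic SQL from sys.columns:

```sql
-- Placeholder table name (assumption). It is concatenated into the dynamic SQL
-- as-is, so only use a schema-qualified name you trust.
DECLARE @table nvarchar(300) = N'dbo.MyTable';
DECLARE @cols nvarchar(max), @sql nvarchar(max);

-- Build one CASE expression per column:
-- 'UNIQUE' when COUNT(DISTINCT col) equals COUNT(*), otherwise ''.
SELECT @cols = STUFF((
        SELECT N', CASE COUNT(DISTINCT ' + QUOTENAME(c.name)
             + N') WHEN COUNT(*) THEN ''UNIQUE'' ELSE '''' END AS '
             + QUOTENAME(c.name)
        FROM sys.columns AS c
        WHERE c.object_id = OBJECT_ID(@table)
        ORDER BY c.column_id
        FOR XML PATH(''), TYPE
    ).value('.', 'nvarchar(max)'), 1, 2, N'');   -- STUFF drops the leading ', '

SET @sql = N'SELECT ' + @cols + N' FROM ' + @table + N';';

PRINT @sql;                    -- have a look at the generated query
EXEC sys.sp_executesql @sql;   -- one row, one verdict per column
```

The whole table is scanned once, with every column checked in the same pass, which avoids a cursor over the columns.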
It simply compares COUNT(DISTINCT column) with COUNT(*) for every column. The result will be a table with a single row, where every column will contain the value UNIQUE for those columns that do not have duplicates, and an empty string if duplicates are present.
But the above solution will work correctly only for those columns that do not have NULLs. It should be noted that SQL Server does not ignore NULLs when you want to create a unique constraint/index on a column. If a column contains just one NULL and all other values are unique, you can still create a unique constraint on the column (you cannot make it a primary key, though, which requires both uniqueness of values and absence of NULLs).
Therefore you might need a more thorough analysis of the contents, which you could get with the following script:
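A hedged sketch of what such a script might look like, again against a placeholder table dbo.MyTable. The {c} token and the @template variable are illustrative names, not part of any built-in feature. The "Null value is eliminated by an aggregate" warning it can raise is expected and harmless:

```sql
DECLARE @table nvarchar(300) = N'dbo.MyTable';   -- placeholder (assumption)
DECLARE @cols nvarchar(max), @sql nvarchar(max);

-- Per-column expression; {c} is replaced by the quoted column name.
-- COUNT(DISTINCT c) <> COUNT(c)  -> duplicates among the non-NULL values
-- COUNT(c) = COUNT(*)            -> no NULLs
-- COUNT(c) = COUNT(*) - 1        -> exactly one NULL
DECLARE @template nvarchar(max) =
      N', CASE WHEN COUNT(DISTINCT {c}) <> COUNT({c}) THEN '''''
    + N' WHEN COUNT({c}) = COUNT(*) THEN ''UNIQUE'''
    + N' WHEN COUNT({c}) = COUNT(*) - 1 THEN ''UNIQUE WITH SINGLE NULL'''
    + N' ELSE ''UNIQUE with NULLs'' END AS {c}';

SELECT @cols = STUFF((
        SELECT REPLACE(@template, N'{c}', QUOTENAME(c.name))
        FROM sys.columns AS c
        WHERE c.object_id = OBJECT_ID(@table)
        ORDER BY c.column_id
        FOR XML PATH(''), TYPE
    ).value('.', 'nvarchar(max)'), 1, 2, N'');   -- STUFF drops the leading ', '

SET @sql = N'SELECT ' + @cols + N' FROM ' + @table + N';';

PRINT @sql;                    -- have a look at the generated query
EXEC sys.sp_executesql @sql;   -- one row, one diagnosis per column
```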
This solution takes NULLs into account by checking three values: COUNT(DISTINCT column), COUNT(column) and COUNT(*). It displays the results similarly to the former solution, but the possible diagnoses for the columns are more diverse:
UNIQUE means no duplicate values and no NULLs (can either be a PK or have a unique constraint/index);
UNIQUE WITH SINGLE NULL: as can be guessed, no duplicates, but there's one NULL (cannot be a PK, but can have a unique constraint/index);
UNIQUE with NULLs: no duplicates, two or more NULLs (in case you are on SQL Server 2008, you could have a conditional, i.e. filtered, unique index for non-NULL values only);
lrpiutwd2#
Here is what I think is probably the cleanest way. Just use dynamic SQL and a single SELECT statement to build a query that gives you a total row count and a count of distinct values for each field.
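A rough sketch of that idea, not necessarily the exact script meant here. MyDatabase and MyTable are placeholders, and it assumes the table name is unique across schemas and lives in your default schema:

```sql
USE MyDatabase;   -- placeholder database name (assumption)
GO

DECLARE @table sysname = N'MyTable';   -- placeholder table name (assumption)
DECLARE @sql nvarchar(max);

-- Total row count first, then one COUNT(DISTINCT ...) per column.
SET @sql = N'SELECT COUNT(*) AS total_rows'
    + (SELECT N', COUNT(DISTINCT ' + QUOTENAME(c.name) + N') AS ' + QUOTENAME(c.name)
       FROM sys.columns AS c
       WHERE OBJECT_NAME(c.object_id) = @table   -- resolved in the current database
       ORDER BY c.column_id
       FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)')
    + N' FROM ' + QUOTENAME(@table) + N';';

PRINT @sql;                    -- inspect the generated query
EXEC sys.sp_executesql @sql;   -- compare each column's count with total_rows
```

Any column whose distinct count equals total_rows holds only unique, non-NULL values.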
Fill in the DB name and table name at the top. The DB name part is really important, since OBJECT_NAME resolves object IDs in the current database context unless you pass it a database ID.
ncgqoxb03#
If you are using SQL Server 2008, you can use the Data Profiling Task in SSIS to return candidate keys for each table.
This blog entry steps through the process; it's fairly simple:
http://consultingblogs.emc.com/jamiethomson/archive/2008/03/04/ssis-data-profiling-task-part-8-candidate-key.aspx
djmepvbi4#
A few words about what my code does:
v8wbuo2f5#
What about a simple one line of code:
If the index is created then your column_name has only unique values. If there are dupes in your column_name, you will get an error message.
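A minimal sketch of that check, assuming the statement meant is CREATE UNIQUE INDEX. The names dbo.MyTable, column_name and UX_probe_column_name are placeholders:

```sql
-- Placeholder names (assumptions): dbo.MyTable, column_name, UX_probe_column_name.
-- Fails with a duplicate-key error (Msg 1505) if column_name contains duplicates.
CREATE UNIQUE INDEX UX_probe_column_name
    ON dbo.MyTable (column_name);

-- Only reached if the index was created: remove the probe index again,
-- unless you actually want to keep it.
DROP INDEX UX_probe_column_name ON dbo.MyTable;
```

Keep in mind that a unique index treats NULLs as equal, so a column with two or more NULLs fails this test even if its non-NULL values are all distinct, and that building an index on a large table is not free.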