SQL Server Why VARCHAR data sorted as if there was a space appended on the end

5sxhfpxr  于 2023-10-15  发布在  其他
关注(0)|答案(2)|浏览(90)

Using SQL Server 2017, I encountered a query that was sorting on a VARCHAR column returning results in a different order than I expected, namely a string followed by a tab came before just the bare string. Further investigation showed that it was behaving as if the bare string had a space on the end.

Experimenting further, given the following:

SELECT * FROM (VALUES
    (1, 'a' + CHAR(33)),
    (2, 'a   '),
    (3, 'a  '),
    (4, 'a '),
    (5, 'a' + CHAR(31)),
    (6, 'a')
)v(rownum, text)
ORDER BY text

I would have expected the results to be:

rownumtext
6'a'
5'a(31)'
4'a(sp)'
3'a(sp)(sp)'
2'a(sp)(sp)(sp)'
1'a(33)'

But instead it was:

rownumtext
5'a(31)'
3'a(sp)(sp)'
4'a(sp)'
2'a(sp)(sp)(sp)'
6'a'
1'a(33)'

So it seems as if it treats 'a'+(no spaces) and 'a'+(any amount of spaces) as if they were all equivalent. What is going on here?

u2nhd7ah

u2nhd7ah1#

Because SQL Server follows ANSI standard string comparison rules, so 'a' and 'a ' are equal. eg

SELECT *, rank() over (order by text) rank 
FROM (VALUES
    (1, 'a' + CHAR(33)),
    (2, 'a   '),
    (3, 'a  '),
    (4, 'a '),
    (5, 'a' + CHAR(31)),
    (6, 'a')
)v(rownum, text)
ORDER BY text

outputs

rownum      text rank
----------- ---- --------------------
5           a   1
3           a    2
4           a    2
2           a    2
6           a    2
1           a!   6

(6 rows affected)

And the order in which rows are returned that are equal in the ORDER BY clause is undefined, and plan-dependent.

l0oc07j2

l0oc07j22#

So it seems as if it treats 'a'+(no spaces) and 'a'+(any amount of spaces) as if they were all equivalent.

This is exactly what is going on. When dealing with equality operations SQL Server effectively ignores trailing spaces, so '' and ' ' are seen as equal, and so too are 'a' , 'a ' and 'a ' .

As a result the ORDER BY you have, ORDER BY text , isn't deterministic; you have 4 rows that all have the same value. This means that when you run the query the 4 rows with the same value are returned in an arbitrary order. If you wanted to ensure you ordered values with the least trailing spaces first then you could add DATALENGTH(text) to the ORDER BY , as unlike LEN it does include trailing spaces in its calculation:

SELECT rownum,
       text,
       DATALENGTH(text) AS Characters
FROM (VALUES(1, 'a' + CHAR(33)),
            (2, 'a   '),
            (3, 'a  '),
            (4, 'a '),
            (5, 'a' + CHAR(31)),
            (6, 'a'))v(rownum, text)
ORDER BY text,
         DATALENGTH(text);

To get the order you were expecting, you'd want to order the data by it's binary value:

SELECT rownum,
       text,
       DATALENGTH(text) AS Characters,
       CONVERT(varbinary(4),text)
FROM (VALUES(1, 'a' + CHAR(33)),
            (2, 'a   '),
            (3, 'a  '),
            (4, 'a '),
            (5, 'a' + CHAR(31)),
            (6, 'a'))v(rownum, text)
ORDER BY CONVERT(varbinary(4),text);

相关问题