Why is SQL Server Big Endian?

7d7tgy0s  于 2023-11-16  发布在  SQL Server
关注(0)|答案(2)|浏览(155)

From what I've read, all Windows versions and .NET are little endian. So why the departure from the Microsoft norm for SQL Server?

What I mean by "SQL Server is big endian" is this:

SELECT CONVERT(VARBINARY, 255);

gives:

0x000000FF

and not

0xFF000000

the way something like .NET's BitConverter.GetBytes() does. I guess SQL Server could be storing the number internally as little endian and then CONVERT is just switching it around for some reason. But either way, why?

Edit:

Just noticed this...

DECLARE @q UNIQUEIDENTIFIER = '01234567-89ab-cdef-0123-456789abcdef';
SELECT @q;
SELECT CONVERT(VARBINARY, @q);

gave me:

01234567-89AB-CDEF-0123-456789ABCDEF

0x67452301AB89EFCD0123456789ABCDEF

What the heck?

jgovgodb

jgovgodb1#

Yes: Windows and .NET are Little Endian.

So why is SQL Server Big Endian? Easy: it's not ;-). The MSDN page for Collation and Unicode Support (within SQL Server) even states:
Because the Intel platform is a little endian architecture, Unicode code characters are always stored byte-swapped.

So why do you get a Big Endian binary value when converting the Int value of 255? Here is where the confusion is. This question is flawed because it is based on a false premise: that you should see the endianness of hardware and/or software reflected in the converted value. But why would you? Endianness affects the internal representation of a value, how it is stored. But it doesn't change the thing itself. You can convert a DATETIME to an INT and you will see an Integer. But if you save that Integer in an INT field, it will be stored as 4 bytes in reverse order since this is a Little Endian system. But that has nothing to do with what you see when you request that value back from the system and it is displayed to you.

For example, run the following to see that converting the INT value of 301 to a BINARY(2) results in 0x012D , because 0x012D = 301, just in hexadecimal. And so converting 0x012D back to INT returns 301 , as expected. If the original Int to Binary conversion gave you 0x2D01, well, that does not equate to 301.

SELECT CONVERT(BINARY(2), 301), CONVERT(INT, 0x012D)
-- 0x012D,  301

HOWEVER, if you create a table with an INT column, and insert a value of "301" into that column, and use DBCC PAGE to view the data page as it exists on disk, you will see the following hex digits in the order shown:

2D 01 00 00

Also, to address some of the evidence supporting the premise of the question:

Yes, doing BitConverter.ToString(BitConverter.GetBytes(255)) in .NET will return:
FF-00-00-00

BUT, that is not a conversion as GetBytes() is not converting the "value" but is instead intending on showing the internal system representation, which changes depending on if the system is Little Endian or Big Endian. If you look at the MSDN page for BitConverter.GetBytes , it might be clearer as to what it is actually doing.

When converting actual values, the results won't (and can't) be different across different systems. An integer value of 256 will always be 0x0100 across all systems (even calculators) because Endianness has nothing to do with how you convert values between base 10, base 2, base 16, etc.

In .NET, if you want to do this conversion, you can use String.Format("{0:X8}", 255) which will return:
000000FF

which is the same as what SELECT CONVERT(BINARY(4), 255); returns since they are both converting the value. This result isn't being shown as Big Endian, but is being shown as it truly is, which just happens to match the byte ordering of Big Endian.

In other words, when starting with a bit sequence of 100000000 , that can be represented in decimal form as 256 , or in hexadecimal form (known as BINARY / VARBINARY within SQL Server) as 0x0100 . Endianness has nothing to do with this as these are merely different ways of representing the same underlying value.

Further evidence of SQL Server being Little Endian can be seen when converting between VARBINARY and NVARCHAR . Since NVARCHAR is a 16-bit (i.e. 2 byte) encoding, we can see the byte ordering since there is no numeric equivalent for characters (unlike the 256 -> 0x0100 example) and so there is really nothing else to show (showing Code Point values is not an option due to Supplementary Characters).

As you can see below, a Latin capital A , which has a Code Point of U+0041 (which is numerically the same as 65) converts to a VARBINARY value of 0x4100 , because that is the UTF-16 Little Endian encoded value of that character:

SELECT CONVERT(VARBINARY(10), N'A'), -- 0x4100
       CONVERT(NVARCHAR(5), 0x4100), -- A
       CONVERT(INT, 0x4100),         -- 16640
       UNICODE(N'A'),                -- 65
       CONVERT(VARBINARY(8), 65);    -- 0x00000041

SELECT CONVERT(VARBINARY(10), N'ᄀ'), -- 0x0011
       CONVERT(NVARCHAR(5), 0x0011),  -- ᄀ
       CONVERT(INT, 0x0011),          -- 17
       UNICODE(N'ᄀ'),                -- 4352
       CONVERT(VARBINARY(8), 4352);   -- 0x00001100

Also, the " Pile of Poo " emoji (Code Point U+01F4A9) can be seen using the Surrogate Pair "D83D + DCA9" (which the NCHAR function allows for), or you can inject the UTF-16 Little Endian byte sequence:

SELECT NCHAR(0xD83D) + NCHAR(0xDCA9) AS [SurrogatePair],
       CONVERT(NVARCHAR(5), 0x3DD8A9DC) AS [UTF-16LE];
-- 💩   💩

UNIQUEIDENTIFIER is similar in that "what it is" and "how it is stored" are two different things and they do not need to match. Keep in mind that UUID / GUID is not a basic datatype like int or char , but is more of an entity that has a defined format, just like JPG or MP3 files. There is more discussion about UNIQUEIDENTIFIER s in my answer to a related question on DBA.StackExcange (including why it is represented by a combination of Big Endian and Little Endian).

3pvhb19x

3pvhb19x2#

This is not related to endianess. Endinaness is how the bits/bytes are physically stored, this is just showing the 32bit hexadecimal representation of the decimal number 255.

Edit: You can also see it in the docs: https://learn.microsoft.com/en-us/sql/t-sql/data-types/binary-and-varbinary-transact-sql?view=sql-server-ver16

"When other data types are converted to binary or varbinary, the data is padded or truncated on the left. Padding is achieved by using hexadecimal zeros."

So not related to endianess, simply padded on the left.

Edit: So IMO this is simply an artifact or how the 'binary' data type in SQL server currently works (the docs say they are free to change how it works between major versions). SQL Server could have introduced a new type 'binary2' that worked completely different, this still has nothing do with SQL server endianness.

相关问题