From what I've read, all Windows versions and .NET are little endian. So why the departure from the Microsoft norm for SQL Server?
What I mean by "SQL Server is big endian" is this:
SELECT CONVERT(VARBINARY, 255);
gives:
0x000000FF
and not
0xFF000000
the way something like .NET's BitConverter.GetBytes()
does. I guess SQL Server could be storing the number internally as little endian and then CONVERT
is just switching it around for some reason. But either way, why?
Edit:
Just noticed this...
DECLARE @q UNIQUEIDENTIFIER = '01234567-89ab-cdef-0123-456789abcdef';
SELECT @q;
SELECT CONVERT(VARBINARY, @q);
gave me:
01234567-89AB-CDEF-0123-456789ABCDEF
0x67452301AB89EFCD0123456789ABCDEF
What the heck?
2 Answers

Answer 1 (jgovgodb):
Yes: Windows and .NET are Little Endian.
So why is SQL Server Big Endian? Easy: it's not ;-). The MSDN page for Collation and Unicode Support (within SQL Server) even states:
Because the Intel platform is a little endian architecture, Unicode code characters are always stored byte-swapped.
So why do you get a Big Endian binary value when converting the Int value of 255? Here is where the confusion is. This question is flawed because it is based on a false premise: that you should see the endianness of hardware and/or software reflected in the converted value. But why would you? Endianness affects the internal representation of a value, how it is stored. But it doesn't change the thing itself. You can convert a DATETIME to an INT and you will see an Integer. But if you save that Integer in an INT field, it will be stored as 4 bytes in reverse order since this is a Little Endian system. But that has nothing to do with what you see when you request that value back from the system and it is displayed to you.
For example, run the following to see that converting the INT value of 301 to a BINARY(2) results in 0x012D, because 0x012D = 301, just in hexadecimal. And so converting 0x012D back to INT returns 301, as expected. If the original Int to Binary conversion gave you 0x2D01, well, that does not equate to 301.
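A minimal sketch of that round trip (results noted in comments):
SELECT CONVERT(BINARY(2), 301);  -- 0x012D
SELECT CONVERT(INT, 0x012D);     -- 301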
HOWEVER, if you create a table with an INT column, insert a value of 301 into that column, and use DBCC PAGE to view the data page as it exists on disk, you will see the bytes of 0x012D stored in reverse order (2d 01, followed by 00 00 for the high-order bytes), because that is how a Little Endian system lays the integer out in storage.
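One way to see this, sketched here with made-up database and table names and a placeholder page number (DBCC IND and DBCC PAGE are undocumented commands):
CREATE TABLE dbo.EndianTest (Val INT NOT NULL);
INSERT INTO dbo.EndianTest (Val) VALUES (301);
DBCC TRACEON(3604);                         -- route DBCC PAGE output to the client
DBCC IND ('YourDb', 'dbo.EndianTest', -1);  -- lists the table's pages; note the data page id
DBCC PAGE ('YourDb', 1, 312, 3);            -- dump that page; the row data contains 2d 01 00 00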
Also, to address some of the evidence supporting the premise of the question:

Yes, doing BitConverter.ToString(BitConverter.GetBytes(255)) in .NET will return: FF-00-00-00

BUT, that is not a conversion, as GetBytes() is not converting the "value" but is instead intended to show the internal system representation, which changes depending on whether the system is Little Endian or Big Endian. If you look at the MSDN page for BitConverter.GetBytes, it might be clearer as to what it is actually doing.

When converting actual values, the results won't (and can't) be different across different systems. An integer value of 256 will always be 0x0100 across all systems (even calculators) because Endianness has nothing to do with how you convert values between base 10, base 2, base 16, etc.
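The same value-level check in T-SQL, as a quick sketch:
SELECT CONVERT(BINARY(2), 256);  -- 0x0100
SELECT CONVERT(INT, 0x0100);     -- 256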
In .NET, if you want to do this conversion, you can use String.Format("{0:X8}", 255) which will return: 000000FF
which is the same as what
SELECT CONVERT(BINARY(4), 255);
returns, since they are both converting the value. This result isn't being shown as Big Endian, but is being shown as it truly is, which just happens to match the byte ordering of Big Endian.

In other words, when starting with a bit sequence of 100000000, that can be represented in decimal form as 256, or in hexadecimal form (known as BINARY / VARBINARY within SQL Server) as 0x0100. Endianness has nothing to do with this, as these are merely different ways of representing the same underlying value.

Further evidence of SQL Server being Little Endian can be seen when converting between VARBINARY and NVARCHAR.
Since NVARCHAR is a 16-bit (i.e. 2-byte) encoding, we can see the byte ordering, since there is no numeric equivalent for characters (unlike the 256 -> 0x0100 example) and so there is really nothing else to show (showing Code Point values is not an option due to Supplementary Characters). As you can see below, a Latin capital A, which has a Code Point of U+0041 (which is numerically the same as 65), converts to a VARBINARY value of 0x4100, because that is the UTF-16 Little Endian encoded value of that character:
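A sketch of that conversion in both directions (results noted in comments):
SELECT CONVERT(VARBINARY(2), N'A');   -- 0x4100, i.e. U+0041 stored byte-swapped
SELECT CONVERT(NVARCHAR(1), 0x4100);  -- A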
Also, the "Pile of Poo" emoji (Code Point U+1F4A9) can be seen using the Surrogate Pair "D83D + DCA9" (which the NCHAR function allows for), or you can inject the UTF-16 Little Endian byte sequence:
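A sketch of both approaches (whether NCHAR accepts lone surrogate values depends on the database's collation; results noted in comments):
SELECT NCHAR(0xD83D) + NCHAR(0xDCA9);     -- 💩 built from its surrogate pair
SELECT CONVERT(NVARCHAR(2), 0x3DD8A9DC);  -- 💩 from the UTF-16 Little Endian byte sequence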
UNIQUEIDENTIFIER is similar in that "what it is" and "how it is stored" are two different things and they do not need to match. Keep in mind that UUID / GUID is not a basic datatype like int or char, but is more of an entity that has a defined format, just like JPG or MP3 files. There is more discussion about UNIQUEIDENTIFIERs in my answer to a related question on DBA.StackExchange (including why it is represented by a combination of Big Endian and Little Endian).

Answer 2 (3pvhb19x):
This is not related to endianness. Endianness is how the bits/bytes are physically stored; this is just showing the 32-bit hexadecimal representation of the decimal number 255.
Edit: You can also see it in the docs: https://learn.microsoft.com/en-us/sql/t-sql/data-types/binary-and-varbinary-transact-sql?view=sql-server-ver16
"When other data types are converted to binary or varbinary, the data is padded or truncated on the left. Padding is achieved by using hexadecimal zeros."
So it is not related to endianness; the value is simply padded on the left.
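A quick sketch of that left padding, and of the corresponding left truncation:
SELECT CONVERT(BINARY(4), 255);  -- 0x000000FF: padded on the left with hexadecimal zeros
SELECT CONVERT(BINARY(1), 301);  -- 0x2D: the 4-byte value 0x0000012D truncated on the left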
Edit: So IMO this is simply an artifact of how the 'binary' data type in SQL Server currently works (the docs say they are free to change how it works between major versions). SQL Server could have introduced a new type 'binary2' that worked completely differently; this still has nothing to do with SQL Server endianness.