Python Hash MD5 not equal to SQL Server

9udxz4iz  于 12个月前  发布在  Python
关注(0)|答案(3)|浏览(109)

I'm trying to generate the same MD5 codes from Python 3 and SQL but I get always a different result.

With this Unique Identifier f033b004-eb80-412d-9773-f4f06bb994c1

SELECT  HASHBYTES('MD5', 'f033b004-eb80-412d-9773-f4f06bb994c1')

Result: 0x9BCE8D23CAC76AF4F61C04673CDD0081

And then I do that calculation

SELECT  ABS(HASHBYTES('MD5', 'f033b004-eb80-412d-9773-f4f06bb994c1')  % 10)

Result: 5

Now with python I use the hashlib library

import hashlib

m = hashlib.md5('f033b004-eb80-412d-9773-f4f06bb994c1'.encode() )
m.digest(), m.hexdigest()

Result: b'\x9b\xce\x8d#\xca\xc7j\xf4\xf6\x1c\x04g<\xdd\x00\x81', '9bce8d23cac76af4f61c04673cdd0081'

And then I do the same calculations

int.from_bytes( b'\x9b\xce\x8d#\xca\xc7j\xf4\xf6\x1c\x04g<\xdd\x00\x81',
                    byteorder='big', 
                    signed=False )%10

Result: 9

Does anyone knows how can I get from the Python code the same results as the SQL Server ?

7vhp5slm

7vhp5slm1#

This is because SQL server returns the MD5 hash in UPPERCASE and Python in lowercase. The UPPER and lower case bytes have, of course, different byte values.

wfauudbj

wfauudbj2#

Just ran into this... the problem is that when SQL server converts the hash into integer to perform the modulo, it will not give you the correct integer because the hexadecimal is too big (check out the top answer in here: Conversion of long Hex string to Integer not working as expected (works in SQL) )

On the other hand, python can do the conversion correctly because the max integer size is as much as memory allows

iswrvxsc

iswrvxsc3#

This is true for BigQuery, not sure it it applies to SQL Server too, but the reason for this discrepancy is that the BigQuery functions for MD5, SHA-1, SHA-256, and SHA-512 return base64-encoded representations of the hash values, while the Python hashlib library returns hexadecimal representations.

This code will return the same value as SQL:

hash_obj = hashlib.md5()
    hash_obj.update(input_string.encode())
    hash_bytes = hash_obj.digest()
    base64_encoded = base64.b64encode(hash_bytes).decode()

相关问题