Which algorithm is used for Checksum(), Binary_Checksum() and CHECKSUM_AVG() function in SQL Server?

jogvjijk  于 2023-06-21  发布在  Go
关注(0)|答案(2)|浏览(125)

Checksum() , Binary_Checksum() and CHECKSUM_AVG() functions are provided by SQL Server to build a hash index based on an expression or a column list.
This can be helpful in determining whether a row has changed or not. The mechanism can then be used to identify whether the record has been updated or not.

I have found lots of example of collisions that are generated same hash value for different values. How we can identify the collision condition for these function.

Anyone have to clue about to which algorithm or technique used to generated/compute hash without hash collisions?

6ljaweal

6ljaweal1#

According to the CHECKSUM weakness explained article on sqlTeam:
The built-in CHECKUM function in SQL Server is built on a series of 4 bit left rotational xor operations.

A forum post from 2006 (linked in the article as well) posted by Peter Larsson inlcudes sql user defined functions that computes checksum. Author of the post claims to 100% compatibility with SQL Server's built in function (I haven't tested it myself).

In case the link goes dead, here is a copy of the relevant part:
With text/varchar/image data, call with SELECT BINARY_CHECKSUM('abcdefghijklmnop'), dbo.fnPesoBinaryChecksum('abcdefghijklmnop') With integer data, call with SELECT BINARY_CHECKSUM(123), dbo.fnPesoBinaryChecksum(CAST(123 AS VARBINARY)) I haven't figured out how to calculate checksum for integers greater than 255 yet.

CREATE FUNCTION dbo.fnPesoBinaryChecksum
(
    @Data IMAGE
)
RETURNS INT
AS

BEGIN
    DECLARE @Index INT,
        @MaxIndex INT,
        @SUM BIGINT,
        @Overflow TINYINT

    SELECT  @Index = 1,
        @MaxIndex = DATALENGTH(@Data),
        @SUM = 0

    WHILE @Index <= @MaxIndex
        SELECT  @SUM = (16 * @SUM) ^ SUBSTRING(@Data, @Index, 1),
            @Overflow = @SUM / 4294967296,
            @SUM = @SUM - @Overflow * 4294967296,
            @SUM = @SUM ^ @Overflow,
            @Index = @Index + 1

    IF @SUM > 2147483647
        SET @SUM = @SUM - 4294967296
    ELSE IF @SUM BETWEEN 32768 AND 65535
        SET @SUM = @SUM - 65536
    ELSE IF @SUM BETWEEN 128 AND 255
        SET @SUM = @SUM - 256

    RETURN @SUM
END

Actually this is an improvement of MS function, since it accepts TEXT and IMAGE data.

CREATE FUNCTION [dbo].[fnPesoTextChecksum]
(
    @Data TEXT
)
RETURNS INT
AS

BEGIN
    DECLARE @Index INT,
        @MaxIndex INT,
        @SUM BIGINT,
        @Overflow TINYINT

    SELECT  @Index = 1,
        @MaxIndex = DATALENGTH(@Data),
        @SUM = 0

    WHILE @Index <= @MaxIndex
        SELECT  @SUM = (16 * @SUM) ^ ASCII(SUBSTRING(@Data, @Index, 1)),
            @Overflow = @SUM / 4294967296,
            @SUM = @SUM - @Overflow * 4294967296,
            @SUM = @SUM ^ @Overflow,
            @Index = @Index + 1

    IF @SUM > 2147483647
        SET @SUM = @SUM - 4294967296
    ELSE IF @SUM BETWEEN 32768 AND 65535
        SET @SUM = @SUM - 65536
    ELSE IF @SUM BETWEEN 128 AND 255
        SET @SUM = @SUM - 256

    RETURN  @SUM
END

Another good read is Exploring Hash Functions In SQL Server by Thomas Kejser, where the author checks the built in hash functions in sql server for speed an quality.

zbsbpyhn

zbsbpyhn2#

PHP implementation of BINARY_CHECKSUM:

$input = 'binary string';
$sum = 0;
for ($i = 0; $i < strlen($input); $i++) {
    $sum = ($sum << 4) ^ ord($in[$i]);
    $sum = ($sum & 0xffffffff) ^ ($sum >> 32);
}
return $sum > 0x7fffffff ? $sum - 0x100000000 : $sum;

相关问题