SQL Server How to select rows with Chinese-Japanese characters?

tcomlyy6  于 2023-11-16  发布在  其他
关注(0)|答案(3)|浏览(95)

How can I filter a column by Chinese-Japanese like characters? I am trying to do something like

SELECT * FROM my_table WHERE column LIKE '%[A-Za-z]%'

Is it possible for Chinese or Japanese characters?

m0rkklqb

m0rkklqb1#

When working with unicode string you will always need to prefix your string with N to tell sql server explicitly that there can be unicode characters in the operation. INSERT, UPDATE, SELECT and DELETE its true for all operations.

In your case when selecting data, in where clause you will need to prefix the Search string with N . Something like this....

SELECT * 
FROM my_table 
WHERE column LIKE N'%[A-Z]%' --<-- using Japanese characters here 
OR    Column LIKE N'%[a-z]%' --<-- using Japanese characters here
guz6ccqo

guz6ccqo2#

Below may work as it did for me.

SELECT * FROM my_table WHERE LEN(RTRIM(my_column)) <> DATALENGTH(RTRIM(my_column))

The len function may ignore trailing whitespaces so it's best to trim it before measuring the length. Above came from advise on a Japanese web page.

aemubtdh

aemubtdh3#

I ended up writing a function to parse each character of a string and check if the UNICODE() value is within the range of CJK characters. This doesn't definitively make it only Chinese, but does get Hanzi/Kanji characters. I composited a few ranges that are consecutive.

Unicode Block descriptions: http://zuga.net/articles/unicode-all-blocks/?highlight=104

Some of these integer values exceed the range which SQL can interpret with the UNICODE and NCHAR functions. I welcome constructive feedback to this SQL function.

SQL Function:

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =============================================
-- Author:      Tim Bessler
-- Create date: 13 NOV 2023
-- Description: Searches each character in a string to detect if any fall inside the range of 
--  Chinese (or CJK, broadly) unicode blocks. IMPORTANT: string literals must be prefixed N to
--  properly evaluate UTF8 / nvarchar.
--
-- Returns true upon finding any, otherwise returns false
-- =============================================
ALTER FUNCTION ContainsHanzi 
(
    -- Add the parameters for the function here
    @Input nvarchar(MAX)
)
RETURNS bit
AS
BEGIN
    -- Declare the return variable here
    DECLARE @HasHanzi bit = 0
    DECLARE @InputLength int, @ix int = 1 --start at index 1, iterate thru the string char by char
    DECLARE @unicodeValue int
    
    --set loop range to iterate thru the whole string
    SET @InputLength = LEN(@Input)

    --Loop thru the loop until we reach the end of the string OR we detect Hanzi
    WHILE (@ix <= @InputLength AND NOT @HasHanzi = 1)
    BEGIN
        SET @unicodeValue = UNICODE(SUBSTRING(@Input, @ix, 1))
        --consolidated range of Unicode block integer values for Hanzi / CJK characters
        SET @HasHanzi = CASE 
            WHEN (@unicodeValue >= 11904 AND @unicodeValue <= 12031) OR
                 (@unicodeValue >= 12288 AND @unicodeValue <= 12351) OR
                 (@unicodeValue >= 12736 AND @unicodeValue <= 12783) OR
                 (@unicodeValue >= 12800 AND @unicodeValue <= 19903) OR
                 (@unicodeValue >= 19968 AND @unicodeValue <= 40959) OR
                 (@unicodeValue >= 63744 AND @unicodeValue <= 64255) OR
                 (@unicodeValue >= 65072 AND @unicodeValue <= 65103) OR
                 (@unicodeValue >= 131072 AND @unicodeValue <= 195103)
            THEN 1 ELSE 0 END
        SET @ix = @ix + 1
    END

    -- Return the result of the function
    RETURN @HasHanzi

END
GO

Example usage:

WITH src (Pepsi, Lang) AS (
    SELECT N'Pepsi', 'English'
    UNION
    SELECT N'ペプシ', 'Japanese (Katakana)'
    UNION
    SELECT N'百事可乐', 'Chinese')
SELECT Pepsi, Lang, dbo.ContainsHanzi(Pepsi) as ContainsHanzi
FROM src;

Results:

Pepsi (English) = 0

ペプシ (Japanese (Katakana)) = 0

百事可乐 (Chinese) = 1

相关问题