在mysql中获取带unicode字符的主键

8tntrjer  于 2023-04-05  发布在  Mysql
关注(0)|答案(2)|浏览(140)

我正在尝试编写一个查询,该查询将分析存储在多个表的主键中的数据,以查找其中是否包含unicode字符。

下面是我的表的schema:

mysql> SHOW CREATE TABLE employee_plain;
+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table          | Create Table                                                                                                                                                                                                                                                                |
+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| employee_plain | CREATE TABLE `employee_plain` (
  `emp_id` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
  `emp_name` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
  `age` int(3) DEFAULT NULL,
  PRIMARY KEY (`emp_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci |
+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> SHOW CREATE TABLE employee_unicode;
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table            | Create Table                                                                                                                                                                                                                                                                  |
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| employee_unicode | CREATE TABLE `employee_unicode` (
  `emp_id` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
  `emp_name` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
  `age` int(3) DEFAULT NULL,
  PRIMARY KEY (`emp_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci |
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

下面是两个表上的数据存储。表employee_unicode在PRIMARY KEY列中包含unicode值:

mysql> select * from employee_plain;
+------------------------------+----------+------+
| emp_id                       | emp_name | age  |
+------------------------------+----------+------+
| asdasd123                    | abcsd    |   12 |
| fsoiuioujvsdf4               | abvkd    |   13 |
| sdfgjshgjshdfljsfklju4532489 | sdfsdff  |   11 |
+------------------------------+----------+------+
3 rows in set (0.00 sec)

mysql> select * from employee_unicode;
+--------------------------------------------------------------+----------+------+
| emp_id                                                       | emp_name | age  |
+--------------------------------------------------------------+----------+------+
| A ΠΛΦΟΙΚ ΑΕ#1420000000000000000                              | sdfsf    |   11 |
| sdfsdfsf234                                                  | fsdfsd   |   12 |
| ΑΣΕΛ - ΑΦΟΙ. ΣΕΛΙΔΗ Α.Ε.#000000000000000                     | sdfsd    |   13 |
| ΦΩΤΗΣ#10000000000                                            | sdfsdfd  |   14 |
+--------------------------------------------------------------+----------+------+
4 rows in set (0.00 sec)

我尝试了使用ASCII、BINARY和REGEX的各种查询:

mysql> SELECT
    ->   TABLE_NAME,
    ->   COLUMN_NAME,
    ->   COLUMN_TYPE,
    ->   IF( COLUMN_NAME REGEXP '[^\x00-\x7F]', 'Contains Unicode', 'No Unicode') AS Unicode_validation
    -> FROM
    ->   information_schema.columns
    -> WHERE
    ->   table_schema = 'amv_testdb' AND
    ->   COLUMN_KEY = 'PRI'
    -> ORDER BY
    ->   TABLE_NAME,
    ->   ORDINAL_POSITION;
+------------------+-------------+--------------+--------------------+
| TABLE_NAME       | COLUMN_NAME | COLUMN_TYPE  | Unicode_validation |
+------------------+-------------+--------------+--------------------+
| employee_plain   | emp_id      | varchar(100) | No Unicode         |
| employee_unicode | emp_id      | varchar(100) | No Unicode         |
+------------------+-------------+--------------+--------------------+
2 rows in set (0.00 sec)

mysql> SELECT
    ->   TABLE_NAME,
    ->   COLUMN_NAME,
    ->   COLUMN_TYPE,
    ->   IF( COLUMN_NAME <> CONVERT( COLUMN_NAME USING ASCII), 'No Unicode', 'Contains Unicode') AS Unicode_validation
    -> FROM
    ->   information_schema.columns
    -> WHERE
    ->   table_schema = 'amv_testdb' AND
    ->   COLUMN_KEY = 'PRI'
    -> ORDER BY
    ->   TABLE_NAME,
    ->   ORDINAL_POSITION;
+------------------+-------------+--------------+--------------------+
| TABLE_NAME       | COLUMN_NAME | COLUMN_TYPE  | Unicode_validation |
+------------------+-------------+--------------+--------------------+
| employee_plain   | emp_id      | varchar(100) | Contains Unicode   |
| employee_unicode | emp_id      | varchar(100) | Contains Unicode   |
+------------------+-------------+--------------+--------------------+
2 rows in set (0.00 sec)

mysql> SELECT
    ->   TABLE_NAME,
    ->   COLUMN_NAME,
    ->   COLUMN_TYPE,
    ->   IF(CONVERT(COLUMN_NAME USING BINARY) <> COLUMN_NAME, 'Contains Unicode', 'No Unicode') AS Unicode_validation
    -> FROM
    ->   information_schema.columns
    -> WHERE
    ->   table_schema = 'amv_testdb' AND
    ->   COLUMN_KEY = 'PRI'
    -> ORDER BY
    ->   TABLE_NAME,
    ->   ORDINAL_POSITION;
+------------------+-------------+--------------+--------------------+
| TABLE_NAME       | COLUMN_NAME | COLUMN_TYPE  | Unicode_validation |
+------------------+-------------+--------------+--------------------+
| employee_plain   | emp_id      | varchar(100) | No Unicode         |
| employee_unicode | emp_id      | varchar(100) | No Unicode         |
+------------------+-------------+--------------+--------------------+
2 rows in set (0.00 sec)

请帮助我理解为什么我得到不正确的结果。

kmynzznz

kmynzznz1#

如果查找包含非ASCII字符的数据,则将存储的值与使用ASCII的该值的转换进行比较,例如WHERE emp_id <> CONVERT(emp_id USING ASCII)

CREATE TABLE employees (
    emp_id VARCHAR(255),
    emp_name VARCHAR(255),
    age INT
);

INSERT INTO employees (emp_id, emp_name, age)
VALUES
    ('A ΠΛΦΟΙΚ ΑΕ#1420000000000000000', 'sdfsf', 11),
    ('sdfsdfsf234', 'fsdfsd', 12),
    ('ΑΣΕΛ - ΑΦΟΙ. ΣΕΛΙΔΗ Α.Ε.#000000000000000', 'sdfsd', 13),
    ('ΦΩΤΗΣ#10000000000', 'sdfsdfd', 14);
SELECT *
FROM employees
WHERE emp_id <> CONVERT(emp_id USING ASCII)
员工标识emp_name年龄
A ΠΛΦΟΙΚ ΑΕ #142000000000000000sdfsf十一
A ΣΕΛ - ΑΦΟΙ·ΣΕΛΙΔΗ Α·Ε·#0000000000000自定义文件系统十三
ΦΩΤΗΣ #1000000000sdfsdfd十四岁

fiddle

fnvucqvd

fnvucqvd2#

WHERE HEX(emp_id) REGEXP '^(..)*[CDEFcdef]'

将识别任何至少包含一个UTF-8字符的emp_id(即CHARACTER SET utf8utf8mb4)。

相关问题