我正在尝试编写一个查询,该查询将分析存储在多个表的主键中的数据,以查找其中是否包含unicode字符。
下面是我的表的schema:
mysql> SHOW CREATE TABLE employee_plain;
+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| employee_plain | CREATE TABLE `employee_plain` (
`emp_id` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
`emp_name` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
`age` int(3) DEFAULT NULL,
PRIMARY KEY (`emp_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci |
+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> SHOW CREATE TABLE employee_unicode;
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| employee_unicode | CREATE TABLE `employee_unicode` (
`emp_id` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
`emp_name` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
`age` int(3) DEFAULT NULL,
PRIMARY KEY (`emp_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci |
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
下面是两个表上的数据存储。表employee_unicode
在PRIMARY KEY列中包含unicode值:
mysql> select * from employee_plain;
+------------------------------+----------+------+
| emp_id | emp_name | age |
+------------------------------+----------+------+
| asdasd123 | abcsd | 12 |
| fsoiuioujvsdf4 | abvkd | 13 |
| sdfgjshgjshdfljsfklju4532489 | sdfsdff | 11 |
+------------------------------+----------+------+
3 rows in set (0.00 sec)
mysql> select * from employee_unicode;
+--------------------------------------------------------------+----------+------+
| emp_id | emp_name | age |
+--------------------------------------------------------------+----------+------+
| A ΠΛΦΟΙΚ ΑΕ#1420000000000000000 | sdfsf | 11 |
| sdfsdfsf234 | fsdfsd | 12 |
| ΑΣΕΛ - ΑΦΟΙ. ΣΕΛΙΔΗ Α.Ε.#000000000000000 | sdfsd | 13 |
| ΦΩΤΗΣ#10000000000 | sdfsdfd | 14 |
+--------------------------------------------------------------+----------+------+
4 rows in set (0.00 sec)
我尝试了使用ASCII、BINARY和REGEX的各种查询:
mysql> SELECT
-> TABLE_NAME,
-> COLUMN_NAME,
-> COLUMN_TYPE,
-> IF( COLUMN_NAME REGEXP '[^\x00-\x7F]', 'Contains Unicode', 'No Unicode') AS Unicode_validation
-> FROM
-> information_schema.columns
-> WHERE
-> table_schema = 'amv_testdb' AND
-> COLUMN_KEY = 'PRI'
-> ORDER BY
-> TABLE_NAME,
-> ORDINAL_POSITION;
+------------------+-------------+--------------+--------------------+
| TABLE_NAME | COLUMN_NAME | COLUMN_TYPE | Unicode_validation |
+------------------+-------------+--------------+--------------------+
| employee_plain | emp_id | varchar(100) | No Unicode |
| employee_unicode | emp_id | varchar(100) | No Unicode |
+------------------+-------------+--------------+--------------------+
2 rows in set (0.00 sec)
mysql> SELECT
-> TABLE_NAME,
-> COLUMN_NAME,
-> COLUMN_TYPE,
-> IF( COLUMN_NAME <> CONVERT( COLUMN_NAME USING ASCII), 'No Unicode', 'Contains Unicode') AS Unicode_validation
-> FROM
-> information_schema.columns
-> WHERE
-> table_schema = 'amv_testdb' AND
-> COLUMN_KEY = 'PRI'
-> ORDER BY
-> TABLE_NAME,
-> ORDINAL_POSITION;
+------------------+-------------+--------------+--------------------+
| TABLE_NAME | COLUMN_NAME | COLUMN_TYPE | Unicode_validation |
+------------------+-------------+--------------+--------------------+
| employee_plain | emp_id | varchar(100) | Contains Unicode |
| employee_unicode | emp_id | varchar(100) | Contains Unicode |
+------------------+-------------+--------------+--------------------+
2 rows in set (0.00 sec)
mysql> SELECT
-> TABLE_NAME,
-> COLUMN_NAME,
-> COLUMN_TYPE,
-> IF(CONVERT(COLUMN_NAME USING BINARY) <> COLUMN_NAME, 'Contains Unicode', 'No Unicode') AS Unicode_validation
-> FROM
-> information_schema.columns
-> WHERE
-> table_schema = 'amv_testdb' AND
-> COLUMN_KEY = 'PRI'
-> ORDER BY
-> TABLE_NAME,
-> ORDINAL_POSITION;
+------------------+-------------+--------------+--------------------+
| TABLE_NAME | COLUMN_NAME | COLUMN_TYPE | Unicode_validation |
+------------------+-------------+--------------+--------------------+
| employee_plain | emp_id | varchar(100) | No Unicode |
| employee_unicode | emp_id | varchar(100) | No Unicode |
+------------------+-------------+--------------+--------------------+
2 rows in set (0.00 sec)
请帮助我理解为什么我得到不正确的结果。
2条答案
按热度按时间kmynzznz1#
如果查找包含非ASCII字符的数据,则将存储的值与使用ASCII的该值的转换进行比较,例如
WHERE emp_id <> CONVERT(emp_id USING ASCII)
fiddle
fnvucqvd2#
将识别任何至少包含一个UTF-8字符的
emp_id
(即CHARACTER SET utf8
或utf8mb4
)。