我编写了一个codeigniter类来搜索四个db表中的一些搜索字符串。这些搜索使用mysql自然语言全文搜索。实际执行搜索的sql被用作准备好的语句。这就是sql的样子,命名参数clean\u string被要搜索的带引号的字符串替换,其他命名参数被整数替换,以根据其重要性对相应的表进行加权:
SELECT c_i, c_t, seo_title, SUM(score) AS score
FROM (
SELECT c.id_code AS c_i, c.title AS c_t, c.seo_title, (MATCH(title) AGAINST (:clean_string IN NATURAL LANGUAGE MODE)) * :career_title_factor AS score, 'q_ct' AS qid
FROM careers c
WHERE MATCH(title) AGAINST (:clean_string IN NATURAL LANGUAGE MODE)
UNION
SELECT c.id_code AS c_i, c.title AS c_t, c.seo_title, AVG(MATCH(ts.task) AGAINST (:clean_string IN NATURAL LANGUAGE MODE)) * :career_task_statement_factor AS score, 'q_ts' AS qid
FROM tasks ts, careers c
WHERE ts.id_code = c.id_code
AND MATCH(ts.task) AGAINST (:clean_string IN NATURAL LANGUAGE MODE)
GROUP BY c_i
UNION
SELECT c.id_code AS c_i, c.title AS c_t, c.seo_title, AVG(MATCH(oat.alternate_title) AGAINST (:clean_string IN NATURAL LANGUAGE MODE)) * :career_alternate_title_factor AS score, 'q_at' AS qid
FROM other_titles oat, careers c
WHERE oat.id_code = c.id_code
AND MATCH(oat.alternate_title) AGAINST (:clean_string IN NATURAL LANGUAGE MODE)
GROUP BY c_i
UNION
SELECT c.id_code AS c_i, c.title AS c_t, c.seo_title, AVG(MATCH(od.description) AGAINST (:clean_string IN NATURAL LANGUAGE MODE)) * :career_occupation_data_factor AS score, 'q_od' AS qid
FROM other_data od, careers c
WHERE od.id_code = c.id_code
AND MATCH(od.description) AGAINST (:clean_string IN NATURAL LANGUAGE MODE)
GROUP BY c_i
) AS union_query
GROUP BY c_i
对于大多数搜索,甚至是随机生成的无意义字符串,这个查询运行得又好又快。但是,当单词和出现在我的搜索字符串中时,它运行得非常慢。为什么会这样?
为了演示这一点,我关闭了mysql查询缓存并运行以下测试代码:
public function test() {
$new_elapsed = 0;
$search_terms = array(
"video games",
"game video",
"software",
"nautical architect",
"CEO",
"pig and blanket machine",
"art gallery",
"rock and roll",
"guitar god magic metal",
"mott the hoople"
);
for($i=0; $i<10; $i++) {
$search = $search_terms[$i];
echo "search term is $search\n";
$start = microtime(true);
$results_new = MY_search::search_new($this->db, $search);
$new_elapsed = (microtime(true) - $start);
echo "new has " . sizeof($results_new) . " matches\n";
echo "new_elapsed: $new_elapsed\n";
echo "--\n";
}
}
从这里的搜索结果中可以看到,两个包含单词和的测试查询的运行速度要慢得多。好像慢了一百倍。
search term is video games
new has 76 matches
new_elapsed: 0.23601198196411
--
search term is game video
new has 73 matches
new_elapsed: 0.27281093597412
--
search term is software
new has 124 matches
new_elapsed: 0.41503596305847
--
search term is nautical architect
new has 24 matches
new_elapsed: 0.11621713638306
--
search term is CEO
new has 2 matches
new_elapsed: 0.012537002563477
--
search term is pig and blanket machine
new has 1078 matches
new_elapsed: 19.989203929901
--
search term is art gallery
new has 53 matches
new_elapsed: 0.19855809211731
--
search term is rock and roll
new has 1077 matches
new_elapsed: 16.845540046692
--
search term is guitar god magic metal
new has 140 matches
new_elapsed: 0.66281390190125
--
search term is mott the hoople
new has 0 matches
new_elapsed: 0.0023038387298584
--
值得注意的是,“the”包含在stop words中,innodb\u ft\u min\u token\u size设置为3。
为什么会这样?我担心“and”可能会被全文搜索引擎理解为某种操作符。这些较长的搜索条件对我的申请是不可接受的。我很想a)在stopwords中添加“and”,或者b)从代码中的任何搜索字符串中过滤单词和,但是我担心这个缓慢的查询问题可能会被其他单词触发。
编辑:根据雷蒙德的评论,我在这里发布了一些解释性的声明。我将重点放在这一个内部查询上,因为它本身非常慢。
EXPLAIN SELECT c.id_code AS c_i, c.title AS c_t, c.seo_title, AVG(MATCH(ts.task) AGAINST ('pig and whistle' IN NATURAL LANGUAGE MODE)) * 3 AS score, 'q_ts' AS qid
FROM tasks ts, careers c
WHERE ts.id_code = c.id_code
AND MATCH(ts.task) AGAINST ('pig and whistle' IN NATURAL LANGUAGE MODE)
GROUP BY c_i
这将产生:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE ts fulltext id_code,task task 0 NULL 1 Using where; Using temporary; Using filesort
1 SIMPLE c ALL id_code NULL NULL NULL 1110 Using where
raymond建议查询不符合ansi,因此我修改了groupby子句以添加其他列并重试。还是很慢:
EXPLAIN SELECT c.id_code AS c_i, c.title AS c_t, c.seo_title, AVG(MATCH(ts.task) AGAINST ('pig and whistle' IN NATURAL LANGUAGE MODE)) * 3 AS score, 'q_ts' AS qid
FROM tasks ts, careers c
WHERE ts.id_code = c.id_code
AND MATCH(ts.task) AGAINST ('pig and whistle' IN NATURAL LANGUAGE MODE)
GROUP BY c_i, c_t, seo_title
explain输出是相同的。
如果我解释整个查询(使用union语句等),我会得到一个更详细的结果:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 3332 Using temporary; Using filesort
2 DERIVED c fulltext title title 0 NULL 1 Using where
3 UNION ts fulltext id_code,task task 0 NULL 1 Using where; Using temporary; Using filesort
3 UNION c ALL id_code NULL NULL NULL 1110 Using where
4 UNION oat fulltext id_code,alternate_title alternate_title 0 NULL 1 Using where; Using temporary; Using filesort
4 UNION c ALL id_code NULL NULL NULL 1110 Using where
5 UNION od fulltext PRIMARY,description description 0 NULL 1 Using where; Using temporary; Using filesort
5 UNION c ALL id_code NULL NULL NULL 1110 Using where
NULL UNION RESULT <union2,3,4,5> ALL NULL NULL NULL NULL NULL Using temporary
编辑2:raymond还要求提供表定义。它们在这里:
CREATE TABLE `careers` (
`id` int(6) unsigned NOT NULL AUTO_INCREMENT,
`id_code` varchar(12) NOT NULL DEFAULT '',
`title` varchar(250) NOT NULL DEFAULT '',
`title_singular` varchar(250) NOT NULL DEFAULT '',
`seo_title` varchar(150) NOT NULL,
`cat_id` varchar(4) NOT NULL DEFAULT '',
`occ_code` varchar(10) DEFAULT NULL,
`forum_id` int(6) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `id_code` (`id_code`),
KEY `cat_id` (`cat_id`),
KEY `forum_id` (`forum_id`),
FULLTEXT KEY `title` (`title`)
) ENGINE=MyISAM AUTO_INCREMENT=1111 DEFAULT CHARSET=latin1
CREATE TABLE `tasks` (
`id_code` char(10) NOT NULL,
`task_id` decimal(8,0) NOT NULL,
`task` varchar(1000) NOT NULL,
`task_type` varchar(12) DEFAULT NULL,
`incumbents_responding` decimal(4,0) DEFAULT NULL,
`date_updated` date NOT NULL,
`domain_source` varchar(30) NOT NULL,
PRIMARY KEY (`task_id`),
KEY `id_code` (`id_code`),
FULLTEXT KEY `task` (`task`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
CREATE TABLE `other_titles` (
`id_code` char(10) NOT NULL,
`alternate_title` varchar(150) NOT NULL,
`short_title` varchar(150) DEFAULT NULL,
`sources` varchar(50) NOT NULL,
KEY `id_code` (`id_code`),
FULLTEXT KEY `alternate_title` (`alternate_title`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
CREATE TABLE `other_data` (
`id_code` char(10) NOT NULL,
`title` varchar(150) NOT NULL,
`description` varchar(1000) NOT NULL,
PRIMARY KEY (`id_code`),
FULLTEXT KEY `description` (`description`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
暂无答案!
目前还没有任何答案,快来回答吧!