为什么“and”这个词会让mysql全文搜索变得如此缓慢?

doinxwow  于 2021-06-19  发布在  Mysql
关注(0)|答案(0)|浏览(162)

我编写了一个codeigniter类来搜索四个db表中的一些搜索字符串。这些搜索使用mysql自然语言全文搜索。实际执行搜索的sql被用作准备好的语句。这就是sql的样子,命名参数clean\u string被要搜索的带引号的字符串替换,其他命名参数被整数替换,以根据其重要性对相应的表进行加权:

SELECT c_i, c_t, seo_title, SUM(score) AS score
FROM (

SELECT c.id_code AS c_i, c.title AS c_t, c.seo_title, (MATCH(title) AGAINST (:clean_string IN NATURAL LANGUAGE MODE)) * :career_title_factor AS score, 'q_ct' AS qid
FROM careers c
WHERE MATCH(title) AGAINST (:clean_string IN NATURAL LANGUAGE MODE)

UNION

SELECT c.id_code AS c_i, c.title AS c_t, c.seo_title, AVG(MATCH(ts.task) AGAINST (:clean_string IN NATURAL LANGUAGE MODE)) * :career_task_statement_factor AS score, 'q_ts' AS qid
FROM tasks ts, careers c
WHERE ts.id_code = c.id_code
    AND MATCH(ts.task) AGAINST (:clean_string IN NATURAL LANGUAGE MODE)
GROUP BY c_i

UNION

SELECT c.id_code AS c_i, c.title AS c_t, c.seo_title, AVG(MATCH(oat.alternate_title) AGAINST (:clean_string IN NATURAL LANGUAGE MODE)) * :career_alternate_title_factor AS score, 'q_at' AS qid
FROM other_titles oat, careers c
WHERE oat.id_code = c.id_code
    AND MATCH(oat.alternate_title) AGAINST (:clean_string IN NATURAL LANGUAGE MODE)
GROUP BY c_i

UNION

SELECT c.id_code AS c_i, c.title AS c_t, c.seo_title, AVG(MATCH(od.description) AGAINST (:clean_string IN NATURAL LANGUAGE MODE)) * :career_occupation_data_factor AS score, 'q_od' AS qid
FROM other_data od, careers c
WHERE od.id_code = c.id_code
    AND MATCH(od.description) AGAINST (:clean_string IN NATURAL LANGUAGE MODE)
GROUP BY c_i

) AS union_query
GROUP BY c_i

对于大多数搜索,甚至是随机生成的无意义字符串,这个查询运行得又好又快。但是,当单词和出现在我的搜索字符串中时,它运行得非常慢。为什么会这样?
为了演示这一点,我关闭了mysql查询缓存并运行以下测试代码:

public function test() {
    $new_elapsed = 0;

    $search_terms = array(
        "video games",
        "game video",
        "software",
        "nautical architect",
        "CEO",
        "pig and blanket machine",
        "art gallery",
        "rock and roll",
        "guitar god magic metal",
        "mott the hoople"
    );

    for($i=0; $i<10; $i++) {
        $search = $search_terms[$i];
        echo "search term is $search\n";

        $start = microtime(true);

        $results_new = MY_search::search_new($this->db, $search);
        $new_elapsed = (microtime(true) - $start);
        echo "new has " . sizeof($results_new) . " matches\n";
        echo "new_elapsed: $new_elapsed\n";

        echo "--\n";
    }
}

从这里的搜索结果中可以看到,两个包含单词和的测试查询的运行速度要慢得多。好像慢了一百倍。

search term is video games
new has 76 matches
new_elapsed: 0.23601198196411
--
search term is game video
new has 73 matches
new_elapsed: 0.27281093597412
--
search term is software
new has 124 matches
new_elapsed: 0.41503596305847
--
search term is nautical architect
new has 24 matches
new_elapsed: 0.11621713638306
--
search term is CEO
new has 2 matches
new_elapsed: 0.012537002563477
--
search term is pig and blanket machine
new has 1078 matches
new_elapsed: 19.989203929901
--
search term is art gallery
new has 53 matches
new_elapsed: 0.19855809211731
--
search term is rock and roll
new has 1077 matches
new_elapsed: 16.845540046692
--
search term is guitar god magic metal
new has 140 matches
new_elapsed: 0.66281390190125
--
search term is mott the hoople
new has 0 matches
new_elapsed: 0.0023038387298584
--

值得注意的是,“the”包含在stop words中,innodb\u ft\u min\u token\u size设置为3。
为什么会这样?我担心“and”可能会被全文搜索引擎理解为某种操作符。这些较长的搜索条件对我的申请是不可接受的。我很想a)在stopwords中添加“and”,或者b)从代码中的任何搜索字符串中过滤单词和,但是我担心这个缓慢的查询问题可能会被其他单词触发。
编辑:根据雷蒙德的评论,我在这里发布了一些解释性的声明。我将重点放在这一个内部查询上,因为它本身非常慢。

EXPLAIN SELECT c.id_code AS c_i, c.title AS c_t, c.seo_title, AVG(MATCH(ts.task) AGAINST ('pig and whistle' IN NATURAL LANGUAGE MODE)) * 3 AS score, 'q_ts' AS qid
FROM tasks ts, careers c
WHERE ts.id_code = c.id_code
    AND MATCH(ts.task) AGAINST ('pig and whistle' IN NATURAL LANGUAGE MODE)
GROUP BY c_i

这将产生:

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra   
1   SIMPLE  ts  fulltext    id_code,task    task    0   NULL    1   Using where; Using temporary; Using filesort
1   SIMPLE  c   ALL     id_code     NULL    NULL    NULL    1110    Using where

raymond建议查询不符合ansi,因此我修改了groupby子句以添加其他列并重试。还是很慢:

EXPLAIN SELECT c.id_code AS c_i, c.title AS c_t, c.seo_title, AVG(MATCH(ts.task) AGAINST ('pig and whistle' IN NATURAL LANGUAGE MODE)) * 3 AS score, 'q_ts' AS qid
FROM tasks ts, careers c
WHERE ts.id_code = c.id_code
    AND MATCH(ts.task) AGAINST ('pig and whistle' IN NATURAL LANGUAGE MODE)
GROUP BY c_i, c_t, seo_title

explain输出是相同的。
如果我解释整个查询(使用union语句等),我会得到一个更详细的结果:

id  select_type         table   type    possible_keys   key     key_len     ref     rows    Extra   
1   PRIMARY         <derived2>  ALL     NULL    NULL    NULL    NULL    3332    Using temporary; Using filesort
2   DERIVED         c   fulltext    title   title   0   NULL    1   Using where
3   UNION       ts  fulltext    id_code,task    task    0   NULL    1   Using where; Using temporary; Using filesort
3   UNION       c   ALL     id_code     NULL    NULL    NULL    1110    Using where
4   UNION       oat     fulltext    id_code,alternate_title     alternate_title     0   NULL    1   Using where; Using temporary; Using filesort
4   UNION       c   ALL     id_code     NULL    NULL    NULL    1110    Using where
5   UNION       od  fulltext    PRIMARY,description     description     0   NULL    1   Using where; Using temporary; Using filesort
5   UNION       c   ALL     id_code     NULL    NULL    NULL    1110    Using where
NULL    UNION RESULT        <union2,3,4,5>  ALL     NULL    NULL    NULL    NULL    NULL    Using temporary

编辑2:raymond还要求提供表定义。它们在这里:

CREATE TABLE `careers` (
 `id` int(6) unsigned NOT NULL AUTO_INCREMENT,
 `id_code` varchar(12) NOT NULL DEFAULT '',
 `title` varchar(250) NOT NULL DEFAULT '',
 `title_singular` varchar(250) NOT NULL DEFAULT '',
 `seo_title` varchar(150) NOT NULL,
 `cat_id` varchar(4) NOT NULL DEFAULT '',
 `occ_code` varchar(10) DEFAULT NULL,
 `forum_id` int(6) unsigned NOT NULL DEFAULT '0',
 PRIMARY KEY (`id`),
 UNIQUE KEY `id_code` (`id_code`),
 KEY `cat_id` (`cat_id`),
 KEY `forum_id` (`forum_id`),
 FULLTEXT KEY `title` (`title`)
) ENGINE=MyISAM AUTO_INCREMENT=1111 DEFAULT CHARSET=latin1

CREATE TABLE `tasks` (
 `id_code` char(10) NOT NULL,
 `task_id` decimal(8,0) NOT NULL,
 `task` varchar(1000) NOT NULL,
 `task_type` varchar(12) DEFAULT NULL,
 `incumbents_responding` decimal(4,0) DEFAULT NULL,
 `date_updated` date NOT NULL,
 `domain_source` varchar(30) NOT NULL,
 PRIMARY KEY (`task_id`),
 KEY `id_code` (`id_code`),
 FULLTEXT KEY `task` (`task`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

CREATE TABLE `other_titles` (
 `id_code` char(10) NOT NULL,
 `alternate_title` varchar(150) NOT NULL,
 `short_title` varchar(150) DEFAULT NULL,
 `sources` varchar(50) NOT NULL,
 KEY `id_code` (`id_code`),
 FULLTEXT KEY `alternate_title` (`alternate_title`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

CREATE TABLE `other_data` (
 `id_code` char(10) NOT NULL,
 `title` varchar(150) NOT NULL,
 `description` varchar(1000) NOT NULL,
 PRIMARY KEY (`id_code`),
 FULLTEXT KEY `description` (`description`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

暂无答案!

目前还没有任何答案,快来回答吧!

相关问题