java中小型静态数据集的mongodb内存自动完成实现

hsgswve4  于 2021-06-14  发布在  ElasticSearch
关注(0)|答案(0)|浏览(175)

我试图实现一个自动完成功能的静态数据5k记录小集。
我想过使用前缀树来支持这样的查询

class AutocompleteSearch {
    class Entry {
        String sentence;
        int times;

        Entry(String sentence, int times) {
            this.sentence = sentence;
            this.times = times;
        }
    }

    class TrieNode {
        TrieNode[] children;
        int times;

        TrieNode() {
            children = new TrieNode[27];
            times = 0;
        }
    }

    private TrieNode root;
    private TrieNode previous;
    private String query;

    public AutocompleteSystem(String[] sentences, int[] times) {
        root = new TrieNode();
        query = "";

        for (int i = 0; i < sentences.length; i++) {
            insert(sentences[i], times[i]);
        }
    }

    public List<String> input(char c) {
        List<String> result = new ArrayList<>();
        if (c == '#') {
            insert(query, 1);
            previous = null;
            query = "";
            return result;
        }

        query += c;
        List<Entry> history = lookup(c);
        history.sort((a, b) -> {
            if (a.times == b.times) {
                return a.sentence.compareTo(b.sentence);
            }
            return b.times - a.times;
        });
        for (int i = 0; i < Math.min(history.size(), 3); i++) {
            result.add(history.get(i).sentence);
        }
        return result;
    }

    private void insert(String sentence, int times) {
        TrieNode current = root;
        for (char c : sentence.toCharArray()) {
            int index = c == ' ' ? 26 : c - 'a';
            if (current.children[index] == null) {
                current.children[index] = new TrieNode();
            }
            current = current.children[index];
        }
        current.times += times;
    }

    private List<Entry> lookup(char c) {
        List<Entry> history = new ArrayList<>();
        if (previous == null && query.length() > 1) {
            return history;
        }

        TrieNode current = previous == null ? root : previous;
        int index = c == ' ' ? 26 : c - 'a';
        if (current.children[index] == null) {
            previous = null;
            return history;
        }

        previous = current.children[index];
        traverse(query, previous, history);
        return history;
    }

    private void traverse(String s, TrieNode node, List<Entry> history) {
        if (node.times > 0) {
            history.add(new Entry(s, node.times));
        }

        for (int i = 0; i < 27; i++) {
            if (node.children[i] != null) {
                String next = i == 26 ? s + ' ' : s + (char) ('a' + i);
                traverse(next, node.children[i], history);
            }
        }
    }
}

问题是,如果我要扩展这个在全名上自动完成的解决方案,以支持其他字段(例如,假设我有record as)

Student
id:
full_name:
gender:
subject_enrolled:

我不仅要自动完成搜索,但也对主题或其他领域的过滤器。我想过使用一个倒排索引将每个字段Map到它们对应的学生id列表,然后在这些字段之间与自动完成的学生id相交,但它的效率不如自动完成。
有没有人对数据结构和库有什么建议,可以更精确地使用我正在寻找内存中的替代方案,用于mongodb/elasticsearch-like查询支持?

暂无答案!

目前还没有任何答案,快来回答吧!

相关问题