我试图实现一个自动完成功能的静态数据5k记录小集。
我想过使用前缀树来支持这样的查询
class AutocompleteSearch {
class Entry {
String sentence;
int times;
Entry(String sentence, int times) {
this.sentence = sentence;
this.times = times;
}
}
class TrieNode {
TrieNode[] children;
int times;
TrieNode() {
children = new TrieNode[27];
times = 0;
}
}
private TrieNode root;
private TrieNode previous;
private String query;
public AutocompleteSystem(String[] sentences, int[] times) {
root = new TrieNode();
query = "";
for (int i = 0; i < sentences.length; i++) {
insert(sentences[i], times[i]);
}
}
public List<String> input(char c) {
List<String> result = new ArrayList<>();
if (c == '#') {
insert(query, 1);
previous = null;
query = "";
return result;
}
query += c;
List<Entry> history = lookup(c);
history.sort((a, b) -> {
if (a.times == b.times) {
return a.sentence.compareTo(b.sentence);
}
return b.times - a.times;
});
for (int i = 0; i < Math.min(history.size(), 3); i++) {
result.add(history.get(i).sentence);
}
return result;
}
private void insert(String sentence, int times) {
TrieNode current = root;
for (char c : sentence.toCharArray()) {
int index = c == ' ' ? 26 : c - 'a';
if (current.children[index] == null) {
current.children[index] = new TrieNode();
}
current = current.children[index];
}
current.times += times;
}
private List<Entry> lookup(char c) {
List<Entry> history = new ArrayList<>();
if (previous == null && query.length() > 1) {
return history;
}
TrieNode current = previous == null ? root : previous;
int index = c == ' ' ? 26 : c - 'a';
if (current.children[index] == null) {
previous = null;
return history;
}
previous = current.children[index];
traverse(query, previous, history);
return history;
}
private void traverse(String s, TrieNode node, List<Entry> history) {
if (node.times > 0) {
history.add(new Entry(s, node.times));
}
for (int i = 0; i < 27; i++) {
if (node.children[i] != null) {
String next = i == 26 ? s + ' ' : s + (char) ('a' + i);
traverse(next, node.children[i], history);
}
}
}
}
问题是,如果我要扩展这个在全名上自动完成的解决方案,以支持其他字段(例如,假设我有record as)
Student
id:
full_name:
gender:
subject_enrolled:
我不仅要自动完成搜索,但也对主题或其他领域的过滤器。我想过使用一个倒排索引将每个字段Map到它们对应的学生id列表,然后在这些字段之间与自动完成的学生id相交,但它的效率不如自动完成。
有没有人对数据结构和库有什么建议,可以更精确地使用我正在寻找内存中的替代方案,用于mongodb/elasticsearch-like查询支持?
暂无答案!
目前还没有任何答案,快来回答吧!