Python: IntelliJ-style "search everywhere" algorithm

Posted by jckbn6z7 on 2024-01-05 in Python


I have a list of file names in Python, like this:

HelloWorld.csv
hello_windsor.pdf
some_file_i_need.jpg
san_francisco.png
Another.file.txt
A file name.rar

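I'm looking for an IntelliJ-style search algorithm where you can type whole words, just the first letter of each word in the file name, or a combination of both. Example searches: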

hw -> HelloWorld.csv, hello_windsor.pdf
hwor -> HelloWorld.csv
winds -> hello_windsor.pdf

sf -> some_file_i_need.jpg, san_francisco.png
sfin -> some_file_i_need.jpg
file need -> some_file_i_need.jpg
sfr -> san_francisco.png

file -> some_file_i_need.jpg, Another.file.txt, A file name.rar
file another -> Another.file.txt
fnrar -> A file name.rar

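You get the idea.
Is there a Python package that does this? Ideally it would also rank matches by "frequency" (how often and how recently a file was accessed) as well as by the strength of the match.
I know pylucene is an option, but it seems very heavyweight given that the list of file names is short and I have no interest in searching file contents. Are there any other options?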

intellij-idea

Source: https://stackoverflow.com/questions/77536080/python-intellij-style-search-everywhere-algorithm

2 Answers


u7up0aaq #1

You can do this in Python with regular expressions (import re) wrapped in a function. It's a little involved, but regex can handle it.

import re

def intellij_search(search_term, file_list):
    words = search_term.split()

    # Collect the file names that match every word of the search term
    matching_files = []
    for file_name in file_list:
        # Assume a match until one of the words fails
        matches_all_words = True

        # Check each whitespace-separated word of the search term
        for word in words:
            # Allow any characters between consecutive letters of the word;
            # re.escape keeps characters such as '.' literal
            pattern = '.*'.join(map(re.escape, word))

            # Case-insensitive search anywhere in the file name
            if not re.search(pattern, file_name, re.IGNORECASE):
                # One word failed, so this file name is not a match
                matches_all_words = False
                break

        # Keep the file name only if every word matched
        if matches_all_words:
            matching_files.append(file_name)

    # Return the matching file names
    return matching_files

files = ['HelloWorld.csv', 'hello_windsor.pdf', 'some_file_i_need.jpg', 'san_francisco.png', 'Another.file.txt', 'A file name.rar']
#print(intellij_search('hw', files)) 
#print(intellij_search('sf', files))
#print(intellij_search('Afn', files))

I'm not sure whether this is what you're looking for or something else.
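
One caveat worth checking: joining the characters with '.*' accepts any in-order subsequence of the file name, not only word-initial letters. The small sketch below (the helper name subsequence_match is made up for illustration) shows that hwor then also matches hello_windsor.pdf, which the question's examples exclude:

```python
import re

def subsequence_match(word, file_name):
    # Characters of `word` joined by '.*', as in the answer's function
    # (escaped here so regex metacharacters are treated literally).
    pattern = '.*'.join(map(re.escape, word))
    return re.search(pattern, file_name, re.IGNORECASE) is not None

print(subsequence_match('hwor', 'HelloWorld.csv'))     # True
print(subsequence_match('hwor', 'hello_windsor.pdf'))  # True (h..w..o..r inside "windsor")
```

So this approach works as a loose filter, but it will return more results than the word-based matching the question describes.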

Answered 2024-01-05

b1payxdu #2

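To implement the IntelliJ-style fuzzy filter you want, we should first define its expected behavior in precise terms:
1. For each file name, split the name into a list of words, either at non-word characters (so 'hello_windsor.pdf' becomes ['hello', 'windsor', 'pdf']) or at positions that are followed by an uppercase letter and preceded by another word character (so 'HelloWorld' becomes ['Hello', 'World']). Words consist only of letters and/or digits. The list of words should be lowercased to allow case-insensitive matching.
2. Split the given query string on whitespace into a list of patterns. Every pattern must match a name for that name to count as a match.
3. For each pattern, starting from the first character of the first word, match each character of the pattern against the list of words, one at a time, as follows:
   - If the current character of the pattern matches the current character of the current word:
     - try to match the next character of the pattern with the next character of the current word (so when w in the pattern winds matches the first character of windsor, try to match i in the pattern with i in windsor);
     - if that fails, try to match the next character of the pattern with the first character of the next word (so when s in the pattern sfr matches the first character of san in san francisco, but the next character f does not match a in san, try to match f with f in the next word, francisco).
   - If it does not match, restart the pattern from its beginning at the next word (so if file does not match the first word another in another file, try to match file against the next word, file).
   - If the end of the pattern is reached, a match is found.
   - If all words in the list are exhausted before the end of the pattern is reached, no match is found.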
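
As a quick illustration of the first step only, the word splitting can be sketched on its own. The helper name split_words is hypothetical, and dropping empty pieces is an assumption added here so that a leading separator does not produce an empty word. Splitting at a zero-width position requires Python 3.7 or later.

```python
import re

def split_words(name):
    # Split on runs of non-word characters or underscores, or at a position
    # preceded by a word character and followed by an uppercase letter.
    parts = re.split(r'[\W_]+|(?<=\w)(?=[A-Z])', name)
    # Drop empty pieces (e.g. from a leading separator) and lowercase the rest.
    return [part.lower() for part in parts if part]

print(split_words('hello_windsor.pdf'))  # ['hello', 'windsor', 'pdf']
print(split_words('HelloWorld.csv'))     # ['hello', 'world', 'csv']
```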

Since a pattern can match a list of words along several possible paths, the exploration can be implemented with a backtracking algorithm:

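import re

def fuzzy_filter(patterns, names):
    def match(pattern, words_index=0, pattern_index=0, word_index=0):
        # Backtracking matcher over `words`: succeeds once the whole pattern
        # is consumed, fails when the words (or the current word) run out.
        return pattern_index == len(pattern) or (
            words_index < len(words) and word_index < len(words[words_index])
        ) and (
            pattern[pattern_index] == words[words_index][word_index] and (
                # Characters agree: continue within the same word...
                match(pattern, words_index, pattern_index + 1, word_index + 1) or
                # ...or jump to the first character of the next word.
                match(pattern, words_index + 1, pattern_index + 1)
            # Otherwise restart the pattern at the next word.
            ) or match(pattern, words_index + 1)
        )

    for name in names:
        # Split on separators and before uppercase letters (zero-width splits
        # need Python 3.7+); drop empty pieces so they cannot block a match.
        words = [w.lower() for w in re.split(r'[\W_]+|(?<=\w)(?=[A-Z])', name) if w]
        # Lowercase the query as well so matching is case-insensitive both ways.
        if all(map(match, patterns.lower().split())):
            yield name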

so that:

file_names = [
    'HelloWorld.csv',
    'hello_windsor.pdf',
    'some_file_i_need.jpg',
    'san_francisco.png',
    'Another.file.txt',
    'A file name.rar'
]

queries = '''\
hw
hwor
winds
sf
file need
sfr
file
file another
fnrar'''.splitlines()

for query in queries:
    print(f'{query} -> {", ".join(fuzzy_filter(query, file_names))}')

outputs:

hw -> HelloWorld.csv, hello_windsor.pdf
hwor -> HelloWorld.csv
winds -> hello_windsor.pdf
sf -> some_file_i_need.jpg, san_francisco.png
file need -> some_file_i_need.jpg
sfr -> san_francisco.png
file -> some_file_i_need.jpg, Another.file.txt, A file name.rar
file another -> Another.file.txt
fnrar -> A file name.rar

Demo: https://ideone.com/D1ITXQ
Finally, to rank the file names by last access time, you can sort the list by os.path.getatime in reverse order:

import os

os.chdir(path_to_files)
matches = sorted(fuzzy_filter(query, file_names), key=os.path.getatime, reverse=True)
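
If file-system access times are not reliable enough (some systems update them lazily or not at all), one alternative is to record usage yourself. The following is only a sketch under that assumption: the UsageTracker class and its method names are hypothetical, and it ranks by how often a name was opened, with the most recent use as the tie-breaker. It does not attempt to score match strength.

```python
import time

class UsageTracker:
    """Hypothetical in-memory record of how often and how recently names were opened."""

    def __init__(self):
        self.counts = {}
        self.last_used = {}

    def record(self, name, when=None):
        # Call this whenever the user opens a file.
        self.counts[name] = self.counts.get(name, 0) + 1
        self.last_used[name] = time.time() if when is None else when

    def rank(self, names):
        # Most frequently used first; ties are broken by most recent use.
        def key(name):
            return (self.counts.get(name, 0), self.last_used.get(name, 0.0))
        return sorted(names, key=key, reverse=True)

tracker = UsageTracker()
tracker.record('hello_windsor.pdf', when=1.0)
tracker.record('HelloWorld.csv', when=2.0)
tracker.record('hello_windsor.pdf', when=3.0)
print(tracker.rank(['HelloWorld.csv', 'hello_windsor.pdf']))
# ['hello_windsor.pdf', 'HelloWorld.csv']
```

The list passed to rank would be the names returned by the filter for the current query.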


Answered 2024-01-05
