How do I create a custom analyzer in PyLucene 8.6.1?

qxsslcnc, posted on 2022-11-07 in Lucene

I have looked at this, this, and this, but I can't figure out why they don't work for me.
I normally use an analyzer as shown below.

import lucene
from org.apache.lucene.analysis.core import WhitespaceAnalyzer
from org.apache.lucene.index import IndexWriterConfig, IndexWriter
from org.apache.lucene.store import SimpleFSDirectory
from java.nio.file import Paths
from org.apache.lucene.document import Document, Field, TextField

index_path = "./index"

lucene.initVM()

analyzer = WhitespaceAnalyzer()
config = IndexWriterConfig(analyzer)
store = SimpleFSDirectory(Paths.get(index_path))
writer = IndexWriter(store, config)

doc = Document()
doc.add(Field("title", "The quick brown fox.", TextField.TYPE_STORED))
writer.addDocument(doc)

writer.close()
store.close()

I want to use MyAnalyzer() instead of WhitespaceAnalyzer(). It should combine a WhitespaceTokenizer with a LowerCaseFilter.

from org.apache.lucene.analysis.core import LowerCaseFilter, WhitespaceTokenizer
from org.apache.pylucene.analysis import PythonAnalyzer

class MyAnalyzer(PythonAnalyzer):
    def __init__(self):
        PythonAnalyzer.__init__(self)

    def createComponents(self, fieldName):
        # What do I write here?

Can you help me write and use MyAnalyzer()?

Answer #1 (ojsjcaue)

I found the answer here and here; the following works.

from org.apache.lucene.analysis.core import LowerCaseFilter, WhitespaceTokenizer
from org.apache.pylucene.analysis import PythonAnalyzer
from org.apache.lucene.analysis import Analyzer

class MyAnalyzer(PythonAnalyzer):
    def __init__(self):
        PythonAnalyzer.__init__(self)

    def createComponents(self, fieldName):
        source = WhitespaceTokenizer()
        result = LowerCaseFilter(source)
        return Analyzer.TokenStreamComponents(source, result)

It would be great if someone could point me in the right direction so that I can find answers like this properly on my own.
