Lucene.net: high CPU usage while building an index

Posted by myzjeezk on 2022-11-07 in Lucene


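I wrote a program that uses Lucene.net to index a 3 GB text file. While the index is being built, the process's CPU usage reaches 80% and its memory usage climbs to 1 GB. **Is there a way to limit CPU and memory usage?** Below is the program I use to build the index: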

public void BuildIndex(string item)
        {
            System.Diagnostics.EventLog.WriteEntry("LuceneSearch", "Indexing Started for " + item);
            string indexPath = string.Format(BaseIndexPath, "20200414", item);
            if (System.IO.Directory.Exists(indexPath))
            {
                System.IO.Directory.Delete(indexPath, true);
            }

            LuceneIndexDirectory = FSDirectory.Open(indexPath);
            Writer = new IndexWriter(LuceneIndexDirectory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED);

            Writer.SetRAMBufferSizeMB(500);

            string file = @"c:\LogFile.txt"; // verbatim string: "\L" is not a valid escape sequence in a regular C# string
            string line=string.Empty;
            int count = 0;
            StreamReader fileReader = new StreamReader(file);
            while ((line = fileReader.ReadLine()) != null)
            {
                count++;
                Document doc = new Document();

                try
                {
                    doc.Add(new Field("LineNumber", count.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
                    doc.Add(new Field("LogTime", line.Substring(6, 12), Field.Store.YES, Field.Index.NOT_ANALYZED));
                    doc.Add(new Field("LineText", line.Substring(18, line.Length -18 ), Field.Store.YES, Field.Index.NOT_ANALYZED));
                    Writer.AddDocument(doc);
                }
                catch (Exception)
                {

                    System.Diagnostics.EventLog.WriteEntry("LuceneSearch", "Exception occurred while entering a line in the index");
                }

            }
            System.Diagnostics.EventLog.WriteEntry("LuceneSearch", "Indexing finished for " + item + ". Starting Optimization now.");
            Writer.Optimize();
            Writer.Commit();

            Writer.Close();

            LuceneIndexDirectory.Dispose();

            System.Diagnostics.EventLog.WriteEntry("LuceneSearch", "Optimization finished for " + item );
        }

lucene

Source: https://stackoverflow.com/questions/61230511/lucene-net-high-cpu-usage-while-building-index

1 Answer


zf2sa74q1#

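Writing an index is typically done out of band from searching. That is, it is usually done during deployment or application startup. Near-real-time search is possible too, of course. It involves keeping an IndexWriter open and using it both to write to and to search the same index, but in that case a typical application adds a few documents at a time rather than building the entire index in one go.
Generally speaking, using that much RAM is not a big deal if you build the index at the right point in your application's lifecycle.
However, you are calling Optimize() with no parameters, which *rewrites* the entire index after it has been created. If the written index occupies multiple segments, calling Optimize() with no parameters rewrites the whole index into a single segment.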
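From the documentation (emphasis mine):
Requests an "optimize" operation on an index, priming the index for the fastest available search. Traditionally this has meant merging all segments into a single segment, as is done in the default merge policy, but individual merge policies may implement optimize in different ways.
It is recommended that this method be called upon completion of indexing. In environments with frequent updates, optimize is best done during low-volume times, if at all.
See http://www.gossamer-threads.com/lists/lucene/java-dev/47895 for more discussion.
Note that optimize requires 2X the index size in free space in your Directory (3X if you are using the compound file format). For example, if your index size is 10 MB, you need 20 MB free for optimize to complete (30 MB if you are using the compound file format).
If some but not all readers re-open while an optimize is underway, more than 2X temporary space will be consumed, because those new readers will hold open the partially optimized segments at that time. It is best not to re-open readers while optimize is running.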
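The free-space rule in the quoted documentation is plain arithmetic, so it can be checked before Optimize() is ever called. The following is only a sketch: `OptimizeSpace`, `RequiredFreeBytes`, and `HasRoom` are hypothetical helper names that are not part of Lucene.net, and the 2X and 3X factors are taken directly from the quote.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Lucene.net.
public static class OptimizeSpace
{
    // Free space the quoted documentation says Optimize() needs:
    // 2x the index size, or 3x when the compound file format is used.
    public static long RequiredFreeBytes(long indexBytes, bool compoundFile)
    {
        if (indexBytes < 0)
            throw new ArgumentOutOfRangeException(nameof(indexBytes));
        int factor = compoundFile ? 3 : 2;
        return checked(indexBytes * factor);
    }

    // True when the given amount of free space meets that requirement.
    public static bool HasRoom(long indexBytes, long freeBytes, bool compoundFile)
    {
        return freeBytes >= RequiredFreeBytes(indexBytes, compoundFile);
    }
}
```

For the 10 MB index in the quote, `RequiredFreeBytes(10L * 1024 * 1024, false)` yields 20 MB, and 30 MB when `compoundFile` is `true`.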
Note that the Optimize() method was removed in Lucene 4.x (for good reason), so I suggest you stop using it now.
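To make that suggestion concrete, here is a minimal sketch of the question's indexing loop with the Optimize() call removed and a smaller RAM buffer. It rests on several assumptions: it targets the Lucene.net 3.0.x-era API that the question's code appears to use, `StandardAnalyzer` stands in for the question's `analyzer` field (which is not shown), `LogIndexer` is a hypothetical name, and the 64 MB buffer is an arbitrary illustrative value rather than a recommendation. How much this lowers CPU and memory use on a particular machine would have to be measured.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

// Hypothetical class name; a sketch against the Lucene.net 3.0.x API (assumed).
public static class LogIndexer
{
    // Indexes one document per log line and returns the number of documents added.
    public static int BuildIndex(string logFile, string indexPath)
    {
        int added = 0;
        int lineNumber = 0;

        using (var directory = FSDirectory.Open(new DirectoryInfo(indexPath)))
        // Assumption: the question's analyzer is not shown, so StandardAnalyzer is a stand-in.
        using (var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30))
        // create: true starts a fresh index, replacing any existing one at indexPath.
        using (var writer = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED))
        using (var reader = new StreamReader(logFile))
        {
            // Assumption: 64 MB is illustrative; the question used 500 MB.
            writer.SetRAMBufferSizeMB(64);

            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lineNumber++;

                // Skip lines too short for the Substring calls instead of relying on exceptions.
                if (line.Length < 18)
                    continue;

                var doc = new Document();
                doc.Add(new Field("LineNumber", lineNumber.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
                doc.Add(new Field("LogTime", line.Substring(6, 12), Field.Store.YES, Field.Index.NOT_ANALYZED));
                doc.Add(new Field("LineText", line.Substring(18), Field.Store.YES, Field.Index.NOT_ANALYZED));
                writer.AddDocument(doc);
                added++;
            }

            // Commit only; no Optimize(), so the finished index is not rewritten into one segment.
            writer.Commit();
        }

        return added;
    }
}
```

The `using` blocks dispose the reader, writer, analyzer, and directory even if indexing throws, which the original code did not guarantee.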

Answered 2022-11-09
