java - Significance of writeLimit in BodyContentHandler of the Apache Tika API?

Posted by yrefmtwq on 2021-06-29 in Java


In our application we need to check whether a file (of any format) is password protected, and we use the Apache Tika API for this. The code block is shown below.

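import java.io.File;
import java.io.InputStream;

import org.apache.commons.io.FileUtils;
import org.apache.tika.exception.EncryptedDocumentException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;

// LOGGER is assumed to be a logger field defined in the enclosing class.
public static boolean isPasswordProtectedFile(File filePart) {
    Parser parser = new AutoDetectParser();
    BodyContentHandler handler = new BodyContentHandler();
    Metadata metadata = new Metadata();
    ParseContext context = new ParseContext();

    // try-with-resources closes the stream even if parsing throws
    try (InputStream stream = FileUtils.openInputStream(filePart)) {
        // Parse the file; an EncryptedDocumentException signals password protection
        parser.parse(stream, handler, metadata, context);
        LOGGER.debug("File is without Password ");
    } catch (EncryptedDocumentException e) {
        LOGGER.error("File is encrypted with password", e);
        return true;
    } catch (Exception e) {
        // Any other failure is logged and treated as "not password protected"
        LOGGER.error("File parsing failed", e);
    }
    return false;
}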

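For a few of the files we tested, however, this consumes too much CPU. If we instead create the BodyContentHandler as follows, it finishes faster and uses less CPU:

BodyContentHandler handler = new BodyContentHandler(-1);

I went through the documentation but could not understand it properly. I am looking for a possible reason. Thanks in advance.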

Java apache-tika

Source: https://stackoverflow.com/questions/65491141/significance-of-writelimit-in-bodycontenthandler-of-apache-tika-api

1 answer


nzkunb0c1#

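The documentation says:
https://tika.apache.org/1.4/api/org/apache/tika/sax/BodyContentHandler.html#BodyContentHandler(int)
Creates a content handler that writes XHTML body character events to an internal string buffer. The contents of the buffer can be retrieved using the ContentHandlerDecorator.toString() method. The internal string buffer is bounded at the given number of characters. If this write limit is reached, then a SAXException is thrown.
writeLimit - maximum number of characters to include in the string, or -1 to disable the write limit
The buffer is never initialized here. For comparison, the no-argument constructor BodyContentHandler() is documented as bounding the internal string buffer at 100k characters.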
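
To make the quoted semantics concrete, here is a minimal sketch that uses only the JDK's SAX classes. BoundedTextHandler and its feed helper are hypothetical names invented for this illustration; this is a simplified stand-in for the documented behaviour (bounded buffer, SAXException at the limit, -1 to disable), not Tika's actual implementation.

```java
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical stand-in for a write-limited SAX text handler (not Tika's code). */
class BoundedTextHandler extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private final int writeLimit; // negative (documented value: -1) disables the limit

    BoundedTextHandler(int writeLimit) {
        this.writeLimit = writeLimit;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (writeLimit < 0 || buffer.length() + length <= writeLimit) {
            buffer.append(ch, start, length);
            return;
        }
        // Keep only what still fits, then signal that the limit was reached.
        buffer.append(ch, start, writeLimit - buffer.length());
        throw new SAXException("Write limit of " + writeLimit + " characters reached");
    }

    @Override
    public String toString() {
        return buffer.toString();
    }

    /** Feeds text to the handler; returns true if the write limit stopped it. */
    static boolean feed(BoundedTextHandler handler, String text) {
        try {
            char[] chars = text.toCharArray();
            handler.characters(chars, 0, chars.length);
            return false;
        } catch (SAXException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        BoundedTextHandler limited = new BoundedTextHandler(5);
        System.out.println(feed(limited, "hello world") + " " + limited);     // true hello

        BoundedTextHandler unlimited = new BoundedTextHandler(-1);
        System.out.println(feed(unlimited, "hello world") + " " + unlimited); // false hello world
    }
}
```

In the question's method, a SAXException raised at the limit would presumably land in the generic catch (Exception e) branch and be logged as a parsing failure, so the choice of constructor can change which code path runs; whether that explains the CPU difference is not established here.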

2021-06-29
