java: How to check the last-modified time of a PDF file on a website using jsoup

Posted by gv8xihay on 2021-07-09 in Java


I want to check the last-modified time of a PDF file on a particular page. The PDF link is http://www.nfib.com/portals/0/pdf/sbet/sbet201402.pdf
I tried this:

Connection.Response rs2 = Jsoup.connect("http://www.nfib.com/Portals/0/PDF/sbet/sbet201402.pdf").execute();
System.out.println("Header = " + rs2.header("Last-Modified"));

I get this error:

UnsupportedMimeTypeException

Java Connection Jsoup

Source: https://stackoverflow.com/questions/22323700/how-to-check-last-modified-time-of-a-pdf-file-on-a-website-using-jsoup

1 answer


rseugnpd1#

If this does not have to be done with jsoup, you can use the standard URL and URLConnection classes, for example:

URL url = new URL("http://www.nfib.com/Portals/0/PDF/sbet/sbet201402.pdf");
URLConnection connection = url.openConnection();
System.out.println("Header = " + connection.getHeaderField("Last-Modified"));
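As a variation on the snippet above, here is a minimal sketch that sends a HEAD request so the PDF body is not downloaded, and reads the timestamp through URLConnection.getLastModified() instead of the raw header string. The class name HeadLastModified and the helpers toInstant and fetchLastModified are hypothetical names chosen for illustration; whether a given server answers HEAD requests with a Last-Modified header is an assumption.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.time.Instant;
import java.util.Optional;

// Hypothetical helper class, not part of the original answer.
final class HeadLastModified {

    // getLastModified() returns 0 when the header is missing or unparsable.
    static Optional<Instant> toInstant(long epochMillis) {
        return epochMillis == 0L
                ? Optional.empty()
                : Optional.of(Instant.ofEpochMilli(epochMillis));
    }

    // Sends a HEAD request and returns the Last-Modified time, if the server reports one.
    static Optional<Instant> fetchLastModified(String address) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        connection.setRequestMethod("HEAD"); // headers only, no response body
        try {
            return toInstant(connection.getLastModified());
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java HeadLastModified <url>");
            return;
        }
        System.out.println("Last-Modified = " + fetchLastModified(args[0]));
    }
}
```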

Keep in mind that jsoup is designed for parsing HTML/XML, so by default it expects a content type of text/*, application/xml, or application/xhtml+xml, not application/pdf.
If you look at the code that handles this:

if (contentType != null && !req.ignoreContentType() && (!(contentType.startsWith("text/") || contentType.startsWith("application/xml") || contentType.startsWith("application/xhtml+xml"))))
    throw new UnsupportedMimeTypeException("Unhandled content type. Must be text/*, application/xml, or application/xhtml+xml",
            contentType, req.url().toString());
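To make it easier to see which inputs pass that condition, here is a sketch that restates it as a standalone predicate. ContentTypeCheck and isAccepted are hypothetical names used only for illustration; they are not part of the jsoup API, and the logic is just the negation of the quoted check.

```java
// Hypothetical restatement of the quoted condition; not jsoup code.
final class ContentTypeCheck {

    // Returns true when the quoted check would NOT throw UnsupportedMimeTypeException.
    static boolean isAccepted(String contentType, boolean ignoreContentType) {
        if (contentType == null || ignoreContentType) {
            return true; // the check is skipped entirely
        }
        return contentType.startsWith("text/")
                || contentType.startsWith("application/xml")
                || contentType.startsWith("application/xhtml+xml");
    }

    public static void main(String[] args) {
        System.out.println(isAccepted("text/html; charset=utf-8", false)); // true
        System.out.println(isAccepted("application/pdf", false));          // false
        System.out.println(isAccepted("application/pdf", true));           // true
    }
}
```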

However, the !req.ignoreContentType() test shows that the requirement for HTML/XML input can be switched off. To do that, add

ignoreContentType(true)

to the connection setup, like this:

Connection.Response rs2 = Jsoup.connect("http://www.nfib.com/Portals/0/PDF/sbet/sbet201402.pdf")
        .ignoreContentType(true)
        .execute();

You should then be able to read the returned header:

System.out.println("Header = " + rs2.header("Last-Modified"));

Output:

Header = Mon, 10 Feb 2014 22:54:15 GMT
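The header value is a plain string, so comparing it against another time requires parsing it first. The sketch below assumes the server sends the usual RFC 1123 date format, as in the output above, and uses the JDK's DateTimeFormatter.RFC_1123_DATE_TIME. The class name LastModifiedParser and the method parse are hypothetical.

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for turning a Last-Modified header value into an Instant.
final class LastModifiedParser {

    // Throws DateTimeParseException if the value is not in RFC 1123 format.
    static Instant parse(String headerValue) {
        return ZonedDateTime
                .parse(headerValue, DateTimeFormatter.RFC_1123_DATE_TIME)
                .toInstant();
    }

    public static void main(String[] args) {
        Instant modified = parse("Mon, 10 Feb 2014 22:54:15 GMT");
        System.out.println(modified); // 2014-02-10T22:54:15Z
        System.out.println(modified.isBefore(Instant.now()));
    }
}
```

With jsoup, the argument would be the string returned by rs2.header("Last-Modified"), which may be null if the server omits the header.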

2021-07-09
