Usage and code examples of the org.jsoup.parser.Parser.parseInput() method

x33g5p2x, reposted on 2022-01-26 under "Other"

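This article collects code examples of the org.jsoup.parser.Parser.parseInput() method in Java and shows how Parser.parseInput() is used in practice. The examples come mainly from platforms such as GitHub, Stack Overflow, and Maven, were extracted from selected projects, and should serve as useful references. Details of the Parser.parseInput() method are as follows:
Fully qualified name: org.jsoup.parser.Parser
Class name: Parser
Method name: parseInput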

Introduction to Parser.parseInput

No description available.
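
As rough orientation, based on the public jsoup API rather than on this page: `parseInput(String, String)` parses markup into a `Document` using whichever tree builder the `Parser` instance was created with. The sketch below contrasts `Parser.htmlParser()` with `Parser.xmlParser()`. The class name `ParseInputBasics`, the helper method names, and the sample markup are illustrative assumptions.

```java
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

public class ParseInputBasics {
    // HTML tree builder: missing html/head/body elements are added around the input.
    static Document parseAsHtml(String markup) {
        return Parser.htmlParser().parseInput(markup, "");
    }

    // XML tree builder: the input structure is kept as written, with no HTML wrapper.
    static Document parseAsXml(String markup) {
        return Parser.xmlParser().parseInput(markup, "");
    }

    public static void main(String[] args) {
        String markup = "<name>Task</name>";
        Document html = parseAsHtml(markup);
        Document xml = parseAsXml(markup);

        System.out.println(html.select("html").size()); // 1: wrapper added by the HTML parser
        System.out.println(xml.select("html").size());  // 0: no wrapper in XML mode
        System.out.println(xml.select("name").text());  // Task
    }
}
```

The empty string passed as the second argument is the base URI; it only matters when relative URLs in the document need to be resolved.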

Code examples

Code example source (origin): org.jsoup/jsoup

/**
 Parse HTML into a Document, using the provided Parser. You can provide an alternate parser, such as a simple XML
 (non-HTML) parser.
 @param html    HTML to parse
 @param baseUri The URL where the HTML was retrieved from. Used to resolve relative URLs to absolute URLs, that occur
 before the HTML declares a {@code <base href>} tag.
 @param parser alternate {@link Parser#xmlParser() parser} to use.
 @return sane HTML
 */
public static Document parse(String html, String baseUri, Parser parser) {
  return parser.parseInput(html, baseUri);
}
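
The Javadoc above says the base URI is used to resolve relative URLs. A minimal sketch of that behavior follows, assuming a jsoup version that provides `Element.selectFirst`. The class name `BaseUriExample`, the helper `firstAbsoluteLink`, and the `example.com` URL are hypothetical.

```java
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public class BaseUriExample {
    // Parses the HTML against the given base URI and returns the first link as an absolute URL.
    // Returns an empty string if there is no link or the URL cannot be made absolute.
    static String firstAbsoluteLink(String html, String baseUri) {
        Document doc = Parser.htmlParser().parseInput(html, baseUri);
        Element link = doc.selectFirst("a[href]");
        return link == null ? "" : link.absUrl("href");
    }

    public static void main(String[] args) {
        String html = "<a href='/docs'>Docs</a>";
        // With a base URI the relative href can be resolved.
        System.out.println(firstAbsoluteLink(html, "http://example.com/")); // http://example.com/docs
        // With an empty base URI there is nothing to resolve against, so absUrl yields "".
        System.out.println(firstAbsoluteLink(html, "").isEmpty()); // true
    }
}
```

This is why several examples on this page pass `""` as the base URI: they never resolve links, so the value does not matter to them.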

Code example source (origin): org.jsoup/jsoup

doc = parser.parseInput(docData, baseUri);
  reader.skip(1);
try {
  doc = parser.parseInput(reader, baseUri);
} catch (UncheckedIOException e) {
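
The excerpt above is truncated, but it shows that `parseInput` accepts a `Reader` as well as a `String`. The following is a self-contained sketch of the `Reader` overload, assuming a jsoup version that offers it. The class name `ReaderInputExample` and the helper `titleFromReader` are hypothetical and are not part of jsoup.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

public class ReaderInputExample {
    // Parses HTML from a character stream and returns the document title.
    static String titleFromReader(Reader reader, String baseUri) {
        Document doc = Parser.htmlParser().parseInput(reader, baseUri);
        return doc.title();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><head><title>Hello</title></head><body></body></html>";
        // A StringReader stands in for a file or network stream here.
        try (Reader reader = new StringReader(html)) {
            System.out.println(titleFromReader(reader, "")); // Hello
        }
    }
}
```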

Code example source (origin): org.kie.workbench/kie-wb-common-cli-forms-migration

private String readTaskFormName(DataInputAssociation inputAssociation) {
  Optional<FormalExpression> optional = inputAssociation.getAssignment()
      .stream()
      .filter(assignment -> assignment.getFrom() != null && assignment.getFrom() instanceof FormalExpression)
      .map(assignment -> (FormalExpression)assignment.getFrom())
      .findAny();
  if(optional.isPresent()) {
    return Parser.xmlParser().parseInput(optional.get().getBody(), "").toString();
  }
  return "";
}

Code example source (origin): addthis/hydra

try {
  Parser parser = Parser.htmlParser().setTrackErrors(0);
  @Nonnull Document doc = parser.parseInput(html, "");
  @Nonnull Elements tags = doc.select(tagName);

Code example source (origin): DigitalPebble/storm-crawler

/**
 * Attempt to find a META tag in the HTML that hints at the character set
 * used to write the document.
 */
private static String getCharsetFromMeta(byte[] buffer, int maxlength) {
  // convert to UTF-8 String -- which hopefully will not mess up the
  // characters we're interested in...
  int len = buffer.length;
  if (maxlength > 0 && maxlength < len) {
    len = maxlength;
  }
  String html = new String(buffer, 0, len, DEFAULT_CHARSET);
  Document doc = Parser.htmlParser().parseInput(html, "dummy");
  // look for <meta http-equiv="Content-Type"
  // content="text/html;charset=gb2312"> or HTML5 <meta charset="gb2312">
  Elements metaElements = doc
      .select("meta[http-equiv=content-type], meta[charset]");
  String foundCharset = null;
  for (Element meta : metaElements) {
    if (meta.hasAttr("http-equiv"))
      foundCharset = getCharsetFromContentType(meta.attr("content"));
    if (foundCharset == null && meta.hasAttr("charset"))
      foundCharset = meta.attr("charset");
    if (foundCharset != null)
      return foundCharset;
  }
  return foundCharset;
}
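
The method above depends on `DEFAULT_CHARSET` and `getCharsetFromContentType`, neither of which appears in the excerpt. Below is a self-contained sketch of the same idea that works on an already decoded string. The class name `MetaCharsetSketch`, the regular expression, and the helper `charsetFromContentType` are hypothetical stand-ins, not the storm-crawler implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public class MetaCharsetSketch {
    // Assumed pattern for pulling "charset=..." out of a Content-Type value.
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*[\"']?([^\\s;\"']+)", Pattern.CASE_INSENSITIVE);

    // Hypothetical replacement for the helper that the excerpt does not show.
    static String charsetFromContentType(String contentType) {
        Matcher m = CHARSET.matcher(contentType);
        return m.find() ? m.group(1) : null;
    }

    // Returns the charset hinted at by a META tag, or null if none is found.
    static String charsetFromMeta(String html) {
        Document doc = Parser.htmlParser().parseInput(html, "");
        for (Element meta : doc.select("meta[http-equiv=content-type], meta[charset]")) {
            String found = null;
            if (meta.hasAttr("http-equiv")) {
                found = charsetFromContentType(meta.attr("content"));
            }
            if (found == null && meta.hasAttr("charset")) {
                found = meta.attr("charset");
            }
            if (found != null) {
                return found;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(charsetFromMeta("<meta charset=\"gb2312\">")); // gb2312
        System.out.println(charsetFromMeta(
                "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">")); // UTF-8
    }
}
```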

Code example source (origin): samczsun/Skype4J

@Override
public void handle(SkypeImpl skype, JsonObject resource) throws ConnectionException, ChatNotFoundException, IOException {
  String content = Utils.getString(resource, "content");
  String chatId = Utils.getString(resource, "conversationLink");
  String author = getAuthor(resource);
  Validate.notNull(content, "Null content");
  Validate.notNull(chatId, "Null chat");
  Validate.notNull(author, "Null author");
  String username = getUsername(author);
  Validate.notNull(username, "Null username");
  Chat chat = getChat(chatId, skype);
  Validate.notNull(chat, "Null chatobj");
  Participant initiator = chat.getParticipant(username);
  Validate.notNull(initiator, "Null initiator");
  Document doc = Parser.xmlParser().parseInput(content, "");
  List<ReceivedFile> receivedFiles = doc
      .getElementsByTag("file")
      .stream()
      .map(fe -> new ReceivedFileImpl(fe.text(), Long.parseLong(fe.attr("size")),
          Long.parseLong(fe.attr("tid"))))
      .collect(Collectors.toList());
  FileReceivedEvent event = new FileReceivedEvent(chat, initiator, receivedFiles);
  skype.getEventDispatcher().callEvent(event);
}

Code example source (origin): DigitalPebble/storm-crawler

.decode(ByteBuffer.wrap(content)).toString();
jsoupDoc = Parser.htmlParser().parseInput(html, url);

Code example source (origin): org.kie.workbench.forms/kie-wb-common-forms-jbpm-integration-backend

if (!StringUtils.isEmpty(taskName)) {
  taskName = Parser.xmlParser().parseInput(taskName, "").toString();
  formVariables.setTaskName(taskName);

Code example source (origin): samczsun/Skype4J

Participant u = getUser(from, c);
String content = resource.get("content").asString();
Document doc = Parser.xmlParser().parseInput(content, "");
if (doc.getElementsByTag("meta").size() == 0) {
  throw new IllegalArgumentException("No meta? " + resource);

Code example source (origin): DigitalPebble/storm-crawler

@Test
public void testExclusionCase() throws IOException {
  Config conf = new Config();
  conf.put(TextExtractor.EXCLUDE_PARAM_NAME, "style");
  TextExtractor extractor = new TextExtractor(conf);
  String content = "<html>the<STYLE>main</STYLE>content of the page</html>";
  Document jsoupDoc = Parser.htmlParser().parseInput(content,
      "http://stormcrawler.net");
  String text = extractor.text(jsoupDoc.body());
  assertEquals("the content of the page", text);
}

Code example source (origin): DigitalPebble/storm-crawler

@Test
public void testMainContent() throws IOException {
  Config conf = new Config();
  conf.put(TextExtractor.INCLUDE_PARAM_NAME, "DIV[id=\"maincontent\"]");
  TextExtractor extractor = new TextExtractor(conf);
  String content = "<html>the<div id='maincontent'>main<div>content</div></div>of the page</html>";
  Document jsoupDoc = Parser.htmlParser().parseInput(content,
      "http://stormcrawler.net");
  String text = extractor.text(jsoupDoc.body());
  assertEquals("main content", text);
}

Code example source (origin): DigitalPebble/storm-crawler

@Test
public void testExclusion() throws IOException {
  Config conf = new Config();
  conf.put(TextExtractor.EXCLUDE_PARAM_NAME, "STYLE");
  TextExtractor extractor = new TextExtractor(conf);
  String content = "<html>the<style>main</style>content of the page</html>";
  Document jsoupDoc = Parser.htmlParser().parseInput(content,
      "http://stormcrawler.net");
  String text = extractor.text(jsoupDoc.body());
  assertEquals("the content of the page", text);
}
