Represents an element in a specific Source document, which encompasses a #getStartTag(), an optional #getEndTag(), and all #getContent() in between.
Take the following HTML segment as an example:
<p>This is a sample paragraph.</p>
The whole segment is represented by an Element object. It comprises the StartTag "<p>", the EndTag "</p>", and the text in between. An element may also contain other elements between its start and end tags.
The term normal element refers to an element having a #getStartTag() with a StartTag#getStartTagType() of StartTagType#NORMAL. This comprises all HTML elements and non-HTML elements.
Element instances are obtained through methods such as StartTag#getElement() and Segment#getAllElements(String).
See also the HTMLElements class and the XML 1.0 specification for elements.
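The following toy Java model makes the split into start tag, content and end tag concrete. It stores an element as character offsets into a source string. The class name ToyElement and its methods are hypothetical illustrations and are not Jericho's Element implementation. They only mirror the relationships described above: the content lies between the start tag and the end tag, and the end tag may be absent.

```java
/**
 * Hypothetical toy model (not Jericho's Element class): an element stored as offsets
 * into its source text, split into start tag, content and an optional end tag.
 */
final class ToyElement {
    private final String source;
    private final int begin;        // position of the start tag's '<'
    private final int startTagEnd;  // position just after the start tag's '>'
    private final int endTagBegin;  // position of the end tag's '<', or -1 if there is no end tag
    private final int end;          // position just after the whole element

    ToyElement(String source, int begin, int startTagEnd, int endTagBegin, int end) {
        this.source = source;
        this.begin = begin;
        this.startTagEnd = startTagEnd;
        this.endTagBegin = endTagBegin;
        this.end = end;
    }

    String startTag() { return source.substring(begin, startTagEnd); }

    /** The content runs from the end of the start tag to the end tag, or to the element end if there is none. */
    String content() { return source.substring(startTagEnd, contentEnd()); }

    /** Returns null when the element has no end tag. */
    String endTag() { return endTagBegin < 0 ? null : source.substring(endTagBegin, end); }

    boolean isEmpty() { return startTagEnd == contentEnd(); }

    private int contentEnd() { return endTagBegin < 0 ? end : endTagBegin; }

    @Override
    public String toString() { return source.substring(begin, end); }
}
```

Consider the paragraph example: new ToyElement(src, 0, 3, src.length() - 4, src.length()) yields "<p>", "This is a sample paragraph." and "</p>". For the <img> example, passing -1 as the end tag position gives empty content and a null end tag.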
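The three possible structures of an element are listed below.
Single Tag Element:
Example: <img src="mypicture.jpg">
The element consists only of a single #getStartTag() and has no #getContent() (although the start tag itself may have StartTag#getTagContent()).
This occurs in the following situations:
* An HTML element for which the end tag is forbidden (see HTMLElements#getEndTagForbiddenElementNames()).
* An HTML element for which the end tag is required but is not present in the source document.
* An HTML element for which the end tag is optional, where the implicitly terminating tag immediately follows the element's start tag.
* A non-HTML element whose start tag is StartTag#isSyntacticalEmptyElementTag().
* A non-HTML element that is not an #isEmptyElementTag() but is missing its end tag.
* An element whose start tag type does not define a StartTagType#getCorrespondingEndTagType(), or whose required end tag is missing.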
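Several of these single tag cases depend on whether the parser searches for an end tag at all. The sketch below encodes that decision for start tags of type StartTagType#NORMAL, following the parsing rules given later in this document. The names EndTagSearchDecision, EndTagRule and skipsEndTagSearch are hypothetical. Some cases are not covered: the parser may search and find nothing, or find a terminator immediately after the start tag. Both still yield a single tag element.

```java
/**
 * Hypothetical sketch: for a start tag of type StartTagType#NORMAL, decide whether a
 * single tag element is created immediately, without any search for an end tag.
 */
final class EndTagSearchDecision {
    /** End-tag rule of a recognised HTML element (hypothetical enum). */
    enum EndTagRule { FORBIDDEN, REQUIRED, OPTIONAL }

    /**
     * @param htmlRule         the end-tag rule if the name is a recognised HTML element name,
     *                         or null for a non-HTML element
     * @param syntacticalEmpty whether the start tag is written as an empty-element tag, e.g. "<x/>"
     */
    static boolean skipsEndTagSearch(EndTagRule htmlRule, boolean syntacticalEmpty) {
        if (htmlRule != null) {
            // HTML element: the "/>" syntax is ignored; only a forbidden end tag skips the search.
            return htmlRule == EndTagRule.FORBIDDEN;
        }
        // Non-HTML element: only a syntactical empty-element tag skips the search.
        return syntacticalEmpty;
    }
}
```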
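Explicitly Terminated Element:
Example: <p>This is a sample paragraph.</p>
The element consists of a #getStartTag(), #getContent() (provided the end tag doesn't immediately follow the start tag), and an #getEndTag().
This occurs in the following situations, assuming the start tag's matching end tag is present in the source document:
* An HTML element for which the end tag is required or optional.
* A non-HTML element that is not an #isEmptyElementTag().
* An element whose start tag type defines a StartTagType#getCorrespondingEndTagType().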
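Finding the matching end tag has to respect nesting. In <div><div></div></div>, the first </div> belongs to the inner element. The sketch below shows a minimal depth-counting search over a flat list of tags. It rests on two assumptions. First, MatchingEndTag, Tag and find are hypothetical names, and tag names are already normalised. Second, every nested element with the same name has its own end tag. The real parser's handling of nested elements without end tags is not modelled.

```java
import java.util.List;

final class MatchingEndTag {
    /** A start or end tag reduced to its (already normalised) name. Hypothetical. */
    static final class Tag {
        final String name;
        final boolean isStart;
        Tag(String name, boolean isStart) { this.name = name; this.isStart = isStart; }
    }

    /**
     * Returns the index of the end tag matching the start tag at startIndex, or -1 if none.
     * Each nested start tag with the same name opens one more level, so its own end tag
     * is consumed at that level instead of being mistaken for the match.
     */
    static int find(List<Tag> tags, int startIndex) {
        String name = tags.get(startIndex).name;
        int depth = 0;
        for (int i = startIndex + 1; i < tags.size(); i++) {
            Tag t = tags.get(i);
            if (!t.name.equals(name)) continue; // tags with other names do not affect this count
            if (t.isStart) {
                depth++;                        // a nested element with the same name
            } else if (depth == 0) {
                return i;                       // end tag at our own level: the match
            } else {
                depth--;                        // closes a nested same-name element
            }
        }
        return -1;                              // no matching end tag in the document
    }
}
```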
Implicitly Terminated Element:
Example: <p>This text is included in the paragraph element even though no end tag is present.
<p>This is the next paragraph.
The element consists of a #getStartTag() and #getContent(), but no #getEndTag().
This only occurs in an HTML element whose end tag is optional.
The element ends at the start of a tag that implies the termination of the element, called the implicitly terminating tag. If the implicitly terminating tag is situated immediately after the element's #getStartTag(), the element is classed as a single tag element.
See the element parsing rules for HTML elements with optional end tags for details on which tags can implicitly terminate a given element.
See also the documentation of the HTMLElements#getEndTagOptionalElementNames() method.
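Two observable properties tell the three structures apart: whether the element has an end tag, and whether its content is empty. The hypothetical enum below encodes the definitions above. With Jericho, the inputs would presumably come from element.getEndTag()!=null and element.isEmpty(). Both calls appear in the code examples in this document.

```java
/** Hypothetical classification of the three element structures described above. */
enum ElementStructure {
    SINGLE_TAG,
    EXPLICITLY_TERMINATED,
    IMPLICITLY_TERMINATED;

    static ElementStructure classify(boolean hasEndTag, boolean contentIsEmpty) {
        // Any element with an end tag is explicitly terminated, even <p></p> with empty content.
        if (hasEndTag) return EXPLICITLY_TERMINATED;
        // Without an end tag, the content decides: none means a single tag element.
        return contentIsEmpty ? SINGLE_TAG : IMPLICITLY_TERMINATED;
    }
}
```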
The following rules describe the algorithm used in the StartTag#getElement() method to construct an element. The detection of the start tag's matching end tag or other terminating tags always takes into account the possible nesting of elements.
* If the start tag has a StartTag#getStartTagType() of StartTagType#NORMAL:
  * If the StartTag#getName() of the start tag matches one of the recognised HTMLElementName constants (indicating an HTML element):
    * If the end tag for an element of this name is forbidden (HTMLElements#getEndTagForbiddenElementNames()), the parser does not conduct any search for an end tag and a single tag element is created.
    * If the end tag for an element of this name is required (HTMLElements#getEndTagRequiredElementNames()), the parser searches for the start tag's matching end tag.
      * If the matching end tag is found, an explicitly terminated element is created.
      * If no matching end tag is found, the source document is not valid HTML and the incident is logged via Source#getLogger() as a missing required end tag. In this situation a single tag element is created.
    * If the end tag for an element of this name is optional (HTMLElements#getEndTagOptionalElementNames()), the parser searches not only for the start tag's matching end tag, but also for any other tag that implicitly terminates the element.
      For each tag (T2) following the start tag (ST1) of this element (E1):
      * If T2 is a start tag:
        * If the StartTag#getName() of T2 is in the list of HTMLElements#getNonterminatingElementNames(String) for E1, then continue evaluating tags from the Element#getEnd() of T2's corresponding StartTag#getElement().
        * If the StartTag#getName() of T2 is in the list of HTMLElements#getTerminatingStartTagNames(String) for E1, then E1 ends at the StartTag#getBegin() of T2. If T2 follows immediately after ST1, a single tag element is created; otherwise an implicitly terminated element is created.
      * If T2 is an end tag:
        * If the EndTag#getName() of T2 is the same as that of ST1, an explicitly terminated element is created.
        * If the EndTag#getName() of T2 is in the list of HTMLElements#getTerminatingEndTagNames(String) for E1, then E1 ends at the EndTag#getBegin() of T2. If T2 follows immediately after ST1, a single tag element is created; otherwise an implicitly terminated element is created.
      * If no more tags are present in the source document, then E1 ends at the end of the file, and an implicitly terminated element is created.
    Note that the syntactical indication of an empty-element tag (StartTag#isSyntacticalEmptyElementTag()) in the start tag is ignored when determining the end of HTML elements. See the documentation of the #isEmptyElementTag() method for more information.
  * If the StartTag#getName() of the start tag does not match any of the recognised HTMLElementName constants (indicating a non-HTML element):
    * If the start tag is StartTag#isSyntacticalEmptyElementTag(), the parser does not conduct any search for an end tag and a single tag element is created.
    * Otherwise, section 3.1 of the XML 1.0 specification states that a matching end tag MUST be present, and the parser searches for the start tag's matching end tag.
      * If the matching end tag is found, an explicitly terminated element is created.
      * If no matching end tag is found, the source document is not valid XML and the incident is logged via Source#getLogger() as a missing required end tag. In this situation a single tag element is created.
* If the start tag has any StartTag#getStartTagType() other than StartTagType#NORMAL:
  * If the start tag's type does not define a StartTagType#getCorrespondingEndTagType(), the parser does not conduct any search for an end tag and a single tag element is created.
  * Otherwise, the parser searches for the start tag's matching end tag.
    * If the matching end tag is found, an explicitly terminated element is created.
    * If no matching end tag is found, the missing required end tag is logged via Source#getLogger() and a single tag element is created.
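The following is a simplified sketch of the inner loop for HTML elements whose end tag is optional. It is an illustration, not Jericho's code. Tag, Result and scan are hypothetical names. The two name sets stand in for the lists returned by HTMLElements#getTerminatingStartTagNames(String) and HTMLElements#getTerminatingEndTagNames(String). The rule for skipping over nonterminating nested elements is left out. Tags carry their source positions so that "T2 follows immediately after ST1" can be checked as "no characters in between".

```java
import java.util.List;
import java.util.Set;

final class OptionalEndTagScan {
    enum Kind { SINGLE_TAG, EXPLICITLY_TERMINATED, IMPLICITLY_TERMINATED }

    /** A tag reduced to its lower-case name, kind and [begin, end) position in the source. */
    static final class Tag {
        final String name;
        final boolean isStart;
        final int begin;
        final int end;
        Tag(String name, boolean isStart, int begin, int end) {
            this.name = name; this.isStart = isStart; this.begin = begin; this.end = end;
        }
    }

    /** The structure of E1 and the source position at which E1 ends. */
    static final class Result {
        final Kind kind;
        final int end;
        Result(Kind kind, int end) { this.kind = kind; this.end = end; }
    }

    /** Scans the tags after ST1 (at index st1) of an element E1 whose end tag is optional. */
    static Result scan(List<Tag> tags, int st1, int sourceLength,
                       Set<String> terminatingStartTags, Set<String> terminatingEndTags) {
        Tag start = tags.get(st1);
        for (int i = st1 + 1; i < tags.size(); i++) {
            Tag t2 = tags.get(i);
            // If T2 terminates E1 with nothing in between, E1 is a single tag element.
            Kind ifTerminated = (t2.begin == start.end) ? Kind.SINGLE_TAG : Kind.IMPLICITLY_TERMINATED;
            if (t2.isStart) {
                if (terminatingStartTags.contains(t2.name)) return new Result(ifTerminated, t2.begin);
            } else {
                if (t2.name.equals(start.name)) return new Result(Kind.EXPLICITLY_TERMINATED, t2.end);
                if (terminatingEndTags.contains(t2.name)) return new Result(ifTerminated, t2.begin);
            }
        }
        // No more tags: E1 ends at the end of the file.
        return new Result(Kind.IMPLICITLY_TERMINATED, sourceLength);
    }
}
```

Take "<p>One<p>Two" with "p" as a terminating start tag. The first paragraph comes out implicitly terminated at position 6, and the second runs to the end of the file at position 12.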
Code example source: cflint/CFLint
public void element(final Element element, final Context context, final BugList bugs) {
    final String elementName = element.getName();
    if (elementName.equals(CF.CFCOMPONENT)) {
        // this includes whitespace - change it
        final int total = element.getContent().toString().split("\\n").length;
        checkSize(LENGTH_THRESHOLD, "EXCESSIVE_COMPONENT_LENGTH", context, 1, 0, total, bugs);
    }
}
Code example source: net.htmlparser.jericho/jericho-html
private static ElementHandler getElementHandler(final Element element) {
    // The hard-coded configuration does not include server tags in the child element hierarchy, so this is normally not executed.
    if (element.getStartTag().getStartTagType().isServerTag()) return RemoveElementHandler.INSTANCE;
    ElementHandler elementHandler=ELEMENT_HANDLERS.get(element.getName());
    return (elementHandler!=null) ? elementHandler : StandardInlineElementHandler.INSTANCE;
}
Code example source: net.htmlparser.jericho/jericho-html
public SelectFormControl(final Element element) {
    super(element,element.getAttributes().get(Attribute.MULTIPLE)!=null ? FormControlType.SELECT_MULTIPLE : FormControlType.SELECT_SINGLE,false);
    final List<Element> optionElements=element.getAllElements(HTMLElementName.OPTION);
    optionElementContainers=new ElementContainer[optionElements.size()];
    int x=0;
    for (Element optionElement : optionElements) {
        final ElementContainer optionElementContainer=new ElementContainer(optionElement,true);
        if (optionElementContainer.predefinedValue==null) {
            // use the content of the element if it has no value attribute
            // ... (remainder omitted in this excerpt)
        }
        // ...
    }
}
// ...
public String getPredefinedValue() {
    // ... (body omitted in this excerpt)
}
Code example source: net.htmlparser.jericho/jericho-html
private void appendElementContent(final Element element) throws IOException {
    final int contentEnd=element.getContentEnd();
    if (element.isEmpty() || renderedIndex>=contentEnd) return;
    final int contentBegin=element.getStartTag().end;
    // ... (remainder omitted in this excerpt)
}
Code example source: cflint/CFLint
public int startLine() {
    if (element != null && element.getSource() != null) {
        return element.getSource().getRow(element.getBegin());
    } else {
        return 1; // not zero
    }
}
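Several of the examples convert an element's position into a line number with Source#getRow(int). The hypothetical RowIndex class below sketches the general technique: record the position at which each line starts, then binary-search those positions. It assumes '\n' line endings, so "\r\n" also works but a lone '\r' does not. Rows are 1-based, consistent with the fallback value of 1 in the example above. This is not Jericho's implementation.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not Jericho's code): maps a character position to a 1-based row
 * by binary-searching the positions at which lines start. Assumes '\n' line endings.
 */
final class RowIndex {
    private final int[] lineStarts; // lineStarts[k] = position of the first character of row k + 1

    RowIndex(CharSequence text) {
        int rows = 1;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') rows++;
        }
        lineStarts = new int[rows]; // lineStarts[0] stays 0
        int row = 1;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') lineStarts[row++] = i + 1;
        }
    }

    /** Returns the 1-based row containing pos, for 0 <= pos <= text length. */
    int getRow(int pos) {
        int idx = Arrays.binarySearch(lineStarts, pos);
        // Exact hit: pos starts a row. Otherwise take the row just before the insertion point.
        int zeroBasedRow = idx >= 0 ? idx : -idx - 2;
        return zeroBasedRow + 1;
    }
}
```

For example, in "a\nbc\nd" position 2 (the 'b') is on row 2.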
Code example source: cflint/CFLint
public void element(final Element element, final Context context, final BugList bugs) {
    if (// element.getName().equals(CF.CFCOMPONENT) ||
            element.getName().equals(CF.CFFUNCTION)) {
        final String outputAttr = element.getAttributeValue(CF.OUTPUT);
        if (outputAttr == null) {
            context.addMessage("OUTPUT_ATTR", element.getAttributeValue(CF.NAME));
        }
        // ...
    }
}
Code example source: cflint/CFLint
private void process(final Element elem, final String space, final Context context)
        throws CFLintScanException {
    if (skipToPosition > elem.getBegin()) {
        // ... (omitted in this excerpt)
    } else {
        if (elem.getName().equalsIgnoreCase(CF.CFCOMPONENT)) {
            final Context componentContext = context.subContext(elem);
            doStructureStart(elem, componentContext, CFCompDeclStatement.class);
        } else if (elem.getName().equalsIgnoreCase(CF.CFFUNCTION)) {
            final Context functionContext = context.subContext(elem);
            doStructureStart(elem, functionContext, CFFuncDeclStatement.class);
        } else if (elem.getName().equalsIgnoreCase(CF.CFLOOP) && elem.getAttributeValue(CF.QUERY) != null) {
            final String qryName = elem.getAttributeValue(CF.QUERY);
            // ... (creation of loopContext omitted in this excerpt)
            doStructureStart(elem, loopContext, CFFuncDeclStatement.class);
        }
        // ...
        if (elem.getName().equalsIgnoreCase(CF.CFSET) || elem.getName().equalsIgnoreCase(CF.CFIF)
                || elem.getName().equalsIgnoreCase(CF.CFELSEIF) || elem.getName().equalsIgnoreCase(CF.CFRETURN)) {
            scanElement(elem, context);
            final Pattern p = Pattern.compile("<\\w+\\s(.*[^/])/?>", Pattern.MULTILINE | Pattern.DOTALL);
            final String expr = elem.getFirstStartTag().toString();
            final Matcher m = p.matcher(expr);
            // ...
        }
        // ...
    }
}
Code example source: com.github.cfparser/cfml.parsing
public void visit(final Element elem, final int level, CFMLVisitor visitor) throws Exception {
    if (skipToPosition > elem.getBegin()) {
        // ... (omitted in this excerpt)
    }
    if (elem.getName().equalsIgnoreCase("cfset") || elem.getName().equalsIgnoreCase("cfreturn")) {
        final String cfscript = elem.toString().substring(elem.getName().length() + 1, elem.toString().length() - 1).trim();
        if (cfscript.length() > 0 && visitor.visitPreParseExpression("TAG", cfscript)) {
            final CFExpression expression = parseCFExpression(cfscript, visitor);
            // ...
        }
    } else if (elem.getName().equalsIgnoreCase("cfif") || elem.getName().equalsIgnoreCase("cfelseif")) {
        final int uglyNotPos = elem.toString().lastIndexOf("<>");
        // Caution: getStartTag().getEnd() and getEndTag().getBegin() are positions in the whole Source,
        // while uglyNotPos and nextPos are indices into elem.toString(); getEndTag() may also be null.
        int endPos = elem.getStartTag().getEnd() - 1;
        final int nextPos = elem.toString().indexOf(">", uglyNotPos + 2);
        if (nextPos > 0 && nextPos < elem.getEndTag().getBegin()) {
            endPos = nextPos;
        }
        final String cfscript = elem.toString().substring(elem.getName().length() + 1, endPos);
        if (cfscript.length() > 0 && visitor.visitPreParseExpression("TAG", cfscript)) {
            final CFExpression expression = parseCFExpression(cfscript, visitor);
            // ...
        }
    } else if (elem.getName().equalsIgnoreCase("cfargument")) {
        // ...
    } else if (elem.getName().equalsIgnoreCase("cfscript")) {
        if (elem.getEndTag() != null) {
            final String cfscript = elem.getContent().toString();
            // ...
        } else {
            EndTag nextTag = elem.getSource().getNextEndTag(elem.getBegin());
            // ...
        }
    }
}
Code example source: com.github.cfparser/cfml.parsing
String attributesFound = "";
Set<?> dictAttributes = cfdic.getElementAttributes(element.getName());
element.getAttributes().populateMap(itemAttributes, true);
int lineNumber = cfmlSource.getRow(element.getBegin());
int startPosition = element.getBegin();
int endPosition = element.getEnd();
String name = element.getName();
String itemData = element.getTextExtractor().toString();
Code example source: Netbreeze-GmbH/boilerpipe
Attributes attrs = element.getAttributes();
Map<String, String> attrsUpdate = outputDocument.replace(attrs, true);
if (!element.getName().contains("a")) {
    // ...
} else {
    if (NOT_ALLOWED_HTML_TAGS.contains(element.getName())) {
        Segment content = element.getContent();
        // use equals(): == compares object references, not string contents
        if (element.getName().equals("script")
                || element.getName().equals("style")
                || element.getName().equals("form")) {
            if (!element.getStartTag().isSyntacticalEmptyElementTag()) {
                // ...
            }
        }
    }
}
Code example source: cflint/CFLint
public void element(final Element element, final Context context, final BugList bugs) {
    if (element.getName().equals(CF.CFFUNCTION)) {
        final int begLine = element.getSource().getRow(element.getBegin());
        final String functionType = element.getAttributeValue("returnType");
        checkReturnType(functionType, begLine, context, bugs);
    }
}
Code example source: com.github.cfparser/cfml.parsing
public ParserTag(net.htmlparser.jericho.Tag tag) {
    if (tag.getElement().getEndTag() != null) {
        // ...
    } else {
        // ...
    }
}
Code example source: cflint/CFLint
/**
 * Determine the line numbers of the <!--- @CFLintIgnore CFQUERYPARAM_REQ ---> tags.
 * Both the current and the next line are included.
 *
 * @param element the element object
 * @return the line numbers of any @CFLintIgnore annotations.
 */
private List<Integer> determineIgnoreLines(final Element element) {
    final List<Integer> ignoreLines = new ArrayList<>();
    for (Element comment : element.getChildElements()) {
        if ("!---".equals(comment.getName()) && comment.toString().contains("@CFLintIgnore") && comment.toString().contains("CFQUERYPARAM_REQ")) {
            int ignoreLine = comment.getSource().getRow(comment.getEnd());
            ignoreLines.add(ignoreLine + 1);
            // ...
        } else {
            // ...
        }
    }
    return ignoreLines;
}
Code example source: konsoletyper/teavm-flavour
private TemplateNode parseComponent(Element elem) {
    int prefixLength = elem.getName().indexOf(':');
    String prefix = elem.getName().substring(0, prefixLength);
    String name = elem.getName().substring(prefixLength + 1);
    String fullName = prefix + ":" + name;
    ElementComponentMetadata componentMeta = resolveComponent(prefix, name);
    if (componentMeta == null) {
        error(elem.getStartTag().getNameSegment(), "Undefined component " + fullName);
        return null;
    }
    List<PostponedComponentParse> postponedList = new ArrayList<>();
    TemplateNode node = parseComponent(componentMeta, prefix, name, elem, postponedList,
            new MapSubstitutions(new HashMap<>()));
    completeComponentParsing(postponedList, componentMeta, elem);
    position = elem.getEnd();
    return node;
}
Code example source: cflint/CFLint
public int offset() {
    if (element != null) {
        if (element.getName().equalsIgnoreCase(CF.CFSCRIPT)) {
            return element.getStartTag().getEnd();
        } else if (element.getName().equalsIgnoreCase(CF.CFSET)) {
            return element.getStartTag().getTagContent().getBegin() + 1;
        }
        return element.getBegin();
    } else {
        return 0;
    }
}
Code example source: pl.edu.icm.synat/synat-portal-core
List<StartTag> tags = sourceHtml.getAllStartTags(FORMULA_TAG_NAME);
for (StartTag tag : tags) {
    EndTag endTag = tag.getElement().getEndTag();
    if (endTag == null) {
        logger.warn("Formula element without end tag in " + source);
        // ...
    }
    for (StartTag texTag : tag.getElement().getContent().getAllStartTags(TEX_TAG_NAME)) {
        Element texElement = texTag.getElement();
        if (texElement.getEndTag() == null) {
            logger.warn("Tex element without end tag in " + source);
            // ...
        }
        outputDocument.replace(texElement.getStartTag(), TEX_SCRIPT_TAG_START);
        String content = texElement.getContent().toString().trim();
        Pair<Integer, Integer> bounds = getBounds(content);
        if (bounds.getRight() == 0) {
            logger.info("Empty source in Tex tag");
            outputDocument.replace(texElement.getContent(), StringUtils.EMPTY);
        } else {
            String strippedContent = content.substring(bounds.getLeft(), bounds.getRight());
            String unescapedContent = StringEscapeUtils.unescapeHtml4(strippedContent);
            outputDocument.replace(texElement.getContent(), unescapedContent);
        }
        outputDocument.replace(texElement.getEndTag(), TEX_SCRIPT_TAG_END);
    }
}
Code example source: cflint/CFLint
/**
 * Parse a CF argument tag to see if any of the argument names are invalid.
 */
public void element(final Element element, final Context context, final BugList bugs) {
    if (element.getName().equals(CF.CFARGUMENT)) {
        final int lineNo = context.startLine();
        int offset = context.offset();
        final String name = element.getAttributeValue(CF.NAME);
        if (name != null && name.length() > 0) {
            offset = element.getAttributes().get(CF.NAME).getValueSegment().getBegin();
            checkNameForBugs(context, name, context.getFilename(), context.getFunctionName(), lineNo, offset, bugs);
        } else {
            context.addMessage("ARGUMENT_MISSING_NAME", null, this, lineNo, offset);
        }
    }
}
Code example source: cflint/CFLint
public void element(final Element element, final Context context, final BugList bugs) {
    final String elementName = element.getName();
    if (elementName.equals(CF.CFFUNCTION)) {
        // this includes whitespace - change it
        final int begLine = element.getSource().getRow(element.getBegin());
        final int offset = element.getBegin();
        final int total = element.getAllStartTags().size();
        checkSize(LENGTH_THRESHOLD, "EXCESSIVE_FUNCTION_LENGTH", context, begLine, offset, total, bugs);
    }
}
Code example source: net.htmlparser.jericho/jericho-html
private static String getOptionLabel(final Element optionElement) {
    final String labelAttributeValue=optionElement.getAttributeValue("label");
    if (labelAttributeValue!=null) return labelAttributeValue;
    return CharacterReference.decodeCollapseWhiteSpace(optionElement.getContent());
}
private final class OptionElementIterator implements Iterator<Element> {
    // ... (body omitted in this excerpt)
}
Code example source: cflint/CFLint
/**
 * Parse CF function tag declaration to see if the function name is invalid.
 */
public void element(final Element element, final Context context, final BugList bugs) {
    if (element.getName().equals(CF.CFFUNCTION)) {
        final int lineNo = element.getSource().getRow(element.getBegin());
        checkNameForBugs(context, lineNo, element.getBegin());
        // ...
    }
}