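I have this div, and I would like to know whether it is possible to select every "TEXT_I_NEED_X" with a single XPath expression.
The closest I have come is the following, but it selects more than I need: //div[@class="article-text-with-img"]/p//text()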
<div class="article-text-with-img">
<p>
<a href="#"> Text1 </a>
</p>
<p> </p>
<p>
TEXT_I_NEED_A
<a href="#"> Text2 </a>
</p>
<p>
<span>
TEXT_I_NEED_B
<a href="#"> Text3 </a>
</span>
</p>
<p>
<span>
<span>
TEXT_I_NEED_C
<a href="#"> Text4 </a>
</span>
</span>
</p>
<p>
<span>
TEXT_I_NEED_D
</span>
<a href="#"> Text5 </a>
</p>
<p>
<span>
<span>
TEXT_I_NEED_D
</span>
<a href="#"> Text5 </a>
</span>
</p>
</div>
2 Answers
#1 moiiocjp
Using a single XPath expression:
//div[@class="article-text-with-img"]//a/parent::*/text() | //div[@class="article-text-with-img"]//a/preceding-sibling::span/text()
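To see what the two branches of that union pick up, here is a minimal Python sketch. It is only an emulation: the standard library's xml.etree.ElementTree supports a limited XPath subset (no text() and no | operator), so the helpers direct_text and needed_text are hypothetical names of my own, and SAMPLE is a condensed, well-formed version of the div from the question. Unlike the real expression, the sketch drops whitespace-only text nodes and strips the rest, and it groups results by parent element rather than guaranteeing document order.

```python
import xml.etree.ElementTree as ET

# Condensed, well-formed version of the question's markup (an assumption:
# the real page may need an HTML parser instead of an XML one).
SAMPLE = """
<div class="article-text-with-img">
  <p><a href="#"> Text1 </a></p>
  <p> </p>
  <p>TEXT_I_NEED_A <a href="#"> Text2 </a></p>
  <p><span>TEXT_I_NEED_B <a href="#"> Text3 </a></span></p>
  <p><span><span>TEXT_I_NEED_C <a href="#"> Text4 </a></span></span></p>
  <p><span>TEXT_I_NEED_D</span> <a href="#"> Text5 </a></p>
</div>
"""


def direct_text(elem):
    """Text nodes that are direct children of elem (what elem/text() selects)."""
    nodes = [elem.text] + [child.tail for child in elem]
    return [t for t in nodes if t is not None]


def needed_text(root):
    """Emulate: //div[...]//a/parent::*/text() | //div[...]//a/preceding-sibling::span/text()"""
    found = []
    for div in root.iter("div"):
        if div.get("class") != "article-text-with-img":
            continue
        # ".//a/.." yields each element that is the parent of an <a>, once.
        for parent in div.findall(".//a/.."):
            # Branch 1: text nodes directly under the parent of an <a>.
            found.extend(direct_text(parent))
            # Branch 2: text nodes of <span> siblings that precede an <a>.
            children = list(parent)
            last_a = max(i for i, c in enumerate(children) if c.tag == "a")
            for sibling in children[:last_a]:
                if sibling.tag == "span":
                    found.extend(direct_text(sibling))
    # Drop whitespace-only nodes and trim the rest (the raw XPath keeps them).
    return [t.strip() for t in found if t.strip()]


result = needed_text(ET.fromstring(SAMPLE))
print(result)
```

With a full XPath 1.0 engine, the same filtering can probably be done inside the expression itself by appending a predicate such as text()[normalize-space()] to each branch.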
On the command line, using
xmllint
(newlines and whitespace are included in the text() results)
#2 yshpjwxd
beautifulsoup
Example: Prints: