python-3.x: Print contents of remaining tags after a tag (BeautifulSoup)

Posted by xu3bshqb on 2022-12-05 in Python


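I just printed all the contents of the li tags with .find_all('li'). Now I want to continue and print the 'p' tags that come after the li tags end, that is, the 'p' tags that are not at the beginning or in the middle of the HTML: the 'p' tags, or any remaining tags, at the end. Please help. Basically I need everything after the final list end tag.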

from bs4 import BeautifulSoup

html_doc = """\
<html>
<p>
don't need this
</p>
<li>
text i need
</li>
<li>
<p>
don't need this
</p>
<p>
don't need this
</p>
<li>
text i need 
    <ol>
    <li>
    text i need but appended to parent li tag
    </li>
    <li>
    text i need but appended to parent li tag
    </li>
    </ol>
</li>
<li>
text i need
</li>
<p>
also need this
</p>
<p>
and this
</p>
<p>
and this too
</p>"""

soup = BeautifulSoup(html_doc, "html.parser")

for li in soup.select("li"):
    if li.find_parent("li"):
        continue
    print(" ".join(li.text.split()))
    print("--sep--")

This prints:

text i need
--sep--
text i need text i need but appended to parent li tag text i need but appended to parent li tag
--sep--
text i need
--sep--

Thanks to @Andrej Kesely for the code above.
What I need:

text i need
--sep--
text i need text i need but appended to parent li tag text i need but appended to parent li tag
--sep--
text i need
--sep--
also need this
--sep--
and this
--sep--
and this too
--sep--

python-3.x

Source: https://stackoverflow.com/questions/74677516/print-contents-of-remaining-tags-after-a-tag-beautifulsoup

1 Answer


2hh7jdfx1#

You can try the following:

from bs4 import NavigableString

# outermost li tags only; keep strings that sit directly inside an li (outer or nested)
for li in soup.select("li:not(li li)"):
    print(" ".join(
        d.strip() for d in li.descendants
        if isinstance(d, NavigableString)
        and d.parent.name == "li" and d.strip()
    ))
    print("--sep--")

# for the p tags after ANY of the [outermost] li tags
for p in soup.select("li:not(li li) ~ p"):
    print(p.text.strip())
    print("--sep--")
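
To see what the string filter in the first loop does in isolation, here is a minimal sketch on a small, made-up snippet (the `sample` markup and the variable names are assumptions for illustration, not the question's document). It keeps only the strings whose direct parent is an li, so text wrapped in a p inside the li is skipped while text in a nested li is kept.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical sample markup, chosen only to illustrate the filter.
sample = "<ul><li>keep me<p>skip me</p><ol><li>keep nested</li></ol></li></ul>"
soup = BeautifulSoup(sample, "html.parser")

# First (and here only) li that is not nested inside another li.
outer = soup.select_one("li:not(li li)")

# Walk every descendant and keep non-empty strings whose direct parent is an li.
direct = [
    s.strip() for s in outer.descendants
    if isinstance(s, NavigableString) and s.parent.name == "li" and s.strip()
]

print(" ".join(direct))  # keep me keep nested
```

The string inside the p is dropped because its parent is the p, not an li.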

(Using :not(li li) means you no longer need the if li.find_parent("li"): continue part.)
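
As a quick check of that equivalence, the sketch below runs both approaches on a small, well-formed list (the `sample` markup and the helper `squash` are hypothetical, introduced only for this illustration) and compares the results.

```python
from bs4 import BeautifulSoup

# Hypothetical, well-formed sample markup (not the question's document).
sample = """
<ul>
  <li>outer one</li>
  <li>outer two
    <ol>
      <li>inner a</li>
      <li>inner b</li>
    </ol>
  </li>
</ul>
"""
soup = BeautifulSoup(sample, "html.parser")


def squash(text):
    """Collapse runs of whitespace into single spaces."""
    return " ".join(text.split())


# Approach from the question: select every li, then skip the nested ones.
with_check = [squash(li.text) for li in soup.select("li") if not li.find_parent("li")]

# Approach from the answer: let the selector exclude nested li tags.
with_selector = [squash(li.text) for li in soup.select("li:not(li li)")]

print(with_selector)  # ['outer one', 'outer two inner a inner b']
```

On this sample both lists are identical, so the selector form is just a more compact way to express the same filter.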
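This should get you

the text from the [outermost] li tags, but made up only of strings that sit directly inside the li tag or inside li tags nested within it

and then

the text from the p tags that are siblings following an outermost li tag. (If you only want the p tags after the last li, use for p in soup.select("li:not(li li) ~ p:not(:has(~ li))")...)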
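
The difference between the two sibling selectors can be sketched on a small, made-up document (the `sample` markup and variable names are assumptions for illustration only). The first selector returns every p that follows any outermost li; the second additionally drops any p that still has an li sibling after it.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: one p before the list items, one between them, two after.
sample = """
<div>
  <p>intro</p>
  <li>first item</li>
  <p>between items</p>
  <li>last item</li>
  <p>tail one</p>
  <p>tail two</p>
</div>
"""
soup = BeautifulSoup(sample, "html.parser")

# p tags that come after ANY outermost li (as later siblings).
after_any = [p.get_text(strip=True) for p in soup.select("li:not(li li) ~ p")]

# p tags that come after the LAST li: exclude any p with a later li sibling.
after_last = [
    p.get_text(strip=True)
    for p in soup.select("li:not(li li) ~ p:not(:has(~ li))")
]

print(after_any)   # ['between items', 'tail one', 'tail two']
print(after_last)  # ['tail one', 'tail two']
```

Note that both selectors rely on the p tags being siblings of the li tags, so the result depends on how the parser nests the markup.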

Answered 2022-12-05
