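Beautiful Soup's main strength is searching the parse tree, but it also makes it easy to modify the tree. For example, you can rename a tag and change its attributes: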
from bs4 import BeautifulSoup
soup = BeautifulSoup('<b class="boldest">Extremely bold</b>')
tag = soup.b
tag.name = "blockquote"
tag['class'] = 'verybold'
tag['id'] = 1
print(tag)
Output:
<blockquote class="verybold" id="1">Extremely bold</blockquote>
del tag['class']
del tag['id']
print(tag)
Output:
<blockquote>Extremely bold</blockquote>
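As a rough sketch of how the rename and attribute edits above can be checked, the snippet below inspects the tag afterwards. The parser name "html.parser" and the id value "intro" are assumptions chosen for illustration; they do not come from the original example.

```python
from bs4 import BeautifulSoup

# "html.parser" is passed explicitly (an assumption) so the result does not
# depend on which optional parsers happen to be installed.
soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', "html.parser")
tag = soup.b

tag.name = "blockquote"   # rename the tag in place
tag["id"] = "intro"       # hypothetical attribute value
del tag["class"]          # remove an attribute

print(tag.attrs)              # {'id': 'intro'}
print(tag.get("class"))       # None: .get() is safe for missing attributes
print(soup.find("b"))         # None: the old name no longer matches
print(soup.blockquote is tag) # True: same object, new name
```

Using tag.get() rather than tag["class"] avoids a KeyError once the attribute has been deleted.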
markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup)
tag = soup.a
tag.string = "New link text."
print(tag)
Output:
<a href="http://example.com/">New link text.</a>
If the tag contains other tags, assigning to its .string attribute replaces all of its existing contents, including the child tags.
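A minimal sketch of that overwrite, counting the tag's children before and after the assignment. Passing "html.parser" is an assumption made here to keep the result independent of the installed parsers.

```python
from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, "html.parser")  # parser choice is an assumption
tag = soup.a

children_before = len(tag.contents)  # the text node plus the <i> tag
tag.string = "New link text."        # replaces every existing child
children_after = len(tag.contents)   # a single text node remains

i_still_present = tag.find("i") is not None
print(children_before, children_after, i_still_present)  # 2 1 False
```

If the child tags need to survive, modifying the individual text nodes is likely a better fit than assigning to .string.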
soup = BeautifulSoup("<a>Foo</a>")
soup.a.append("Bar")
print(soup)
print(soup.a.contents)
Output:
<html><head></head><body><a>FooBar</a></body></html>
['Foo', 'Bar']
from bs4 import BeautifulSoup, NavigableString
soup = BeautifulSoup("<b></b>")
tag = soup.b
tag.append("Hello")
new_string = NavigableString(" there")
tag.append(new_string)
print(tag)
Output:
<b>Hello there</b>
from bs4 import BeautifulSoup, Comment, NavigableString
soup = BeautifulSoup("<b></b>")
tag = soup.b
tag.append("Hello")
new_string = NavigableString(" there")
tag.append(new_string)
new_comment = soup.new_string("Nice to see you.", Comment)
tag.append(new_comment)
print(tag)
Output:
<b>Hello there<!--Nice to see you.--></b>
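Passing a class such as Comment to new_string() is a feature added in Beautiful Soup 4.2.1.
The best way to create a new tag is to call the factory method BeautifulSoup.new_tag():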
from bs4 import BeautifulSoup
soup = BeautifulSoup("<b></b>")
tag_original = soup.b
tag_new = soup.new_tag("a", href="http://www.example.com")
tag_original.append(tag_new)
print(tag_original)
Output:
<b><a href="http://www.example.com"></a></b>
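The example above creates an empty link. As a small sketch of one way to go further, the snippet below gives the new tag some text before attaching it; the link text and the "html.parser" argument are assumptions added for illustration.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<b></b>", "html.parser")  # parser choice is an assumption

# Keyword arguments to new_tag() become attributes of the new tag.
link = soup.new_tag("a", href="http://www.example.com")
link.string = "Link text."  # hypothetical text content

# The new tag is not part of the tree until it is attached somewhere.
print(link.parent)  # None
soup.b.append(link)
print(soup.b)       # <b><a href="http://www.example.com">Link text.</a></b>
```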
from bs4 import BeautifulSoup
markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup)
tag = soup.a
print(tag)
tag.insert(1, "but did not endorse")
print(tag)
print(tag.contents)
Output:
<a href="http://example.com/">I linked to <i>example.com</i></a>
<a href="http://example.com/">I linked to but did not endorse<i>example.com</i></a>
['I linked to ', 'but did not endorse', <i>example.com</i>]
from bs4 import BeautifulSoup
soup = BeautifulSoup("<b>stop</b>")
tag = soup.new_tag("a")
tag.string = "Alice"
soup.b.string.insert_before(tag)
print(soup.b)
Output:
<b><a>Alice</a>stop</b>
markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup)
tag = soup.a
tag.clear()
print(tag)
Output:
<a href="http://example.com/"></a>
from bs4 import BeautifulSoup
markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup)
tag_a = soup.a
tag_i = soup.i.extract()
print(tag_a)
print(tag_i)
Output:
<a href="http://example.com/">I linked to </a>
<i>example.com</i>
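extract() removes a tag from the tree and returns it, so you end up with two parse trees:
one rooted at the BeautifulSoup object that parsed the original document;
the other rooted at the tag that was removed and returned.
You can go on to call extract() on the tag that was removed.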
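The two separate trees can be illustrated with a short sketch. It assumes "html.parser" as the parser and calls extract() a second time, on the text node inside the removed tag.

```python
from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, "html.parser")  # parser choice is an assumption

i_tag = soup.i.extract()
print(i_tag.parent)    # None: the removed tag is now the root of its own tree
print(soup.find("i"))  # None: the original tree no longer contains it

# extract() can be called again inside the removed tree.
text = i_tag.string.extract()
print(text)   # example.com
print(i_tag)  # <i></i>
```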
markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup)
a_tag = soup.a
soup.i.decompose()
print(a_tag)
Output:
<a href="http://example.com/">I linked to </a>
markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup)
a_tag = soup.a
new_tag = soup.new_tag("b")
new_tag.string = "example.net"
a_tag.i.replace_with(new_tag)
print(a_tag)
Output:
<a href="http://example.com/">I linked to <b>example.net</b></a>
replace_with() returns the tag or text node that was replaced, so you can inspect it or add it back to another part of the tree.
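A minimal sketch of reusing that return value: the replaced tag is kept in a variable and appended back to the end of the link. The variable name old_tag and the "html.parser" argument are assumptions for illustration.

```python
from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, "html.parser")  # parser choice is an assumption
a_tag = soup.a

new_tag = soup.new_tag("b")
new_tag.string = "example.net"

# Keep the replaced tag instead of discarding it.
old_tag = a_tag.i.replace_with(new_tag)
print(old_tag)  # <i>example.com</i>

# Put it back somewhere else in the tree.
a_tag.append(old_tag)
print(a_tag)
# <a href="http://example.com/">I linked to <b>example.net</b><i>example.com</i></a>
```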
soup = BeautifulSoup("<p>I wish I was bold.</p>")
print(soup.p.string.wrap(soup.new_tag("b")))
print(soup.p.wrap(soup.new_tag("div")))
Output:
<b>I wish I was bold.</b>
<div><p><b>I wish I was bold.</b></p></div>
markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup)
a_tag = soup.a
a_tag.i.unwrap()
print(a_tag)
Output:
<a href="http://example.com/">I linked to example.com</a>
Copyright notice: this article is a repost; copyright belongs to the original author.
Original link: https://blog.csdn.net/S_numb/article/details/120218188