Python: Retrieving attribute values through iterations using BeautifulSoup

Posted by im9ewurl on 2023-01-29 in Python


I am scraping an HTML document saved in a file with the following code:

from bs4 import BeautifulSoup as bs

path_xml = r"..."

content = []

with open(path_xml, "r") as file:
    content = file.readlines()

content = "".join(content)
bs_content = bs(content, "html.parser")

bilder = bs_content.find_all("bilder")

def get_str_bild(match):
    test = match.findChildren("b")

    for x in range(len(test)): # here is the problem (not giving me all elements in test)
 
        return test[x].get("d")

for b in bilder:
    if b.b: 
        print(get_str_bild(b))

Output:

L3357U00_002120.jpg
L3357U00_002140.jpg
L3357U00_002160.jpg

Basically, there are 3 places in the XML file where the node "Bilder" has child nodes. Each block looks like this:

<Bilder>
    <B Nr="1" D="L3357U00_002120.jpg"/>
    <B Nr="2" D="L3357U00_002120.jpg"/>
    <B Nr="3" D="L3357U00_002120.jpg"/>
    <B Nr="4" D="L3357U00_002120.jpg"/>
    <B Nr="9" D="L3357U00_002120.jpg"/>
    <B Nr="1" D="L3357U00_002130.jpg"/>
    <B Nr="2" D="L3357U00_002130.jpg"/>
    <B Nr="3" D="L3357U00_002130.jpg"/>
    <B Nr="4" D="L3357U00_002130.jpg"/>
    <B Nr="9" D="L3357U00_002130.jpg"/>
</Bilder>

Currently it only returns the first picture of each block, and I want it to return all of the pictures.
What exactly am I doing wrong?


Source: https://stackoverflow.com/questions/75268114/retrieving-attribute-values-through-iterations-using-beautifulsoup

2 Answers


bvuwiixz1#

You are missing a loop over the children of each bilder element. You can drop your function and simplify your code as follows:

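pic_1 = "L3357U00_002120.jpg"

bs_content = bs(content, "html.parser")
# Outer loop: each <Bilder> block (html.parser lowercases tag names)
for i, bilder in enumerate(bs_content.find_all("bilder")):
    print(f'bilder {i}')
    # Inner loop: every <B> child of the current block
    for b in bilder.find_all('b'):
        # The file name lives in the D attribute
        if b['d'] == pic_1:
            print(b['d'])
            #break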
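
For anyone who wants to try the nested-loop idea without the original file, here is a minimal sketch. The inline document in `sample`, the `Root` wrapper, and the list name `found` are made up for illustration; only the tag and attribute names (Bilder, B, Nr, D) come from the question. It assumes `html.parser`, which lowercases tag and attribute names, so the lookups use `bilder`, `b`, and `d`.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the file contents; the structure mirrors
# the <Bilder> block shown in the question.
sample = """
<Root>
  <Bilder>
    <B Nr="1" D="L3357U00_002120.jpg"/>
    <B Nr="2" D="L3357U00_002130.jpg"/>
  </Bilder>
  <Bilder>
    <B Nr="1" D="L3357U00_002140.jpg"/>
  </Bilder>
</Root>
"""

soup = BeautifulSoup(sample, "html.parser")

# Outer loop: one iteration per <Bilder> block.
# Inner loop: every <B> child of that block, not just the first one.
found = []
for i, bilder in enumerate(soup.find_all("bilder")):
    for b in bilder.find_all("b"):
        found.append((i, b["d"]))
        print(i, b["d"])
```

Collecting `(block index, file name)` pairs keeps track of which block each picture came from, which may or may not matter for your use case.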

Answered 2023-01-29

uyhoqukh2#

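You need to fix your get_str_bild(match) function. It currently returns only the first d attribute, because the return statement inside the loop ends the function on the first iteration.
Replace your function with: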

def get_str_bild(match):
    test = match.find_all("b")
    
    elements = []
    for x in range(len(test)):
        elements.append(test[x].get("d"))

    return elements
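
To see the difference side by side, here is a small runnable sketch. The names `first_only`, `get_all`, and `sample` are hypothetical, and the sample data is invented in the shape of the question's block. `first_only` mirrors the question's behavior (a `return` inside the loop), while `get_all` is one possible shorter spelling of the function above using a list comprehension.

```python
from bs4 import BeautifulSoup

# Hypothetical sample in the shape of the question's <Bilder> block.
sample = """
<Bilder>
    <B Nr="1" D="L3357U00_002120.jpg"/>
    <B Nr="2" D="L3357U00_002130.jpg"/>
    <B Nr="3" D="L3357U00_002140.jpg"/>
</Bilder>
"""


def first_only(match):
    # Same shape as the question's function: `return` inside the loop
    # exits on the first iteration, so later elements are never visited.
    for b in match.find_all("b"):
        return b.get("d")


def get_all(match):
    # Build the full list first, then return it once.
    return [b.get("d") for b in match.find_all("b")]


soup = BeautifulSoup(sample, "html.parser")
block = soup.find("bilder")

first = first_only(block)
everything = get_all(block)

print(first)
print(everything)
```

With a function that returns a list, the calling loop prints one list per block; iterate over the returned list if you want one file name per line.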

Answered 2023-01-29
