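As the title says, I want to know how to get one specific div when many other divs at the same level of the hierarchy share the same parent, children, and class name.
**Web scraper:** BeautifulSoup
Example: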
```html
<div class="main>
<div class="parent">
<div class="css-1wi2w6s enb64yk4"/>
</div>
<div class="parent">
<div class="css-1wi2w6s enb64yk4"/>
</div>
<div class="parent">
<div class="css-1wi2w6s enb64yk4"/> <!-- div I want to get -->
</div>
<div class="parent">
<div class="css-1wi2w6s enb64yk4"/>
</div>
</div>
```
My goal is to get the third child div, counting from the top.
What I tried:
```python
rooms = doc.select_one('.css-1wi2w6s.enb64yk4 > div:nth-of-type(3)')
```
But it gave me an error.
Alternative solution:
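```python
import requests
from bs4 import BeautifulSoup


def get_data(url):
    if url.startswith('link'):
        result = requests.get(url)
        doc = BeautifulSoup(result.text, 'html.parser')
        # First div whose class attribute is exactly 'css-1wi2w6s enb64yk4'
        # and whose .string is not None
        rooms = doc.find('div', {'class': 'css-1wi2w6s enb64yk4'}, string=True)
        # Step forward one element at a time until reaching the wanted div
        z = rooms.find_next()
        x = z.find_next()
        c = x.find_next()
        v = c.find_next()
        b = v.find_next()
        n = b.find_next()
        m = n.find_next()
        l = m.find_next()
        k = l.find_next()
        j = k.find_next()
        h = j.find_next()
        # Iterating a Tag yields its children; return the first child's string
        for i in h:
            return i.string
    else:
        # rooms = doc.find_all('p', {'class':'css-b5m1rv er34gjf0'})
        # return rooms
        return "link"
```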
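Chaining `find_next()` a fixed number of times only works while the page keeps exactly the same number of elements between the first match and the target. A less brittle sketch, assuming the markup has the shape of the sample above, collects all `parent` divs with `find_all()` and indexes the third one. The helper name `third_room` is hypothetical; in the real scraper `doc` would come from the `requests.get(url)` response as above.

```python
from bs4 import BeautifulSoup

# Sample markup shaped like the example in the question (assumed structure).
html_doc = """
<div class="main">
  <div class="parent"><div class="css-1wi2w6s enb64yk4">1</div></div>
  <div class="parent"><div class="css-1wi2w6s enb64yk4">2</div></div>
  <div class="parent"><div class="css-1wi2w6s enb64yk4">3</div></div>
  <div class="parent"><div class="css-1wi2w6s enb64yk4">4</div></div>
</div>
"""


def third_room(doc):
    """Hypothetical helper: text of the target div inside the third .parent div."""
    # class_="parent" matches any div whose class list contains "parent".
    parents = doc.find_all("div", class_="parent")
    if len(parents) < 3:
        return None  # fewer than three .parent divs on the page
    target = parents[2].find("div", class_="css-1wi2w6s")
    return target.get_text(strip=True) if target is not None else None


doc = BeautifulSoup(html_doc, "html.parser")
print(third_room(doc))  # prints 3
```

Indexing into the `find_all()` result keeps the position explicit (`[2]` is the third match) and returns `None` instead of raising when the page has fewer matches than expected.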
Update after your comments:
```python
from bs4 import BeautifulSoup

htmld_doc = """
<div class="main">
<div class="parent">
<div class="css-1wi2w6s enb64yk4">1</div>
</div>
<div class="parent">
<div class="css-1wi2w6s enb64yk4">2</div>
</div>
<div class="parent">
<div class="css-1wi2w6s enb64yk4">3</div>
</div>
<div class="parent">
<div class="css-1wi2w6s enb64yk4">4</div>
</div>
</div>
"""
doc = BeautifulSoup(htmld_doc, 'html.parser')
rooms = doc.find('div', {'class': 'parent:nth-of-type(3) .css-1wi2w6s.enb64yk4'}, string=True)
print(rooms)
for i in rooms:
    print(i.string)
```
This raises `TypeError: 'NoneType' object is not iterable`.
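The `TypeError` comes from `find()`, not from the selector itself: `find()` does not parse CSS, so the string `'parent:nth-of-type(3) .css-1wi2w6s.enb64yk4'` is compared literally against each tag's `class` attribute, nothing matches, `find()` returns `None`, and the `for` loop then tries to iterate over `None`. CSS selectors belong in `select()` or `select_one()`. A minimal sketch on the same sample HTML that shows the literal comparison failing, then selects every div carrying both classes and picks the third by index, checking the count before indexing:

```python
from bs4 import BeautifulSoup

htmld_doc = """
<div class="main">
  <div class="parent"><div class="css-1wi2w6s enb64yk4">1</div></div>
  <div class="parent"><div class="css-1wi2w6s enb64yk4">2</div></div>
  <div class="parent"><div class="css-1wi2w6s enb64yk4">3</div></div>
  <div class="parent"><div class="css-1wi2w6s enb64yk4">4</div></div>
</div>
"""

doc = BeautifulSoup(htmld_doc, "html.parser")

# find() compares the class value as a plain string; it does not parse CSS.
literal = doc.find("div", {"class": "parent:nth-of-type(3) .css-1wi2w6s.enb64yk4"})
print(literal)  # prints None

# select() does parse CSS: every tag carrying both classes, in document order.
rooms = doc.select(".css-1wi2w6s.enb64yk4")
if len(rooms) >= 3:
    print(rooms[2].string)  # prints 3
```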
1 Answer
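Your selector
.css-1wi2w6s.enb64yk4 > div:nth-of-type(3)
selects the third `div` that is a direct child of an element with the classes `css-1wi2w6s` and `enb64yk4`. But it looks like you want the element that has those classes and is a child of the third element with class `parent`. Instead of: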
.css-1wi2w6s.enb64yk4 > div:nth-of-type(3)
Try:
.parent:nth-of-type(3) .css-1wi2w6s.enb64yk4
(I also reformatted your HTML; it seemed to be malformed.)
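The key difference between the two selectors is the combinator. In `.parent:nth-of-type(3) .css-1wi2w6s.enb64yk4` the space is the descendant combinator, so it means "an element with both classes somewhere inside the `.parent` element that is the third `div` among its siblings"; written without the space, the selector would instead require one element to carry all three classes. Note that `:nth-of-type` counts siblings by tag name, not by class, so the index shifts if other `div`s sit alongside the `.parent` ones on the real page. A sketch comparing the original and suggested selectors with `select_one()` on the question's sample HTML:

```python
from bs4 import BeautifulSoup

html_doc = """
<div class="main">
  <div class="parent"><div class="css-1wi2w6s enb64yk4">1</div></div>
  <div class="parent"><div class="css-1wi2w6s enb64yk4">2</div></div>
  <div class="parent"><div class="css-1wi2w6s enb64yk4">3</div></div>
  <div class="parent"><div class="css-1wi2w6s enb64yk4">4</div></div>
</div>
"""

doc = BeautifulSoup(html_doc, "html.parser")

# Original: a div that is the third div child *of* a target element.
# The target divs contain only text, so nothing matches.
original = doc.select_one(".css-1wi2w6s.enb64yk4 > div:nth-of-type(3)")

# Suggested: a target element inside the .parent div that is the
# third div among its siblings (space = descendant combinator).
corrected = doc.select_one(".parent:nth-of-type(3) .css-1wi2w6s.enb64yk4")

print(original)          # prints None
print(corrected.string)  # prints 3
```

`select_one()` returns `None` when nothing matches, so in a scraper it is worth checking the result before reading `.string` from it.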