Python 3 AttributeError: 'NoneType' object has no attribute 'find_all'

50few1ms  posted on 2023-04-13  in Python

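I want to scrape a web page with Python 3 using the requests and BeautifulSoup modules, but I get an error. Is there something wrong with my code? How can I fix the error?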

import requests
from bs4 import BeautifulSoup

url = 'https://otakudesu.lol/genre-list/'

response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

genres_div = soup.find('div', class_='genres')
genre_links = genres_div.find_all('a')

path = []
text = []

for link in genre_links:
    path.append(link['href'])
    text.append(link.text)

print(path)
print(text)

Error:

genre_links = genres_div.find_all('a')
                  ^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'find_all'

How do I fix this error?

nbnkbykc #1

The mistake is in this line: genres_div = soup.find('div', class_='genres'). You are searching for a 'div' tag, but you should be searching for a 'ul' tag. Here is the working code:

import requests
from bs4 import BeautifulSoup

url = 'https://otakudesu.lol/genre-list/'

response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

genres_div = soup.find('ul', class_='genres')
genre_links = genres_div.find_all('a')

path = []
text = []

for link in genre_links:
    path.append(link['href'])
    text.append(link.text)

print(path)
print(text)

Result:

['/genres/action/', '/genres/adventure/', '/genres/comedy/', '/genres/demons/', '/genres/drama/', '/genres/ecchi/', '/genres/fantasy/', '/genres/game/', '/genres/harem/', '/genres/historical/', '/genres/horror/', '/genres/josei/', '/genres/magic/', '/genres/martial-arts/', '/genres/mecha/', '/genres/military/', '/genres/music/', '/genres/mystery/', '/genres/psychological/', '/genres/parody/', '/genres/police/', '/genres/romance/', '/genres/samurai/', '/genres/school/', '/genres/sci-fi/', '/genres/seinen/', '/genres/shoujo/', '/genres/shoujo-ai/', '/genres/shounen/', '/genres/slice-of-life/', '/genres/sports/', '/genres/space/', '/genres/super-power/', '/genres/supernatural/', '/genres/thriller/', '/genres/vampire/']
['Action', 'Adventure', 'Comedy', 'Demons', 'Drama', 'Ecchi', 'Fantasy', 'Game', 'Harem', 'Historical', 'Horror', 'Josei', 'Magic', 'Martial Arts', 'Mecha', 'Military', 'Music', 'Mystery', 'Psychological', 'Parody', 'Police', 'Romance', 'Samurai', 'School', 'Sci-Fi', 'Seinen', 'Shoujo', 'Shoujo Ai', 'Shounen', 'Slice of Life', 'Sports', 'Space', 'Super Power', 'Supernatural', 'Thriller', 'Vampire']

xn1cxnb4 #2

[Summary: I think the genres_div you want should be a ul tag (not a div) with class="genres".]

The error:

genre_links = genres_div.find_all('a')
                  ^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'find_all'

This indicates that genres_div is None; to avoid raising the error, you can check for that and default genre_links to an empty list:

genre_links = [] if genres_div is None else genres_div.find_all('a') ## OR
# genre_links = genres_div.find_all('a') if genres_div else []
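To make the None case concrete, here is a minimal sketch that runs against an inline HTML snippet instead of the live site. SAMPLE_HTML and the helper name links_or_empty are assumptions made up for illustration; the snippet only mimics the situation described in this thread, where the "genres" class sits on a ul rather than a div.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page: the "genres" class is on a <ul>, not a <div>.
SAMPLE_HTML = """
<div>
  <ul class="genres">
    <li><a href="/genres/action/">Action</a></li>
    <li><a href="/genres/comedy/">Comedy</a></li>
  </ul>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() returns None when nothing matches, which is what triggers the AttributeError.
missing = soup.find("div", class_="genres")
found = soup.find("ul", class_="genres")


def links_or_empty(container):
    """Return the <a> tags inside container, or [] when the lookup found nothing."""
    return [] if container is None else container.find_all("a")


print(missing)
print([a["href"] for a in links_or_empty(missing)])
print([a["href"] for a in links_or_empty(found)])
```

The guard only hides the symptom: with the wrong tag name you get an empty list instead of a traceback, so it is still worth checking why the lookup matched nothing.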

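Defaulting to an empty list is fine if genres_div is only None some of the time and, in that case, you just want empty path and text lists.
However, if you want to investigate why the lookup for genres_div returned nothing, it is most likely because you are trying to use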

genres_div = soup.find('div', class_='genres')

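but I could not find any div with class="genres" in the source HTML.
The only tag I could find with that class is a ul tag, which does contain a list of links; so you could change the line above to genres_div = soup.find('ul', class_='genres'). However, I find chaining find... calls generally risky and usually prefer .select with CSS selectors (handled by SoupSieve), like: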

genre_links = soup.select('ul.genres a[href]') ## OR
# genre_links = soup.select('.genres a[href]') 
## '.genres a[href]' <-- any a tags inside any tag with a "genres" class
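As a rough illustration of why .select is more forgiving, the sketch below runs the same selector against an inline snippet. SAMPLE_HTML is an assumption for demonstration and not the real page. Note that select returns an empty list when nothing matches, so there is no None to trip over.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one <a> has no href and one link sits outside the list.
SAMPLE_HTML = """
<ul class="genres">
  <li><a href="/genres/action/">Action</a></li>
  <li><a>no href here</a></li>
  <li><a href="/genres/comedy/">Comedy</a></li>
</ul>
<a href="/about/">About</a>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Only <a> tags that have an href and live inside <ul class="genres"> match.
genre_links = soup.select("ul.genres a[href]")

# A selector that matches nothing yields an empty result, not None.
no_match = soup.select("div.genres a[href]")

hrefs = [a["href"] for a in genre_links]
print(hrefs)
print(len(no_match))
```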

If those are the links you want, you can also bypass .genres entirely and find them by their title attribute:

# genre_links = soup.select(f'a[href][title^="View all posts in "]') ## OR 

title_check = lambda t: t and t.startswith("View all posts in ")
genre_links = soup.find_all('a', href=True, title=title_check)
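Here is a small sketch of the title-based lookup on made-up markup, comparing the function filter with the equivalent CSS attribute selector. SAMPLE_HTML and the name is_genre_title are hypothetical; the title prefix is the one used in this answer.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two genre links, one unrelated title, one link without a title.
SAMPLE_HTML = """
<a href="/genres/action/" title="View all posts in Action">Action</a>
<a href="/genres/comedy/" title="View all posts in Comedy">Comedy</a>
<a href="/about/" title="About this site">About</a>
<a href="/contact/">Contact</a>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")


def is_genre_title(title):
    # The value may be None for tags without a title attribute, so guard before startswith.
    return title is not None and title.startswith("View all posts in ")


by_function = soup.find_all("a", href=True, title=is_genre_title)
by_selector = soup.select('a[href][title^="View all posts in "]')

texts_function = [a.text for a in by_function]
texts_selector = [a.text for a in by_selector]
print(texts_function)
print(texts_selector)
```

Both approaches depend on the site keeping that exact title wording, so they are no more stable than the class-based selector; they are just an alternative when the class is missing.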

By the way, you can also build the path and text lists with list comprehensions instead of appending in a loop.

  • Either use two separate list comprehensions:
genre_links = soup.select('.genres a[href]')
path, text = [a['href'] for a in genre_links], [a.text for a in genre_links]
# path_text_pairs = list(zip(path, text))  # uncomment if you also want the pairs

  • Or build a list of (path, text) pairs first and unzip it:
path_text_pairs = [(a['href'], a.text) for a in soup.select('.genres a[href]')]
path, text = [list(l) for l in zip(*path_text_pairs)]
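One caveat with the zip(*path_text_pairs) form: when the list of pairs is empty, the comprehension produces an empty list and the two-name unpacking raises ValueError. A minimal sketch of a guard follows; the helper name unzip_pairs is made up for illustration.

```python
def unzip_pairs(pairs):
    """Split [(path, text), ...] into two lists; returns two empty lists for empty input."""
    if not pairs:
        return [], []
    paths, texts = zip(*pairs)
    return list(paths), list(texts)


pairs = [("/genres/action/", "Action"), ("/genres/comedy/", "Comedy")]
path, text = unzip_pairs(pairs)
empty_path, empty_text = unzip_pairs([])

print(path)
print(text)
print(empty_path, empty_text)
```

The two-comprehension form does not have this problem, since each comprehension simply yields an empty list.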

Whichever method you use, print(f'{path_text_pairs = }\n\n{path = }\n{text = }') should print:

path_text_pairs = [('/genres/action/', 'Action'), ('/genres/adventure/', 'Adventure'), ('/genres/comedy/', 'Comedy'), ('/genres/demons/', 'Demons'), ('/genres/drama/', 'Drama'), ('/genres/ecchi/', 'Ecchi'), ('/genres/fantasy/', 'Fantasy'), ('/genres/game/', 'Game'), ('/genres/harem/', 'Harem'), ('/genres/historical/', 'Historical'), ('/genres/horror/', 'Horror'), ('/genres/josei/', 'Josei'), ('/genres/magic/', 'Magic'), ('/genres/martial-arts/', 'Martial Arts'), ('/genres/mecha/', 'Mecha'), ('/genres/military/', 'Military'), ('/genres/music/', 'Music'), ('/genres/mystery/', 'Mystery'), ('/genres/psychological/', 'Psychological'), ('/genres/parody/', 'Parody'), ('/genres/police/', 'Police'), ('/genres/romance/', 'Romance'), ('/genres/samurai/', 'Samurai'), ('/genres/school/', 'School'), ('/genres/sci-fi/', 'Sci-Fi'), ('/genres/seinen/', 'Seinen'), ('/genres/shoujo/', 'Shoujo'), ('/genres/shoujo-ai/', 'Shoujo Ai'), ('/genres/shounen/', 'Shounen'), ('/genres/slice-of-life/', 'Slice of Life'), ('/genres/sports/', 'Sports'), ('/genres/space/', 'Space'), ('/genres/super-power/', 'Super Power'), ('/genres/supernatural/', 'Supernatural'), ('/genres/thriller/', 'Thriller'), ('/genres/vampire/', 'Vampire')]

path = ['/genres/action/', '/genres/adventure/', '/genres/comedy/', '/genres/demons/', '/genres/drama/', '/genres/ecchi/', '/genres/fantasy/', '/genres/game/', '/genres/harem/', '/genres/historical/', '/genres/horror/', '/genres/josei/', '/genres/magic/', '/genres/martial-arts/', '/genres/mecha/', '/genres/military/', '/genres/music/', '/genres/mystery/', '/genres/psychological/', '/genres/parody/', '/genres/police/', '/genres/romance/', '/genres/samurai/', '/genres/school/', '/genres/sci-fi/', '/genres/seinen/', '/genres/shoujo/', '/genres/shoujo-ai/', '/genres/shounen/', '/genres/slice-of-life/', '/genres/sports/', '/genres/space/', '/genres/super-power/', '/genres/supernatural/', '/genres/thriller/', '/genres/vampire/']
text = ['Action', 'Adventure', 'Comedy', 'Demons', 'Drama', 'Ecchi', 'Fantasy', 'Game', 'Harem', 'Historical', 'Horror', 'Josei', 'Magic', 'Martial Arts', 'Mecha', 'Military', 'Music', 'Mystery', 'Psychological', 'Parody', 'Police', 'Romance', 'Samurai', 'School', 'Sci-Fi', 'Seinen', 'Shoujo', 'Shoujo Ai', 'Shounen', 'Slice of Life', 'Sports', 'Space', 'Super Power', 'Supernatural', 'Thriller', 'Vampire']

Suggested version of the full working code:

import requests
from bs4 import BeautifulSoup

url = 'https://otakudesu.lol/genre-list/'
soup = BeautifulSoup((r:=requests.get(url)).content, 'html.parser')
print(f'<{r.status_code} {r.reason}> from {r.url}\n') ## for debugging

genre_links = soup.select(f'.genres a[href]')
paths, texts = [a['href'] for a in genre_links], [a.text for a in genre_links]

if not genre_links: ## for debugging [ if no results ]
    dump_fp = 'x.html'
    with open(dump_fp, 'wb') as f: f.write(soup.prettify('utf-8'))
    print(f'No genre links found - saved html to {dump_fp!r}')
else: print(f'{paths = }\n{texts = }') ## print results

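This way, if there are no results, you can open the dump_fp file (and inspect it with JavaScript disabled) to check the HTML source your code is actually working with [which is sometimes not the same as what you can inspect in your browser, even if your request got an OK response].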

qpgpyjmq #3

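Sir, you are trying to get the elements under class="genres", right? The problem with your code is that the genres class belongs to a ul, not a div.
So I changed
genres_div = soup.find('div', 'genres')
to
genres_div = soup.find('ul', 'genres')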
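In case the two-argument form looks unfamiliar: as far as I know, a plain string passed as the second positional argument of find is treated as a CSS class, so it should behave like class_='genres'. A quick check on made-up markup (SAMPLE_HTML is an assumption, not the real page):

```python
from bs4 import BeautifulSoup

# Hypothetical one-line stand-in for the genre list.
SAMPLE_HTML = '<ul class="genres"><li><a href="/genres/action/">Action</a></li></ul>'

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

by_position = soup.find("ul", "genres")        # string in second position: matched against class
by_keyword = soup.find("ul", class_="genres")  # explicit keyword form

# Both lookups should hand back the very same Tag object from the parsed tree.
same_tag = by_position is by_keyword
print(same_tag)
```

The keyword form is a little easier to read, which is probably why the other answers use it.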

If you need anything else, I will be happy to help :)

Hope my answer is useful to you.
import requests
from bs4 import BeautifulSoup

url = 'https://otakudesu.lol/genre-list/'

response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

genres_div = soup.find('ul', 'genres')
genre_links = genres_div.find_all('a')

path = []
text = []

for link in genre_links:
    path.append(link['href'])
    text.append(link.text)

print(path)
print(text)
