html: Why is my code giving me an AttributeError?

evrscar2 · posted on 2023-04-27 in Other

I am trying to retrieve legislation-related links by working through several levels of HTML. However, once I reach the second level of links, instead of retrieving the list of links to the individual bills, I get this error:
Exception has occurred: AttributeError
'NoneType' object has no attribute 'startswith'
  File "C:\Users\Justin\Desktop\ilgascrapetest1.py", line 14, in <module>
    if href.startswith('/legislation/BillStatus.asp?'):
       ^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'startswith'
Here is the code so far:

import requests
from bs4 import BeautifulSoup

url = 'https://www.ilga.gov/legislation/default.asp'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Find the House Bills section
house_bills = soup.find('a', {"name": "h_bills"}).parent

# Iterate through all links in the House Bills section
for link in house_bills.find_all('a'):
    href = link.get('href')
    if href.startswith('/legislation/BillStatus.asp?'):
        bill_url = url + href
        bill_response = requests.get(bill_url)
        bill_soup = BeautifulSoup(bill_response.content, 'html.parser')

        # Find the table cell with width
        td = bill_soup.find('td', {'width': '100%'})
        
        # Iterate through all the <li> elements in table
        for li in td.find_all('li'):
            print(li.text)

I am able to retrieve and iterate over the list of links from the "House Bills" table in the HTML of the first page, but at the next level, which should give a list of links to the individual bills, I get the error above instead of the links for HB 0001 through HB 4042. Why am I getting this error?


wvmv3b1j · Answer #1

There are several <a> elements on this site that have no href, and in that case link.get('href') returns None. You cannot call startswith() on None, so you have to add a check for whether href is None.
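For illustration, the failure can be reproduced on a single tag that has no href (a minimal sketch using a hypothetical snippet, not taken from the actual site):

from bs4 import BeautifulSoup

# Hypothetical anchor that has no href attribute
tag = BeautifulSoup('<a name="h_bills">House Bills</a>', 'html.parser').find('a')

href = tag.get('href')
print(href)  # None, because the attribute is missing

# The next line is what fails in the original script:
# href.startswith('/legislation/BillStatus.asp?')
# AttributeError: 'NoneType' object has no attribute 'startswith'

With the None check added, the full script becomes: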

import requests
from bs4 import BeautifulSoup

url = 'https://www.ilga.gov/legislation/default.asp'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Find the House Bills section
house_bills = soup.find('a', {"name": "h_bills"}).parent

# Iterate through all links in the House Bills section
for link in house_bills.find_all('a'):
    href = link.get('href')
    if not href:
        continue  # Ignore links without href
    if href.startswith('/legislation/BillStatus.asp?'):
        bill_url = url + href
        bill_response = requests.get(bill_url)
        bill_soup = BeautifulSoup(bill_response.content, 'html.parser')

        # Find the table cell with width="100%"
        td = bill_soup.find('td', {'width': '100%'})
        
        # Iterate through all the <li> elements in table
        for li in td.find_all('li'):
            print(li.text)
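Alternatively, find_all() can do the filtering itself: passing href=True returns only those <a> elements that actually carry an href attribute, so the explicit check is not needed. A minimal sketch of the same first-level loop using that approach:

import requests
from bs4 import BeautifulSoup

url = 'https://www.ilga.gov/legislation/default.asp'
soup = BeautifulSoup(requests.get(url).text, 'html.parser')

# Same section lookup as above
house_bills = soup.find('a', {"name": "h_bills"}).parent

# href=True keeps only anchors that have an href attribute
for link in house_bills.find_all('a', href=True):
    href = link['href']  # safe: href is guaranteed to exist here
    if href.startswith('/legislation/BillStatus.asp?'):
        print(href)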

Also, you have the URLs mixed up: first you need to open "grplist.asp", and only those pages contain the links that start with "BillStatus.asp". To visit only the links in the House Bills section, you need to select the div that follows the a element with name h_bills, rather than its parent. I also changed your code so that bill_url is no longer built from the full page URL, which includes "/default.asp".

import requests
from bs4 import BeautifulSoup

url = 'https://www.ilga.gov/legislation/default.asp'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Find the House Bills section (next div after a with name "h_bills")
house_bills = soup.find('a', {"name": "h_bills"}).find_next_sibling("div")

# Iterate through all links in the House Bills section
for link in house_bills.find_all('a'):
    href = link.get('href')
    if not href:
        continue  # Ignore links without href

    if href.startswith('grplist.asp?'):
        bill_url = "https://www.ilga.gov/legislation/" + href

        bill_response = requests.get(bill_url)
        if bill_response.status_code != 200:  # Prevent crash when response is not valid
            continue

        bill_soup = BeautifulSoup(bill_response.content, 'html.parser')

        # Find the table cell with width="100%"
        td = bill_soup.find('td', {'width': '100%'})
        
        # Iterate through all the <li> elements in table
        for li in td.find_all('li'):
            print(li.text)
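For building bill_url, urllib.parse.urljoin from the standard library is a more robust alternative to string concatenation, because it resolves both relative references like "grplist.asp?..." and absolute paths like "/legislation/BillStatus.asp?..." against the page URL. A minimal sketch (the query strings below are placeholders, not real parameters from the site):

from urllib.parse import urljoin

page_url = 'https://www.ilga.gov/legislation/default.asp'

# Relative reference: replaces the last path segment ('default.asp')
print(urljoin(page_url, 'grplist.asp?x=1'))
# -> https://www.ilga.gov/legislation/grplist.asp?x=1

# Absolute path reference: keeps only the scheme and host from page_url
print(urljoin(page_url, '/legislation/BillStatus.asp?x=1'))
# -> https://www.ilga.gov/legislation/BillStatus.asp?x=1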
