python-3.x How to fix TypeError: argument of type 'NoneType' is not iterable in this circumstance?

bprjcwpo posted on 2023-03-24 in Python


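I'm writing a script that iterates over a list of root URLs and finds email addresses. Sometimes it returns no results. I've tried to account for this in my code and followed the instructions in the answers to this question on SO, but I can't seem to figure it out.
First, I import a list of URLs: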

url_list_updated = [
    'http://www.gfcadvice.com/',
    'https://trillionfinancial.com.sg/about-us/',
    'https://www.gen.com.sg/',
    'https://www.aam-advisory.com/',
    'https://www.proinvest.com.sg/',
    'http://www.gilbertkoh.com/',
    'https://dollarbureau.com/',
    'http://www.greenfieldadvisory.com/',
    'https://enpointefinancial.com/',
    'https://www.ippfa.com/']

Then I use BeautifulSoup to look for 'mailto:' and return a list of those results:

for url in url_list_updated:
    response = requests.get(url)
    html_content = response.text
    
    soup = BeautifulSoup(html_content, 'html.parser')
    
    email_addresses = []
    for link in soup.find_all('a'):
#         if 'mailto:' != None and 'mailto:' in link.get('href'):
#         if 'mailto:' != '' and 'mailto:' in link.get('href'):
#         if 'mailto:' in link.get('href') != None:
        if 'mailto:' in link.get('href') != '':
            email_addresses.append(link.get('href').replace('mailto:', ''))
            print(email_addresses)
        else:
            pass

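I know some results will be empty, because not every website has visible 'mailto:' information, so I followed several of the NoneType solutions on SO (I've commented them out for reference).
The traceback always gives me the same result, even though I account for the missing results.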

      7     email_addresses = []
      8     for link in soup.find_all('a'):
      9 #         if 'mailto:' != None and 'mailto:' in link.get('href'):
     10 #         if 'mailto:' != '' and 'mailto:' in link.get('href'):
     11 #         if 'mailto:' in link.get('href') != None:
---> 12         if 'mailto:' in link.get('href') != '':
     13             email_addresses.append(link.get('href').replace('mailto:', ''))
     14             print(email_addresses)

TypeError: argument of type 'NoneType' is not iterable

What should I do?
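A minimal reproduction of the error, assuming a page that contains an `<a>` tag without an `href` attribute (the sample HTML below is made up for illustration). It shows that `link.get('href')` returns `None` for such a tag and that a `'mailto:' in None` test raises the same TypeError as in the traceback. The exact wording of the message can differ between Python versions, so it is printed rather than hard-coded.

```python
from bs4 import BeautifulSoup

# Hypothetical sample HTML: the second <a> is a named anchor with no href
html = '<a href="mailto:info@example.com">Mail</a><a name="top">Top</a>'
soup = BeautifulSoup(html, 'html.parser')

# Tag.get() returns None when the attribute is missing
hrefs = [link.get('href') for link in soup.find_all('a')]
print(hrefs)  # ['mailto:info@example.com', None]

# This membership test is the operation that fails in the traceback
error_message = ''
try:
    'mailto:' in hrefs[1]
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```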


Source: https://stackoverflow.com/questions/75830983/how-to-fix-typeerror-argument-of-type-nonetype-is-not-iterable-in-this-circum

4 answers


0pizxfdo #1

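The problem is the way you check it. Your condition tests whether a string is in something and whether that something differs from '', but nothing checks for None first. For an <a> tag without an href, link.get('href') returns None, so the membership test raises the error before any emails can be collected. Get the href once and check it before the membership test: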

href = link.get('href')
if href is not None and 'mailto:' in href:
    email_addresses.append(href.replace('mailto:', ''))
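To try the check above without fetching any of the sites, here is a small self-contained sketch. `extract_emails` is a hypothetical helper name and the sample HTML is invented; it simply wraps the same `href is not None and 'mailto:' in href` test in a function that takes an HTML string.

```python
from bs4 import BeautifulSoup

def extract_emails(html_content):
    """Hypothetical helper: return the mailto: addresses found in <a> tags."""
    soup = BeautifulSoup(html_content, 'html.parser')
    email_addresses = []
    for link in soup.find_all('a'):
        href = link.get('href')
        # `and` short-circuits, so the `in` test only runs when href is a string
        if href is not None and 'mailto:' in href:
            email_addresses.append(href.replace('mailto:', ''))
    return email_addresses

# Invented sample: one mailto link, one tag without href, one ordinary link
sample = '''
<a href="mailto:hello@example.com">Email us</a>
<a name="anchor-without-href">No href</a>
<a href="/contact">Contact page</a>
'''
print(extract_emails(sample))  # ['hello@example.com']
```

In the question's loop, the inner `for link in ...` block could then be replaced by a call such as `extract_emails(response.text)` for each URL.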

2023-03-24

35g0bw71 #2

You could also select only the <a> elements whose href contains mailto:, more specifically with a CSS selector:

soup.select('a[href*="mailto:"]')

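If the ResultSet contains no elements, there is nothing to iterate over. Tags without an href are never selected, so .get('href') cannot return None inside the loop.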

Example

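from bs4 import BeautifulSoup

html = '''
<a href="mailto:someone@example.com">Send email</a>
'''
# Name the parser explicitly to avoid bs4's "no parser was explicitly specified" warning
soup = BeautifulSoup(html, 'html.parser')

emails = [
    a.get('href').split(':')[-1]
    for a in soup.select('a[href*="mailto:"]')
]
print(emails)  # ['someone@example.com']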
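A slightly larger sketch with invented sample HTML, showing the two properties this answer relies on: `<a>` tags without an `href` (or with a non-mailto `href`) are never selected, and a page with no matches gives an empty result, so the loop body simply does not run.

```python
from bs4 import BeautifulSoup

# Invented mixed page: only the first <a> carries a mailto: link
html = '''
<a href="mailto:someone@example.com">Send email</a>
<a name="no-href">Anchor without href</a>
<a href="https://example.com/">Website</a>
'''
soup = BeautifulSoup(html, 'html.parser')

# The attribute selector only matches tags whose href contains "mailto:",
# so tags without an href never reach the comprehension
matches = soup.select('a[href*="mailto:"]')
emails = [a.get('href').split(':')[-1] for a in matches]
print(emails)  # ['someone@example.com']

# With no matching tags, select() returns an empty result and nothing is iterated
empty = BeautifulSoup('<p>No links here</p>', 'html.parser').select('a[href*="mailto:"]')
print(len(empty))  # 0
```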

2023-03-24

axr492tv #3

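Python reads your condition as a chained comparison: 'mailto:' in link.get('href') != '' means ('mailto:' in link.get('href')) and (link.get('href') != ''), so the membership test runs first. When link.get('href') returns None, that test raises the TypeError, and putting explicit parentheses around the check would not help. Check for None explicitly before testing membership: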

if link.get('href') is not None and 'mailto:' in link.get('href'):
    email_addresses.append(link.get('href').replace('mailto:', ''))
    print(email_addresses)
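A quick standard-library check of how Python groups the original condition. `in` and `!=` are both comparison operators, so Python chains them: `a in b != c` is evaluated as `(a in b) and (b != c)`. The href values below are illustrative only.

```python
href = 'mailto:someone@example.com'

# Chained form and its explicit `and` expansion give the same result
chained = 'mailto:' in href != ''
explicit_and = ('mailto:' in href) and (href != '')
print(chained, explicit_and)  # True True

# The grouping the condition is sometimes mistaken for compares a bool with a
# string; a bool never equals '', so this is True even without a mailto: link
mistaken = ('mailto:' in '/contact') != ''
print(mistaken)  # True

# Either way the `in` test needs a string, so None must be ruled out first;
# `and` short-circuits and the membership test is never evaluated here
href = None
safe = href is not None and 'mailto:' in href
print(safe)  # False
```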

2023-03-24

epggiuax #4

The link object does not always have an href. You can handle the resulting exception like this:

from bs4 import BeautifulSoup
import requests

def main():
    url_list_updated = [
        'http://www.gfcadvice.com/',
        'https://trillionfinancial.com.sg/about-us/',
        'https://www.gen.com.sg/',
        'https://www.aam-advisory.com/',
        'https://www.proinvest.com.sg/',
        'http://www.gilbertkoh.com/',
        'https://dollarbureau.com/',
        'http://www.greenfieldadvisory.com/',
        'https://enpointefinancial.com/',
        'https://www.ippfa.com/',
    ]
    for url in url_list_updated:
        response = requests.get(url)
        html_content = response.text
        
        soup = BeautifulSoup(html_content, 'html.parser')
        
        email_addresses = []
        for link in soup.find_all('a'):
            try:
                # link.get('href') is None for <a> tags without an href,
                # and 'mailto:' in None raises TypeError
                if 'mailto:' in link.get('href'):
                    email_addresses.append(link.get('href').replace('mailto:', ''))
                    print(email_addresses)
            except TypeError:
                # Raised once per <a> tag without an href, not once per page
                print("Skipping a link without an href")

if __name__ == '__main__':
    main()
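A variant of the same loop, sketched under the assumption that the HTML has already been downloaded: the `pages` dictionary stands in for `requests.get(url).text`, and `emails_on_page` is a hypothetical name. Instead of catching TypeError, it uses BeautifulSoup's `find_all('a', href=True)` filter, which only returns `<a>` tags that actually have an `href`, and it reports "No email addresses" once per page rather than once per link.

```python
from bs4 import BeautifulSoup

def emails_on_page(html_content):
    """Hypothetical helper: collect mailto: addresses from one page's HTML."""
    soup = BeautifulSoup(html_content, 'html.parser')
    email_addresses = []
    # href=True skips <a> tags with no href attribute, so link['href'] is safe
    for link in soup.find_all('a', href=True):
        href = link['href']
        if 'mailto:' in href:
            email_addresses.append(href.replace('mailto:', ''))
    return email_addresses

# Invented page contents standing in for requests.get(url).text
pages = {
    'https://example.com/a': '<a href="mailto:a@example.com">Mail</a><a name="x">x</a>',
    'https://example.com/b': '<a href="/about">About</a><a name="y">y</a>',
}

for url, html_content in pages.items():
    found = emails_on_page(html_content)
    # One message per page, not one per link without an href
    print(url, found if found else 'No email addresses')
```

Wrapping the whole condition in try/except TypeError, as in the answer, also works, but it can hide unrelated TypeErrors raised inside the try block; filtering the tags up front avoids that.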

2023-03-24
