python-3.x: How do I fix "TypeError: argument of type 'NoneType' is not iterable" in this case?

Posted by bprjcwpo on 2023-03-24 in Python

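I am writing a script that loops through a list of root URLs and finds email addresses. Sometimes a page returns no results. I have tried to account for this in my code, following the answers to this question on SO, but I can't seem to figure it out.
First, I import a list of URLs: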

url_list_updated = [
    'http://www.gfcadvice.com/',
    'https://trillionfinancial.com.sg/about-us/',
    'https://www.gen.com.sg/',
    'https://www.aam-advisory.com/',
    'https://www.proinvest.com.sg/',
    'http://www.gilbertkoh.com/',
    'https://dollarbureau.com/',
    'http://www.greenfieldadvisory.com/',
    'https://enpointefinancial.com/',
    'https://www.ippfa.com/',
]

Then I use BeautifulSoup to look for 'mailto:' and return a list of those results:

for url in url_list_updated:
    response = requests.get(url)
    html_content = response.text
    
    soup = BeautifulSoup(html_content, 'html.parser')
    
    email_addresses = []
    for link in soup.find_all('a'):
#         if 'mailto:' != None and 'mailto:' in link.get('href'):
#         if 'mailto:' != '' and 'mailto:' in link.get('href'):
#         if 'mailto:' in link.get('href') != None:
        if 'mailto:' in link.get('href') != '':
            email_addresses.append(link.get('href').replace('mailto:', ''))
            print(email_addresses)
        else:
            pass

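I know some results will be empty, because not every website has 'mailto:' information visible, so I followed several NoneType solutions from SO (I have left them commented out for reference).
The traceback always gives me the same result, even though I have tried to account for the missing results.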

      7     email_addresses = []
      8     for link in soup.find_all('a'):
      9 #         if 'mailto:' != None and 'mailto:' in link.get('href'):
     10 #         if 'mailto:' != '' and 'mailto:' in link.get('href'):
     11 #         if 'mailto:' in link.get('href') != None:
---> 12         if 'mailto:' in link.get('href') != '':
     13             email_addresses.append(link.get('href').replace('mailto:', ''))
     14             print(email_addresses)

TypeError: argument of type 'NoneType' is not iterable

What should I do?

Answer #1 (0pizxfdo)

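The problem is the way the check is written. The condition tests whether a string is contained in something and, in the same expression, compares against ''. The in test either yields a bool or, when link.get('href') returns None, raises the TypeError, so the comparison with '' never protects against a missing href and no emails are collected. Fetch the href once and test it for None first: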

href = link.get('href')
if href is not None and 'mailto:' in href:
    email_addresses.append(href.replace('mailto:', ''))
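
To see the check in isolation, here is a minimal sketch that runs it over a plain list of values standing in for what link.get('href') might return. The function name collect_emails and the sample values are made up for illustration.

```python
from typing import Iterable, List, Optional


def collect_emails(hrefs: Iterable[Optional[str]]) -> List[str]:
    """Return the addresses from every href that contains 'mailto:'."""
    emails = []
    for href in hrefs:
        # Test for None first: `'mailto:' in None` raises TypeError.
        if href is not None and 'mailto:' in href:
            emails.append(href.replace('mailto:', ''))
    return emails


# Hypothetical values, as link.get('href') might return them.
sample_hrefs = ['/about', None, 'mailto:info@example.com', '']
print(collect_emails(sample_hrefs))  # ['info@example.com']
```

Because `and` short-circuits, the in test is only reached when href is a string.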

Answer #2 (35g0bw71)

You could also select the <a> elements whose href contains mailto: directly, more specifically with a CSS selector:

soup.select('a[href*="mailto:"]')

If the ResultSet contains no elements, the loop body simply never runs.

Example
from bs4 import BeautifulSoup

html = '''
<a href="mailto:someone@example.com">Send email</a>
'''
soup = BeautifulSoup(html, 'html.parser')

[
    a.get('href').split(':')[-1]
    for a in soup.select('a[href*="mailto:"]')
]
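
One caveat with split(':')[-1] is that it keeps anything that follows the address, such as a ?subject=... query. The sketch below is one possible way to clean a mailto: href using only the standard library; the helper name addresses_from_mailto is hypothetical and not part of BeautifulSoup.

```python
from urllib.parse import unquote, urlsplit


def addresses_from_mailto(href: str) -> list:
    """Split a mailto: href into its recipient addresses."""
    parts = urlsplit(href)
    if parts.scheme != 'mailto':
        return []
    # parts.path holds the recipients; parts.query (subject, body, ...) is dropped.
    return [unquote(addr).strip() for addr in parts.path.split(',') if addr.strip()]


print(addresses_from_mailto('mailto:someone@example.com?subject=Hello'))
# ['someone@example.com']
```

It could be applied to each a.get('href') returned by the selector in place of the split call.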

Answer #3 (axr492tv)

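Python chains comparison operators, so the if statement is evaluated as 'mailto:' in href and href != '', where href is link.get('href'). The in test runs first, so when href is None it raises the TypeError, and putting explicit parentheses around the check does not help. Test for None before the in check: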

if link.get('href') is not None and 'mailto:' in link.get('href'):
    email_addresses.append(link.get('href').replace('mailto:', ''))
    print(email_addresses)
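
A quick way to see how Python reads the condition is to compare the chained form with a parenthesised one. This is only an illustrative sketch; the function names chained and parenthesised and the sample values are made up.

```python
def chained(href):
    # Python reads this as: ('mailto:' in href) and (href != '')
    return 'mailto:' in href != ''


def parenthesised(href):
    # Compares a bool with '', which is always True.
    return ('mailto:' in href) != ''


for value in ['mailto:info@example.com', '/about']:
    print(value, chained(value), parenthesised(value))

# Either form raises once href is None, because the `in` test runs first.
for check in (chained, parenthesised):
    try:
        check(None)
    except TypeError:
        print(check.__name__, 'raised TypeError')
```

Neither spelling guards against None, which is why the explicit is not None test is needed.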

Answer #4 (epggiuax)

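A link object does not always have an href, in which case link.get('href') returns None. You can handle the resulting exception like this: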

from bs4 import BeautifulSoup
import requests

def main():
    url_list_updated = ['http://www.gfcadvice.com/',
     'https://trillionfinancial.com.sg/about-us/',
     'https://www.gen.com.sg/',
     'https://www.aam-advisory.com/',
     'https://www.proinvest.com.sg/',
     'http://www.gilbertkoh.com/',
     'https://dollarbureau.com/',
     'http://www.greenfieldadvisory.com/',
     'https://enpointefinancial.com/',
     'https://www.ippfa.com/']
    for url in url_list_updated:
        response = requests.get(url)
        html_content = response.text
        
        soup = BeautifulSoup(html_content, 'html.parser')
        
        email_addresses = []
        for link in soup.find_all('a'):
            try:
                # link.get('href') is None for <a> tags without an href,
                # and `'mailto:' in None` raises TypeError.
                if 'mailto:' in link.get('href'):
                    email_addresses.append(link.get('href').replace('mailto:', ''))
                    print(email_addresses)
            except TypeError:
                # Printed once for every <a> tag that has no href.
                print("No email addresses")

if __name__ == '__main__':
    main()
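
As an alternative sketch to catching the exception, a missing value can be replaced with an empty string before the in test. Plain dicts stand in for the tag attributes here (hypothetical data), and the helper name mailto_address is made up; the same `or ''` idea should carry over to link.get('href').

```python
def mailto_address(attrs: dict) -> str:
    """Return the address from a mailto: href, or '' if there is none."""
    # `or ''` covers both a missing key and an explicit None value.
    href = attrs.get('href') or ''
    if 'mailto:' in href:
        return href.replace('mailto:', '')
    return ''


# Hypothetical attribute dicts for three <a> tags.
links = [{'href': 'mailto:info@example.com'}, {'name': 'top'}, {'href': None}]
print([mailto_address(a) for a in links])  # ['info@example.com', '', '']
```

This avoids the try/except at the cost of treating a missing href and an empty href the same way.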
