How do I pass a URLError / HTTPError to a CSV file using Requests?

cu6pst1q  posted on 2023-06-19  in Other

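Newbie here.
In my code below, I'm scraping a list of URLs for certain HTML data and then writing it to a CSV file, with one row of data per URL. The code also checks whether a URL raises a URLError or an HTTPError, in which case it prints a message to the console. However, instead of printing "URL Error" or "HTML Error" to the console, I'd like to pass it to the CSV file, in the second column labeled "Error Report" (where "ErrorReport" is in the "row =" line). That way, I can look through the CSV file and see which URLs caused errors.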

import requests
import bs4
import lxml
import pandas as pd
from urllib.request import urlopen
from urllib.error import HTTPError
from urllib.error import URLError
import csv

def getTitle(soup):
    return soup.find('title').text.strip()

urlList = ["https://stackoverflow.com"]

with open('output.csv', 'w', newline='')  as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(['Title', 'Error Report'])
    
    for url in urlList:
        try:
            html = urlopen(url)
        except HTTPError:
            print("HTML Error")
        except URLError:
            print("URL Error")
        else:
            soup = bs4.BeautifulSoup(html.read(), 'html.parser')
            # ErrorReport is an undefined placeholder: the error text should end up here
            row = [ErrorReport, getTitle(soup)]
            print(row)

            csv_output.writerow(row)

h6my8fg2  #1

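I would tweak your script a little: first collect all the data into a structure such as a list, then create a DataFrame from that list, and finally save the DataFrame as a CSV file: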

import pandas as pd
from bs4 import BeautifulSoup
from urllib.request import urlopen
from urllib.error import HTTPError
from urllib.error import URLError

def getTitle(soup):
    return soup.find('title').text.strip()

urlList = ["https://stackoverflow.com", "http://nonexistentwebpage.example"]

all_data = []
for url in urlList:
    try:
        r = urlopen(url)
        soup = BeautifulSoup(r.read(), 'html.parser')
        all_data.append([url, getTitle(soup), None])
    except HTTPError:
        all_data.append([url, None, 'HTTP Error'])
    except URLError:
        all_data.append([url, None, 'URL Error'])

df = pd.DataFrame(all_data, columns=['URL', 'Title', 'Error Report'])
print(df)
df.to_csv('data.csv', index=False)

Prints:

                                 URL                                                            Title Error Report
0          https://stackoverflow.com  Stack Overflow - Where Developers Learn, Share, & Build Careers         None
1  http://nonexistentwebpage.example                                                             None    URL Error

and saves data.csv:

URL,Title,Error Report
https://stackoverflow.com,"Stack Overflow - Where Developers Learn, Share, & Build Careers",
http://nonexistentwebpage.example,,URL Error
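
If pandas is not needed, the same idea also works with the `csv.writer` loop from the question: build one row per URL and fill the error column inside the `except` branches. The following is only a minimal sketch under some assumptions. The names `TitleParser`, `fetch_title` and `write_report` are hypothetical, the standard library's `html.parser` is swapped in for BeautifulSoup to keep the example dependency-free, and the 10-second timeout is an arbitrary choice. It also writes `e.code` and `e.reason` into the report so the CSV shows why a URL failed.

```python
import csv
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def fetch_title(url):
    """Download a page and return its <title> text; the timeout is an assumption."""
    with urlopen(url, timeout=10) as response:
        html = response.read().decode("utf-8", errors="replace")
    parser = TitleParser()
    parser.feed(html)
    return "".join(parser.parts).strip()


def write_report(urls, f_output, fetch=fetch_title):
    """Write one CSV row per URL: the URL, its title, and an error report."""
    csv_output = csv.writer(f_output)
    csv_output.writerow(["URL", "Title", "Error Report"])
    for url in urls:
        try:
            row = [url, fetch(url), ""]
        # HTTPError is a subclass of URLError, so it has to be caught first.
        except HTTPError as e:
            row = [url, "", f"HTTP Error {e.code}: {e.reason}"]
        except URLError as e:
            row = [url, "", f"URL Error: {e.reason}"]
        csv_output.writerow(row)
```

To use it, open the file as in the question with `open('output.csv', 'w', newline='')` and call `write_report(urlList, f_output)` inside the `with` block. Passing the fetch function as a parameter is a design choice that makes the error handling easy to try out without network access.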
