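I have a list of URLs whose tables I want to convert and save as CSV files on my local drive. I also want to use a substring of each URL as the file name. This is the code I have so far, but it only writes the first URL's data to 2 separate files.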
    import csv
    import requests
    from bs4 import BeautifulSoup

    link = ['https://www.health.ny.gov/statistics/sparcs/reports/audit/Emergency_Department_19.html',
            'https://www.health.ny.gov/statistics/sparcs/reports/audit/Emergency_Department_20.html']

    def get_data(link):
        for url in link:
            res = requests.get(url)
            soup = BeautifulSoup(res.text, "lxml")
            for items in soup.select("table.table tr"):
                td = [item.get_text(strip=True) for item in items.select("th,td")]
                writer.writerow(td)

    if __name__ == '__main__':
        for f in link:
            f2 = f.split('audit/')[-1].split('.html')[0]
            with open(f2 + '.csv', "w", newline="") as infile:
                writer = csv.writer(infile)
                get_data(link)
1 Answer
You don't need to loop over link again inside get_data(); you only need to pass each url to get_data from the loop in main:
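A minimal sketch of that change, reusing the selectors and file-name logic from the question. The helper names csv_name and save_all are hypothetical, and passing writer as an argument (instead of relying on a global) is an assumption made here for clarity, not something the original code did. It also assumes each page is reachable and contains a table matching "table.table".

```python
import csv

link = [
    'https://www.health.ny.gov/statistics/sparcs/reports/audit/Emergency_Department_19.html',
    'https://www.health.ny.gov/statistics/sparcs/reports/audit/Emergency_Department_20.html',
]


def csv_name(url):
    # Hypothetical helper: same substring logic as the question, plus '.csv'.
    return url.split('audit/')[-1].split('.html')[0] + '.csv'


def get_data(url, writer):
    # Handles exactly one URL; the loop over all URLs lives in save_all().
    # Third-party imports are kept local so csv_name() works without them.
    import requests
    from bs4 import BeautifulSoup

    res = requests.get(url)
    soup = BeautifulSoup(res.text, "lxml")
    for items in soup.select("table.table tr"):
        td = [item.get_text(strip=True) for item in items.select("th,td")]
        writer.writerow(td)


def save_all(links):
    # One output file per URL; each file receives only that URL's rows.
    for url in links:
        with open(csv_name(url), "w", newline="") as outfile:
            get_data(url, csv.writer(outfile))
```

Calling save_all(link), for example under if __name__ == '__main__':, should then write one CSV per URL, named after the part of the URL between 'audit/' and '.html'.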