python-3.x: csv.writer not writing the full output to the CSV file

ejk8hzay  posted 2022-12-05  in Python
Follow (0) | Answers (2) | Views (193)

I'm trying to scrape artists' Spotify streaming rankings from kworb.net into a CSV file, and I've almost succeeded, except that I've run into a strange problem.
The code below successfully scrapes all 10,000 listed artists and prints them to the console:

import requests
from bs4 import BeautifulSoup
import csv

URL = "https://kworb.net/spotify/artists.html"
result = requests.get(URL)
src = result.content
soup = BeautifulSoup(src, 'html.parser')

table = soup.find('table', id="spotifyartistindex")

header_tags = table.find_all('th')
headers = [header.text.strip() for header in header_tags]

rows = []
data_rows = table.find_all('tr')

for row in data_rows:
    value = row.find_all('td')
    beautified_value = [dp.text.strip() for dp in value]
    print(beautified_value)

    if len(beautified_value) == 0:
        continue

    rows.append(beautified_value)

The problem appears when I save the output to a CSV file with the following code:

with open('artist_rankings.csv', 'w', newline="") as output:
    writer = csv.writer(output)
    writer.writerow(headers)
    writer.writerows(rows)

For some reason, only 738 artists are saved to the file. Does anyone know what might be causing this?
Any help is greatly appreciated!


qjp7pelc1#

As an alternative approach, you might want to use pandas next time to make your life easier.
Here's how:

import requests
import pandas as pd

source = requests.get("https://kworb.net/spotify/artists.html")
df = pd.concat(pd.read_html(source.text, flavor="bs4"))
df.to_csv("artists.csv", index=False)

This outputs a .csv file containing all 10,000 artists.
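One refinement worth knowing: `pd.read_html` accepts an `attrs` argument that restricts parsing to tables matching given HTML attributes, so you can target the `spotifyartistindex` table directly instead of concatenating every table on the page. A minimal sketch, using an inline HTML snippet as a stand-in for the kworb.net page (the snippet itself is hypothetical):

```python
from io import StringIO

import pandas as pd

# Hypothetical minimal HTML standing in for the kworb.net page,
# which contains a table with id="spotifyartistindex" plus other tables.
html = """
<table id="spotifyartistindex">
  <tr><th>Artist</th><th>Streams</th></tr>
  <tr><td>Artist A</td><td>100</td></tr>
  <tr><td>Artist B</td><td>50</td></tr>
</table>
<table id="other"><tr><th>X</th></tr><tr><td>1</td></tr></table>
"""

# attrs narrows read_html to the one table we care about, so
# unrelated tables on the page are not concatenated into the output.
tables = pd.read_html(StringIO(html), attrs={"id": "spotifyartistindex"})
df = tables[0]
print(len(df))            # number of data rows in the target table
print(list(df.columns))   # header row inferred from the <th> cells
```

With the real page, `pd.read_html(source.text, attrs={"id": "spotifyartistindex"})` should return just the artist table, removing the need for `pd.concat`.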


8tntrjer2#

The problem with your code is that you use a print statement to display the data in the console, but that does not add it to the list of rows to be written to the CSV file. Instead, you need to append the data to the rows list before writing it to the CSV file.
Here's how to modify the code to fix this:

import requests
from bs4 import BeautifulSoup
import csv

URL = "https://kworb.net/spotify/artists.html"
result = requests.get(URL)
src = result.content
soup = BeautifulSoup(src, 'html.parser')

table = soup.find('table', id="spotifyartistindex")

header_tags = table.find_all('th')
headers = [header.text.strip() for header in header_tags]

rows = []
data_rows = table.find_all('tr')

for row in data_rows:
    value = row.find_all('td')
    beautified_value = [dp.text.strip() for dp in value]
    # Append the data to the rows list
    rows.append(beautified_value)

# Write the data to the CSV file
with open('artist_rankings.csv', 'w', newline="") as output:
    writer = csv.writer(output)
    writer.writerow(headers)
    writer.writerows(rows)

In this modified code, the data is first appended to the rows list and then written to the CSV file. This ensures that all of the data is saved to the file, not just the first 738 rows.
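A quick way to confirm that nothing is being truncated on disk is to re-read the file after writing and compare row counts. A self-contained sketch of that sanity check, using generated placeholder data in place of the scraped rows (the file path and row contents here are illustrative, not from the original question):

```python
import csv
import os
import tempfile

# Placeholder data standing in for the scraped headers and rows.
headers = ["Artist", "Streams"]
rows = [[f"Artist {i}", str(i * 10)] for i in range(10000)]

# Write to a temp file exactly as in the answer above:
# open with newline="" and let the with-block close (and flush) the file.
path = os.path.join(tempfile.gettempdir(), "artist_rankings_check.csv")
with open(path, "w", newline="") as output:
    writer = csv.writer(output)
    writer.writerow(headers)
    writer.writerows(rows)

# Re-read and count: one header row plus one row per artist.
with open(path, newline="") as f:
    read_back = list(csv.reader(f))

print(len(read_back) - 1)  # data rows written to disk
```

If the count read back is lower than the count scraped, the truncation happened at write time (e.g. the file was never closed or flushed); if the counts match, the loss happened earlier, during scraping.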
Note that you may also want to add some error handling, in case the request to the URL fails or the page's HTML is not in the expected format. This helps keep the code from crashing when it encounters unexpected data. You can do this by adding a try-except block, like so:

import requests
from bs4 import BeautifulSoup
import csv

URL = "https://kworb.net/spotify/artists.html"

try:
    result = requests.get(URL)
    src = result.content
    soup = BeautifulSoup(src, 'html.parser')

    table = soup.find('table', id="spotifyartistindex")

    if table is None:
        raise Exception("Could not find table with id 'spotifyartistindex'")

    header_tags = table.find_all('th')
    headers = [header.text.strip() for header in header_tags]

    rows = []
    data_rows = table.find_all('tr')

    for row in data_rows:
        value = row.find_all('td')
        beautified_value = [dp.text.strip() for dp in value]
        # Append the data to the rows list
        rows.append(beautified_value)

    # Write the data to the CSV file
    with open('artist_rankings.csv', 'w', newline="") as output:
        writer = csv.writer(output)
        writer.writerow(headers)
        writer.writerows(rows)
except Exception as e:
    print(f"An error occurred: {e}")
