python-3.x: csv.writer not writing the full output to the CSV file

ejk8hzay  posted 2022-12-05  in Python
Follow (0) | Answers (2) | Views (193)

I'm trying to scrape artists' Spotify streaming rankings from kworb.net into a CSV file, and I've almost succeeded, except that I've run into a strange problem.
The code below successfully scrapes all 10,000 listed artists and prints them to the console:

import requests
from bs4 import BeautifulSoup
import csv

URL = "https://kworb.net/spotify/artists.html"
result = requests.get(URL)
src = result.content
soup = BeautifulSoup(src, 'html.parser')

table = soup.find('table', id="spotifyartistindex")

header_tags = table.find_all('th')
headers = [header.text.strip() for header in header_tags]

rows = []
data_rows = table.find_all('tr')

for row in data_rows:
    value = row.find_all('td')
    beautified_value = [dp.text.strip() for dp in value]
    print(beautified_value)

    if len(beautified_value) == 0:
        continue

    rows.append(beautified_value)

The problem appears when I save the output to a CSV file with the following code:

with open('artist_rankings.csv', 'w', newline="") as output:
    writer = csv.writer(output)
    writer.writerow(headers)
    writer.writerows(rows)

For some reason, only 738 artists are saved to the file. Does anyone know what might be causing this?
Any help is greatly appreciated!


qjp7pelc1#

As an alternative approach, you might want to use pandas next time to make your life easier.
Here's how:

import requests
import pandas as pd

source = requests.get("https://kworb.net/spotify/artists.html")
df = pd.concat(pd.read_html(source.text, flavor="bs4"))
df.to_csv("artists.csv", index=False)

This outputs a .csv file containing all 10,000 artists.
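One refinement worth knowing: `pd.read_html` accepts an `attrs` argument that restricts parsing to tables matching given HTML attributes, so you can target the `spotifyartistindex` table directly instead of concatenating every table on the page. A minimal sketch, using an inline HTML snippet as a stand-in for the kworb.net page (the snippet itself is hypothetical):

```python
from io import StringIO

import pandas as pd

# Hypothetical minimal HTML standing in for the kworb.net page,
# which contains a table with id="spotifyartistindex" plus other tables.
html = """
<table id="spotifyartistindex">
  <tr><th>Artist</th><th>Streams</th></tr>
  <tr><td>Artist A</td><td>100</td></tr>
  <tr><td>Artist B</td><td>50</td></tr>
</table>
<table id="other"><tr><th>X</th></tr><tr><td>1</td></tr></table>
"""

# attrs narrows read_html to the one table we care about, so
# unrelated tables on the page are not concatenated into the output.
tables = pd.read_html(StringIO(html), attrs={"id": "spotifyartistindex"})
df = tables[0]
print(len(df))            # number of data rows in the target table
print(list(df.columns))   # header row inferred from the <th> cells
```

With the real page, `pd.read_html(source.text, attrs={"id": "spotifyartistindex"})` should return just the artist table, removing the need for `pd.concat`.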


8tntrjer2#

The problem with your code is that you use a print statement to display the data in the console, but that does not add it to the list of rows to be written to the CSV file. Instead, you need to append the data to the rows list before writing it to the CSV file.
Here's how to modify the code to fix this:

import requests
from bs4 import BeautifulSoup
import csv

URL = "https://kworb.net/spotify/artists.html"
result = requests.get(URL)
src = result.content
soup = BeautifulSoup(src, 'html.parser')

table = soup.find('table', id="spotifyartistindex")

header_tags = table.find_all('th')
headers = [header.text.strip() for header in header_tags]

rows = []
data_rows = table.find_all('tr')

for row in data_rows:
    value = row.find_all('td')
    beautified_value = [dp.text.strip() for dp in value]
    # Append the data to the rows list
    rows.append(beautified_value)

# Write the data to the CSV file
with open('artist_rankings.csv', 'w', newline="") as output:
    writer = csv.writer(output)
    writer.writerow(headers)
    writer.writerows(rows)

In this modified code, the data is first appended to the rows list and then written to the CSV file. This ensures that all of the data is saved to the file, not just the first 738 rows.
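A quick way to confirm that nothing is being truncated on disk is to re-read the file after writing and compare row counts. A self-contained sketch of that sanity check, using generated placeholder data in place of the scraped rows (the file path and row contents here are illustrative, not from the original question):

```python
import csv
import os
import tempfile

# Placeholder data standing in for the scraped headers and rows.
headers = ["Artist", "Streams"]
rows = [[f"Artist {i}", str(i * 10)] for i in range(10000)]

# Write to a temp file exactly as in the answer above:
# open with newline="" and let the with-block close (and flush) the file.
path = os.path.join(tempfile.gettempdir(), "artist_rankings_check.csv")
with open(path, "w", newline="") as output:
    writer = csv.writer(output)
    writer.writerow(headers)
    writer.writerows(rows)

# Re-read and count: one header row plus one row per artist.
with open(path, newline="") as f:
    read_back = list(csv.reader(f))

print(len(read_back) - 1)  # data rows written to disk
```

If the count read back is lower than the count scraped, the truncation happened at write time (e.g. the file was never closed or flushed); if the counts match, the loss happened earlier, during scraping.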
Note that you may also want to add some error handling, in case the request to the URL fails or the page's HTML is not in the expected format. This helps keep the code from crashing when it encounters unexpected data. You can do this by adding a try-except block, like so:

import requests
from bs4 import BeautifulSoup
import csv

URL = "https://kworb.net/spotify/artists.html"

try:
    result = requests.get(URL)
    src = result.content
    soup = BeautifulSoup(src, 'html.parser')

    table = soup.find('table', id="spotifyartistindex")

    if table is None:
        raise Exception("Could not find table with id 'spotifyartistindex'")

    header_tags = table.find_all('th')
    headers = [header.text.strip() for header in header_tags]

    rows = []
    data_rows = table.find_all('tr')

    for row in data_rows:
        value = row.find_all('td')
        beautified_value = [dp.text.strip() for dp in value]
        # Append the data to the rows list
        rows.append(beautified_value)

    # Write the data to the CSV file
    with open('artist_rankings.csv', 'w', newline="") as output:
        writer = csv.writer(output)
        writer.writerow(headers)
        writer.writerows(rows)
except Exception as e:
    print(f"An error occurred: {e}")
