csv: script does not return the correct output when trying to retrieve data from a newsletter

euoag5mw · posted 2022-12-15 · in: Other
Follow (0) | Answers (2) | Views (139)

I am trying to write a script that retrieves the album titles and band names from a music store's newsletter. The band names and album titles are nested inside h3 and h4 elements. When I run the script, the csv file comes out blank.

from bs4 import BeautifulSoup
import requests
import pandas as pd

# Use the requests library to fetch the HTML content of the page
url = "https://www.musicmaniarecords.be/_sys/newsl_view?n=260&sub=Tmpw6Rij5D"
response = requests.get(url)

# Use the BeautifulSoup library to parse the HTML content
soup = BeautifulSoup(response.text, 'html.parser')

# Find all 'a' elements with the class 'row'
albums = soup.find_all('a', attrs={'class': 'row'})

# Iterate over the found elements and extract the album title and band name
album_title = []
band_name = []
for album in albums:
  album_title_element = album.find('td', attrs={'td_class': 'h3 class'})
  band_name_element = album.find('td', attrs={'td_class': 'h4 class'})
  album_title.append(album_title_element.text)
  band_name.append(band_name_element.text)

# Use the pandas library to save the extracted data to a CSV file
df = pd.DataFrame({'album_title': album_title, 'band_name': band_name})
df.to_csv('music_records.csv')

I think the error is in the attrs part, but I don't know how to fix it properly. Thanks in advance!

inkz8wg9


from bs4 import BeautifulSoup
import requests
import pandas as pd

# Use the requests library to fetch the HTML content of the page
url = "https://www.musicmaniarecords.be/_sys/newsl_view?n=260&sub=Tmpw6Rij5D"
response = requests.get(url)

# Use the BeautifulSoup library to parse the HTML content
soup = BeautifulSoup(response.text, 'html.parser')

# Find all 'td' cells with the class 'block__cell'
albums = soup.find_all('td', attrs={'class': 'block__cell'})

# Iterate over the found elements and extract the album title and band name
album_title = []
band_name = []
for album in albums:
  album_title_element = album.find('h3', attrs={'class': 'header'})
  band_name_element = album.find('h4', attrs={'class': 'header'})
  album_title.append(album_title_element.text)
  band_name.append(band_name_element.text)

# Use the pandas library to save the extracted data to a CSV file
df = pd.DataFrame({'album_title': album_title, 'band_name': band_name})
df.to_csv('music_records.csv')

Thanks for the help, unsung hero!

fumotvh3


Looking at your code, I agree the error is in the attrs part. The problem you are facing is that the site you are trying to scrape contains no 'a' elements with the class 'row', so find_all returns an empty list. There are plenty of 'div' elements with the class 'row'; perhaps you meant to look for those?
Looking for 'td' elements and extracting their 'h3' and 'h4' elements is the right idea, but since albums is an empty list, there is nothing to iterate over.
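To see why the original loop silently produced nothing, note that find_all simply returns an empty list when no element matches; looping over that list does zero iterations. A minimal illustration on a made-up HTML snippet (not the real newsletter markup):

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: the 'row' class sits on a <div>, not an <a>
html = '<div class="row"><h3>Album</h3><h4>Band</h4></div>'
soup = BeautifulSoup(html, 'html.parser')

# No <a class="row"> exists, so find_all returns an empty list
print(soup.find_all('a', attrs={'class': 'row'}))         # []

# The class is on a <div>, so searching for divs does match
print(len(soup.find_all('div', attrs={'class': 'row'})))  # 1
```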
I modified the code slightly to look for the 'td' elements directly and extract their 'h3' and 'h4' elements.

from bs4 import BeautifulSoup
import requests
import pandas as pd

# Use the requests library to fetch the HTML content of the page
url = "https://www.musicmaniarecords.be/_sys/newsl_view?n=260&sub=Tmpw6Rij5D"
response = requests.get(url)

# Use the BeautifulSoup library to parse the HTML content
soup = BeautifulSoup(response.text, 'html.parser')

# Find all 'td' cells with the class 'block__cell'
albums = soup.find_all('td', attrs={'class': 'block__cell'})

# Iterate over the found elements and extract the album title and band name
album_title = []
band_name = []
for album in albums:
  album_title_element = album.find('h3')
  band_name_element = album.find('h4')
  album_title.append(album_title_element.text)
  band_name.append(band_name_element.text)

# Use the pandas library to save the extracted data to a CSV file
df = pd.DataFrame({'album_title': album_title, 'band_name': band_name})
df.to_csv('music_records.csv', index=False)

I also took the liberty of adding index=False to the last line, so that each row doesn't start with a leading comma (the row index).
Hope this helps.
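The effect of index=False can be seen without writing a file, since to_csv returns a string when no path is given (the one-row frame below is made-up data, just matching the script's columns):

```python
import pandas as pd

# Hypothetical single-row frame with the same columns as the script
df = pd.DataFrame({'album_title': ['Abbey Road'], 'band_name': ['The Beatles']})

# Default: pandas writes the row index as an unnamed first column
print(df.to_csv())             # ",album_title,band_name" then "0,Abbey Road,The Beatles"

# index=False drops it, so each row starts with the actual data
print(df.to_csv(index=False))  # "album_title,band_name" then "Abbey Road,The Beatles"
```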
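One related pitfall neither answer guards against: if a 'td' cell lacks an 'h3' or 'h4', find returns None and .text raises AttributeError. A defensive sketch on made-up HTML (not the real newsletter), skipping incomplete cells so the two lists stay aligned:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet mimicking the structure; the second cell has no h4
html = """
<table>
  <td class="block__cell"><h3>Album A</h3><h4>Band A</h4></td>
  <td class="block__cell"><h3>Album B</h3></td>
</table>
"""
soup = BeautifulSoup(html, 'html.parser')

album_title = []
band_name = []
for cell in soup.find_all('td', attrs={'class': 'block__cell'}):
    h3 = cell.find('h3')
    h4 = cell.find('h4')
    if h3 is None or h4 is None:
        continue  # skip incomplete cells instead of crashing on .text
    album_title.append(h3.text.strip())
    band_name.append(h4.text.strip())

print(album_title)  # ['Album A']
print(band_name)    # ['Band A']
```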
