Python: how to stop getting an empty CSV file when scraping

x759pob2  posted on 2022-12-28  in Python

When I run the code I get my CSV file, but it is effectively empty.
'''

import requests
from bs4 import BeautifulSoup
from csv import writer

url = 'https://www.fotocasa.es/es/alquiler/todas-las-casas/girona-provincia/todas-las-zonas/l'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
lists = soup.find_all('section', class_='re-CardPackAdvance')

with open('casas.csv', 'w', encoding='utf8', newline='') as f:
    thewriter = writer(f)
    header = ['Titulo', 'Precio', 'Metros', 'Telefono']
    thewriter.writerow(header)
for list in lists:
    titulo = list.find('a', class_='re-CardPackAdvance-info-container').text.replace('\n', '')
    precio = list.find('span', class_='re-CardPrice').text.replace('\n', '')
    metros = list.find('span', class_='re-CardFeaturesWithIcons-feature-icon--surface').text.replace('\n', '')
    telefono = list.find('a', class_='re-CardContact-phone').text.replace('\n', '')
    info = [titulo, precio, metros, telefono]
    thewriter.writerow(info)

'''
I expected all the information from this website to be scraped, but it seems I did something wrong at some point.

0pizxfdo  #1

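You are not parsing the results correctly: the soup contains no section element with the class re-CardPackAdvance. I changed the code accordingly (it now looks for all article elements whose class starts with re-CardPack). Also note that the for loop needs to be indented one level so that it runs inside the with block, while the file is still open. However, because of how the page is structured, only the first two entries are loaded directly when the page is fetched. All the other entries are fetched after the page has loaded in the browser (via JavaScript). I think you could consider using the page's API instead.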

import requests
from bs4 import BeautifulSoup
from csv import writer
import re

url = 'https://www.fotocasa.es/es/alquiler/todas-las-casas/girona-provincia/todas-las-zonas/l'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')

# Match every <article> whose class starts with "re-CardPack".
cards = soup.find_all("article", class_=re.compile("^re-CardPack"))
# Only the cards present in the initial HTML are counted here.
print(len(cards))

with open('casas.csv', 'w', encoding='utf8', newline='') as f:
    thewriter = writer(f)
    header = ['Titulo', 'Precio', 'Metros', 'Telefono']
    thewriter.writerow(header)

    # The loop stays inside the with block so the file is still open.
    # "card" replaces "list" to avoid shadowing the built-in.
    for card in cards:
        titulo = card.find('a').get('title')
        precio = card.find('span', class_='re-CardPrice').text.replace('\n', '')
        metros = card.find('span', class_='re-CardFeaturesWithIcons-feature-icon--surface').text.replace('\n', '')
        telefono = card.find('a', class_='re-CardContact-phone').text.replace('\n', '')
        info = [titulo, precio, metros, telefono]
        thewriter.writerow(info)
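
The code above calls .text on every find() result, so a card that lacks one of the fields (a phone link, for example) would raise AttributeError, because find() returns None when nothing matches. Below is a minimal offline sketch of a more defensive variant. The sample markup, the placeholder phone number, and the helper names text_or_empty and cards_to_csv are all made up for illustration, and the real fotocasa.es HTML may differ. It also shows that the section selector from the question matches nothing on markup of this shape.

```python
import csv
import io
import re

from bs4 import BeautifulSoup

# Hypothetical markup imitating the card structure described above;
# the real page may be structured differently.
SAMPLE_HTML = """
<article class="re-CardPackAdvance">
  <a title="Casa en Girona" href="#">link</a>
  <span class="re-CardPrice">1.200 €</span>
  <span class="re-CardFeaturesWithIcons-feature-icon--surface">90 m²</span>
  <a class="re-CardContact-phone">000 000 000</a>
</article>
<article class="re-CardPackMinimal">
  <a title="Piso en Figueres" href="#">link</a>
  <span class="re-CardPrice">800 €</span>
</article>
"""


def text_or_empty(card, name, class_name):
    """Return the stripped text of the first matching tag, or '' if absent."""
    tag = card.find(name, class_=class_name)
    return tag.get_text(strip=True) if tag is not None else ""


def cards_to_csv(html):
    """Parse the cards in html and return the CSV output as a string."""
    soup = BeautifulSoup(html, "html.parser")
    cards = soup.find_all("article", class_=re.compile("^re-CardPack"))

    buffer = io.StringIO()  # stands in for the open file
    out = csv.writer(buffer)
    out.writerow(["Titulo", "Precio", "Metros", "Telefono"])
    for card in cards:
        link = card.find("a")
        titulo = link.get("title", "") if link is not None else ""
        out.writerow([
            titulo,
            text_or_empty(card, "span", "re-CardPrice"),
            text_or_empty(card, "span", "re-CardFeaturesWithIcons-feature-icon--surface"),
            text_or_empty(card, "a", "re-CardContact-phone"),
        ])
    return buffer.getvalue()


# The selector from the question finds no <section> elements here.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(len(soup.find_all("section", class_="re-CardPackAdvance")))

print(cards_to_csv(SAMPLE_HTML))
```

Writing to an in-memory buffer keeps the sketch runnable without network access. For a real run, the same loop could write to the file opened in the with block, as in the answer's code.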
