Using BeautifulSoup to scrape the data from a world map and store it in a CSV file

Posted by tkqqtvp1 on 2023-04-04 in Other


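I am trying to scrape the website https://www.startupblink.com/startups in order to collect all the startups listed there. I think this is a good opportunity to do it with Python and Beautiful Soup.

Technically, we can use Python and Beautiful Soup to scrape the data from https://www.startupblink.com/startups.
What is needed, as an overview of the steps:

First, we send a GET request to the website using the requests library in Python. Then we parse the HTML content of the response with Beautiful Soup.
Next, we use Beautiful Soup's find or find_all methods to locate the HTML elements that contain the startup data we are interested in.
After that, we extract the relevant information from those elements using Beautiful Soup's .string attribute or get method. Finally, we store the data in a format of our choice, such as a CSV file or a database (note: this gets a little easier if we use pandas).
Here are some first ideas to get started: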

import requests
from bs4 import BeautifulSoup
import csv

# Send an HTTP request to the website's URL and retrieve the HTML content
url = 'https://www.startupblink.com/startups'
response = requests.get(url)

# Parse the HTML content using Beautiful Soup
soup = BeautifulSoup(response.content, 'html.parser')

# Find all the startup listings on the page
startup_listings = soup.find_all('div', {'class': 'startup-list-item'})

# Create a CSV file to store the extracted data
with open('startup_data.csv', mode='w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['Name', 'Description', 'Location', 'Website'])

    # Loop through each startup listing and extract the relevant information
    for startup in startup_listings:
        name = startup.find('a', {'class': 'startup-link'}).text.strip()
        description = startup.find('div', {'class': 'startup-description'}).text.strip()
        location = startup.find('div', {'class': 'startup-location'}).text.strip()
        website = startup.find('a', {'class': 'startup-link'})['href']

        # Write the extracted data to the CSV file
        writer.writerow([name, description, location, website])

At this point I think I have to rework the code: I only get a tiny CSV file of 35 bytes.
I will have to run more tests to make sure I am using the right approach.
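
The 35-byte size is itself a useful clue. As a quick check, assuming the file was written with the default csv.writer settings used in the script above, the header row alone accounts for exactly that size. This suggests the loop body never ran, which is what would happen if find_all returned an empty list because no matching elements exist in the HTML that requests.get received.

```python
import csv
import io

# Write only the header row, exactly as the script above does before its loop.
# newline='' keeps the writer's own '\r\n' line terminator untouched,
# mirroring open(..., newline='') in the script.
buffer = io.StringIO(newline='')
writer = csv.writer(buffer)
writer.writerow(['Name', 'Description', 'Location', 'Website'])

# Size of the header row once encoded, i.e. the size of a CSV with no data rows.
header_bytes = len(buffer.getvalue().encode('utf-8'))
print(header_bytes)  # 35
```

A sensible next test is therefore to print len(startup_listings) right after the find_all call, before looking at the CSV code at all.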


Source: https://stackoverflow.com/questions/75877916/using-beautifulsoup-to-scrape-the-data-from-a-worldmap-and-store-this-into-a-csv

1 answer


edqdpe6u1#

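Don't use BeautifulSoup for dynamically delivered content; use the API endpoint the data comes from instead. Iterate over a range of pages and concatenate the DataFrames, then convert the result to whatever format you need: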

import requests
import pandas as pd 

pd.concat(
    [
        pd.DataFrame(
            requests.get(f'https://www.startupblink.com/api/entities?entity=startups&page={page}&sortBy=rank&order=desc&leaderType=1').json()['page']
        )
        for page in range(0,5)
    ]
).to_csv('data.csv', index=False)
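
If pandas is not available, the same paging idea can be sketched with only the standard library. This is a swapped-in variant, not the method from the answer: it uses urllib instead of requests and csv.DictWriter instead of a DataFrame. The helper names fetch_page, collect_records and write_csv are hypothetical, and the code assumes that the 'page' key of the JSON response holds a list of flat dictionaries and that an empty list marks the end of the data. Neither assumption is confirmed by the source.

```python
import csv
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint and query parameters are taken from the answer above.
API_URL = 'https://www.startupblink.com/api/entities'


def fetch_page(page, timeout=10):
    """Return the records of one result page.

    Assumption: the 'page' key of the response holds a list of dicts.
    """
    query = urlencode({'entity': 'startups', 'page': page, 'sortBy': 'rank',
                       'order': 'desc', 'leaderType': 1})
    with urlopen(f'{API_URL}?{query}', timeout=timeout) as response:
        return json.load(response)['page']


def collect_records(pages, fetch=fetch_page):
    """Fetch each page in turn and stop early at the first empty page."""
    records = []
    for page in pages:
        rows = fetch(page)
        if not rows:
            break
        records.extend(rows)
    return records


def write_csv(records, path):
    """Write a list of dicts to CSV, using the union of all keys as the header."""
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(path, mode='w', newline='', encoding='utf-8') as file:
        writer = csv.DictWriter(file, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
```

To reproduce the answer's result, call write_csv(collect_records(range(0, 5)), 'data.csv'). The fetch argument exists so the paging logic can be tried with a stand-in function before sending any real requests.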

2023-04-04
