Python web scraping JSON file

Posted by vohkndzv on 2023-02-01 in Python


I am trying to extract all the data for every school on the following website:
https://schulfinder.kultus-bw.de/
My code is:

import requests

# Detail endpoint for a single school, found in the Network tab of Chrome DevTools.
# The `_` parameter appears to be a millisecond timestamp used as a cache-buster.
url = "https://schulfinder.kultus-bw.de/api/school?uuid=81af189c-7bc0-44a3-8c9f-73e6d6e50fdb&_=1675072758525"

response = requests.get(url)
response.raise_for_status()  # fail loudly on HTTP errors

print(response.text)

The output is:

{
  "outpost_number": "0",
  "name": "Gartenschule Grundschule Ebnat",
  "street": "Abt-Angehrn-Str.",
  "house_number": "5",
  "postcode": "73432",
  "city": "Aalen",
  "phone": "+49736796700",
  "fax": "+497367967016",
  "email": "poststelle@04125313.schule.bwl.de",
  "website": null,
  "tablet_tranche": null,
  "tablet_platform": null,
  "tablet_branches": null,
  "tablet_trades": null,
  "lat": 48.80094,
  "lng": 10.18761,
  "official": 0,
  "branches": [
    {
      "branch_id": 12110,
      "acronym": "GS",
      "description_long": "Grundschule"
    }
  ],
  "trades": []
}

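I found the URL in the Network tab of the Chrome DevTools and tested the request in Postman. My problem is that I only get the information for one school, and I don't know how to request all of the schools.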


Source: https://stackoverflow.com/questions/75283604/python-web-scraping-json-file

2 answers


Answer #1 by kcwpcxri

Just use the correct endpoint:

https://schulfinder.kultus-bw.de/api/schools?distance=1&outposts=1&owner=&school_kind=&term=&types=&work_schedule=&_=1675079497084

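This gives you a list of schools. The uuid of each entry can then be passed to the endpoint from your question (https://schulfinder.kultus-bw.de/api/school?...) to request more data about that school.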

[{"uuid":"50de01a4-503d-44d1-af4b-a6031a022b85","outpost_number":"0","name":"Grundschule Aach","city":"Aach","lat":47.84399,"lng":8.85067,"official":0,"marker_class":"marker green","marker_label":"G","website":null},{"uuid":"8818037f-9aed-4860-b42e-8a49b1403c02","outpost_number":"0","name":"Braunenbergschule Grundschule Wasseralfingen","city":"Aalen","lat":48.8612,"lng":10.11191,"official":0,"marker_class":"marker green","marker_label":"G","website":null},...]
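A minimal sketch of that two-step approach with `requests`, assuming both endpoints answer plain GET requests without extra headers (as the question's code suggests) and that the `_` parameter is only a cache-busting timestamp that can be regenerated on each call. The helper names (`fetch_school_list`, `fetch_school_details`, `fetch_all_details`) and the half-second delay between requests are illustrative choices, not part of the site's API.

```python
import time

import requests

BASE = "https://schulfinder.kultus-bw.de/api"

# Query string copied from the list endpoint above, with all filters left empty.
LIST_PARAMS = {
    "distance": 1,
    "outposts": 1,
    "owner": "",
    "school_kind": "",
    "term": "",
    "types": "",
    "work_schedule": "",
}


def cache_buster():
    """Millisecond timestamp; ASSUMED to be what the `_` parameter expects."""
    return int(time.time() * 1000)


def fetch_school_list(session, params=None):
    """Return the list of school summaries (each one carries a 'uuid')."""
    query = dict(LIST_PARAMS if params is None else params)  # copy, never mutate the defaults
    query["_"] = cache_buster()
    resp = session.get(f"{BASE}/schools", params=query, timeout=30)
    resp.raise_for_status()
    return resp.json()


def fetch_school_details(session, uuid):
    """Return the detail record for one school, like the output in the question."""
    resp = session.get(f"{BASE}/school", params={"uuid": uuid, "_": cache_buster()}, timeout=30)
    resp.raise_for_status()
    return resp.json()


def fetch_all_details(session, schools, delay=0.5):
    """Fetch the detail record for every school summary, keyed by uuid."""
    details = {}
    for school in schools:
        uuid = school["uuid"]
        details[uuid] = fetch_school_details(session, uuid)
        time.sleep(delay)  # illustrative pause so the server is not hammered
    return details


if __name__ == "__main__":
    with requests.Session() as session:
        schools = fetch_school_list(session)
        print(f"{len(schools)} schools in the list")
        # Demo: only the first five schools; drop the slice to fetch them all.
        details = fetch_all_details(session, schools[:5])
        for record in details.values():
            print(record["name"], record["city"])
```

Passing the session in as an argument lets the helpers reuse one connection and makes them easy to test with a stand-in object.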

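Note that the results are limited to 500; you have to use the filters and merge the results to get all of them.*

* The site itself states (in German) that the number of hits is limited, that no more than 500 hits are displayed, and that you should refine your search.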
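Because each query returns at most 500 schools, one hedged way to collect the full set is to run several narrower queries and merge the results by `uuid`, which the list endpoint includes for every school. This sketch assumes you already know a set of values for one filter parameter (for example `school_kind`); the real values are not given here, so the ones in the demo are hypothetical placeholders. A result list with 500 or more entries is treated as possibly truncated (a list of exactly 500 might also be complete), meaning that query should be split further with another filter.

```python
from __future__ import annotations

from collections.abc import Iterable


def filtered_queries(base_params: dict, key: str, values: Iterable[str]) -> list[dict]:
    """One copy of base_params per filter value, with `key` set to that value."""
    return [{**base_params, key: value} for value in values]


def possibly_truncated(results: list[dict], limit: int = 500) -> bool:
    """True if a result list reached the cap, so the query may be incomplete."""
    return len(results) >= limit


def merge_by_uuid(result_lists: Iterable[list[dict]]) -> list[dict]:
    """Merge several result lists, keeping the first record seen for each uuid."""
    merged: dict[str, dict] = {}
    for results in result_lists:
        for school in results:
            merged.setdefault(school["uuid"], school)
    return list(merged.values())


if __name__ == "__main__":
    base = {"distance": 1, "outposts": 1, "owner": "", "school_kind": "",
            "term": "", "types": "", "work_schedule": ""}
    # HYPOTHETICAL filter values: read the real ones from the search form.
    for query in filtered_queries(base, "school_kind", ["kind-a", "kind-b"]):
        print(query)
```

Each generated dict can be sent as the query string of a GET on `https://schulfinder.kultus-bw.de/api/schools`; any response for which `possibly_truncated` is true should be split again before its list goes into `merge_by_uuid`.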

Answered on 2023-02-01.

Answer #2 by kr98yfug

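In addition to the answer already given:
to get all the search criteria for the API's GET request, you can parse the main page with BeautifulSoup, which you have already imported: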

from bs4 import BeautifulSoup
import requests

search_page_url = "https://schulfinder.kultus-bw.de"
page_contents = requests.get(search_page_url).text

# Collect every <input> element in the page body; their names should
# correspond to the filter parameters of the API's GET request.
parsed_html = BeautifulSoup(page_contents, features="html.parser")
input_elements = parsed_html.body.find_all('input')
search_params = [(x.get('name'), x.get('type'), x.get('value')) for x in input_elements]

search_params contains tuples of (name, type, value), which should give you insight into the parameters and their possible values.
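To turn those tuples into something you can loop over, they can be grouped by parameter name. This is a sketch under the assumption that each filter's options appear as several `<input>` elements sharing one `name` (as checkboxes and radio buttons usually do); if the form is built by JavaScript after page load, `requests` only sees the initial HTML and the result may be empty. The function name `values_by_name` is hypothetical.

```python
from collections import defaultdict


def values_by_name(search_params):
    """Group (name, type, value) tuples into {name: [distinct values]}.

    Inputs without a name (e.g. bare buttons) and inputs without a value
    (e.g. empty text fields) are skipped.
    """
    grouped = defaultdict(list)
    for name, _input_type, value in search_params:
        if name and value and value not in grouped[name]:
            grouped[name].append(value)
    return dict(grouped)
```

The resulting dictionary gives one candidate list per parameter name, which can be fed into separate queries against the list endpoint.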

Answered on 2023-02-01.
