Python web scraping of JSON data

vohkndzv  posted on 2023-02-01  in  Python

I am trying to extract all the data for every school listed on the following website:
https://schulfinder.kultus-bw.de/
My code is:

import requests
from selenium import webdriver
from bs4 import BeautifulSoup
from requests import get
from selenium.webdriver.common.by import By
import json

url = "https://schulfinder.kultus-bw.de/api/school?uuid=81af189c-7bc0-44a3-8c9f-73e6d6e50fdb&_=1675072758525"

payload = {}
headers = {}

response = requests.request("GET", url, headers=headers, data=payload)

print(response.text)

The output is:

{
  "outpost_number": "0",
  "name": "Gartenschule Grundschule Ebnat",
  "street": "Abt-Angehrn-Str.",
  "house_number": "5",
  "postcode": "73432",
  "city": "Aalen",
  "phone": "+49736796700",
  "fax": "+497367967016",
  "email": "poststelle@04125313.schule.bwl.de",
  "website": null,
  "tablet_tranche": null,
  "tablet_platform": null,
  "tablet_branches": null,
  "tablet_trades": null,
  "lat": 48.80094,
  "lng": 10.18761,
  "official": 0,
  "branches": [
    {
      "branch_id": 12110,
      "acronym": "GS",
      "description_long": "Grundschule"
    }
  ],
  "trades": []
}

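I found the request in the Network tab of the Chrome Inspector and tried the URL in Postman, which is where the code comes from. My problem is that I only get the information for a single school, and I don't know how to request all schools.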


kcwpcxri1#

Just use the correct endpoint:

https://schulfinder.kultus-bw.de/api/schools?distance=1&outposts=1&owner=&school_kind=&term=&types=&work_schedule=&_=1675079497084

This gives you a list of schools. The uuid of each entry can then be used to request more data through the endpoint from your question (https://schulfinder.kultus-bw.de/api/school?...).

[{"uuid":"50de01a4-503d-44d1-af4b-a6031a022b85","outpost_number":"0","name":"Grundschule Aach","city":"Aach","lat":47.84399,"lng":8.85067,"official":0,"marker_class":"marker green","marker_label":"G","website":null},{"uuid":"8818037f-9aed-4860-b42e-8a49b1403c02","outpost_number":"0","name":"Braunenbergschule Grundschule Wasseralfingen","city":"Aalen","lat":48.8612,"lng":10.11191,"official":0,"marker_class":"marker green","marker_label":"G","website":null},...]
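The two-step flow described above (list first, then one detail request per uuid) could be sketched roughly as follows. This is only a minimal sketch: the helper names `requests_get_json` and `fetch_school_details` are made up for illustration, the query parameters are copied from the list URL above, and leaving out the `_` timestamp parameter is an assumption that the server does not require it.

```python
API_BASE = "https://schulfinder.kultus-bw.de/api"

# Query parameters copied from the list URL above. The "_" timestamp is
# omitted on the assumption that it is only a cache buster.
DEFAULT_LIST_PARAMS = {
    "distance": "1",
    "outposts": "1",
    "owner": "",
    "school_kind": "",
    "term": "",
    "types": "",
    "work_schedule": "",
}


def requests_get_json(url, params):
    """Fetch a URL with requests and decode the JSON body."""
    # Imported lazily so fetch_school_details can be tried with a stand-in
    # fetcher even where requests is not installed.
    import requests

    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()
    return response.json()


def fetch_school_details(get_json=requests_get_json, list_params=None):
    """Return the detail record of every school in one list response."""
    params = dict(DEFAULT_LIST_PARAMS, **(list_params or {}))
    schools = get_json(f"{API_BASE}/schools", params)

    details = []
    for school in schools:
        # One detail request per uuid, using the endpoint from the question.
        details.append(get_json(f"{API_BASE}/school", {"uuid": school["uuid"]}))
    return details
```

Calling `fetch_school_details()` with no arguments would issue one request per school, so it may be worth adding a short pause between requests when running it against the live site.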
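  • Note that the results are limited to 500 entries, so you have to use the filters and merge the results to get all schools:

The site itself states this limit: the result list is capped at 500 entries, hits beyond 500 are not shown, and you are asked to narrow your search.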
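Merging the filtered responses could look something like the sketch below. It assumes each response is a list of dicts with a `uuid` key, as in the sample output above. The names `merge_by_uuid` and `is_truncated` are hypothetical, and which filter values to iterate over is left open because the answer does not specify them.

```python
def merge_by_uuid(result_lists):
    """Combine several list responses, keeping the first record seen per uuid."""
    merged = {}
    for results in result_lists:
        for school in results:
            # A school matching several filters appears only once in the output.
            merged.setdefault(school["uuid"], school)
    return list(merged.values())


def is_truncated(results, limit=500):
    """Flag a response that reached the cap and may therefore be incomplete."""
    return len(results) >= limit
```

If `is_truncated` returns True for one filter value, that query probably needs to be split further (for example by combining it with a second filter) before the merged list can be considered complete.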


kr98yfug2#

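In addition to the answer already given:
To find all the search criteria accepted by the API's GET request, you can parse the content of the main page with BeautifulSoup, which you have already imported: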

from bs4 import BeautifulSoup
import requests

search_page_url = "https://schulfinder.kultus-bw.de"
page_contents = requests.request("GET", search_page_url).text

parsed_html = BeautifulSoup(page_contents, features="html.parser")
input_elements = parsed_html.body.find_all('input')
search_params = list(map(lambda x: (x.get('name'), x.get('type'), x.get('value')), input_elements))

search_params contains tuples of name, type, and value, which should give you some insight into the parameters and their possible values.
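To make those tuples easier to read, they could be grouped by parameter name, roughly as in this sketch. The helper name `group_search_params` is hypothetical, and the sample tuples in the usage note are invented for illustration rather than taken from the site.

```python
def group_search_params(search_params):
    """Map each input name to the distinct values found for it on the page."""
    grouped = {}
    for name, _input_type, value in search_params:
        if name is None:
            # Inputs without a name attribute are not sent as query parameters.
            continue
        values = grouped.setdefault(name, [])
        if value is not None and value not in values:
            values.append(value)
    return grouped
```

For example, `group_search_params([("types", "checkbox", "1"), ("types", "checkbox", "2"), ("term", "text", None)])` returns `{"types": ["1", "2"], "term": []}`. An empty list suggests a free-text parameter, while several values suggest a fixed set of options to iterate over.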
