selenium: Extracting text from a website with BeautifulSoup

ltskdhd1  posted on 2023-04-30  in Other

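I need to extract the content inside the md-card containers, as shown in the attached image. I only need the text, or an excerpt in any format. I tried using BeautifulSoup, but it does not work. Please suggest a way to do this. Thanks.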
import requests
from bs4 import BeautifulSoup

page = requests.get('https://www.plugshare.com/location/81189')
soup = BeautifulSoup(page.content, 'html.parser')
# x = soup.find_all('div')
x = soup.find_all('md-card')
# print(page.status_code)

print(x)
v8wbuo2f #1

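  • Please improve your question to include the exact expected result. The area you outlined contains a lot of information, and it is unclear which parts are relevant.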

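So, based on the initial state of your question and just to give you an idea: you could search your soup for a <script> that contains some of the information. If you need more, or something specific, see my comment:
The site's content is loaded dynamically by JavaScript. The requests module only fetches the initial static source, unlike a browser, which renders dynamic content and manipulates the structure, so BeautifulSoup cannot find the desired elements. Alternatives: find the API call that provides the information, or use Selenium or another module that mimics browser behavior.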
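One way to confirm this diagnosis is to count the target tag in the raw HTML before any JavaScript runs. The sketch below is only an illustration: it uses the standard-library `html.parser` instead of BeautifulSoup so it runs without extra installs, and `static_html` is a hypothetical stand-in for what such a page might serve, not the real PlugShare source.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count how often each opening tag appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook
        self.counts[tag] = self.counts.get(tag, 0) + 1


def count_tag(html, tag):
    """Return the number of opening `tag` elements found in `html`."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts.get(tag.lower(), 0)


# Hypothetical static source of a JavaScript-driven page: an empty
# application root plus scripts, with no <md-card> elements yet.
static_html = """
<html>
  <body>
    <div id="app"></div>
    <script type="application/ld+json">{"name": "Example"}</script>
    <script src="app.js"></script>
  </body>
</html>
"""

md_cards = count_tag(static_html, "md-card")
scripts = count_tag(static_html, "script")
print(md_cards, scripts)
```

If the count for the tag you want is zero in `requests.get(url).text` but the element is visible in the browser's inspector, the element is most likely created by JavaScript after the page loads.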

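Example
import json

import requests
from bs4 import BeautifulSoup

url = 'https://www.plugshare.com/location/81189'
# Name the parser explicitly to avoid bs4's "no parser specified" warning
soup = BeautifulSoup(requests.get(url).text, 'html.parser')

# The static HTML embeds schema.org metadata in a JSON-LD <script> tag
json.loads(soup.select_one('[type="application/ld+json"]').text)
Output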
{'@context': 'http://schema.org',
 '@type': 'LocalBusiness',
 'aggregateRating': {'@type': 'AggregateRating',
  'bestRating': 10,
  'worstRating': 1,
  'ratingValue': '10.0',
  'reviewCount': '122'},
 'name': 'Balladonia Hotel',
 'description': '22kW CCS2 DC charger connected to 3 phase, crowd funded and owned by UWA. Get key from staff to turn on the charger, photograph kWh consumed before returning key. Please log in your charging time to Plugshare to assist other EV owners travelling the Nullarbor.',
 'image': 'https://photos.plugshare.com/photos/1069516.jpg',
 'url': '/location/81189',
 'publicAccess': True}
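Since the question only asks for the text, a small helper can pick the readable fields out of that dictionary. This is a minimal sketch: `summarize_listing` is a hypothetical name, the key names come from the output above, and `sample` is an abbreviated copy of that output (the long description is left out).

```python
def summarize_listing(ld):
    """Pick a few human-readable fields out of a schema.org JSON-LD dict."""
    # aggregateRating may be absent, so fall back to an empty dict
    rating = ld.get("aggregateRating") or {}
    return {
        "name": ld.get("name"),
        "description": ld.get("description"),
        "rating": rating.get("ratingValue"),
        "reviews": rating.get("reviewCount"),
    }


# Abbreviated copy of the output shown above
sample = {
    "@context": "http://schema.org",
    "@type": "LocalBusiness",
    "aggregateRating": {
        "@type": "AggregateRating",
        "bestRating": 10,
        "worstRating": 1,
        "ratingValue": "10.0",
        "reviewCount": "122",
    },
    "name": "Balladonia Hotel",
    "url": "/location/81189",
    "publicAccess": True,
}

summary = summarize_listing(sample)
print(summary)
```

Using `.get` throughout means a page without a rating or description yields `None` for that field instead of raising `KeyError`.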
7gs2gvoe #2

As @HedgeHog said, the data is loaded from an external URL. To get all the data in JSON format, you can try:

import json
import requests

api_url = "https://api.plugshare.com/v3/locations/81189"

headers = {
    "Authorization": "Basic d2ViX3YyOkVOanNuUE54NHhXeHVkODU=",
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/112.0",
    "Origin": "https://www.plugshare.com",
}

data = requests.get(api_url, headers=headers).json()
print(json.dumps(data, indent=4))

Prints:

{
    "access": 1,
    "access_restriction": null,
    "access_restriction_description": null,
    "access_restrictions": [],
    "address": "Eyre Highway Balladonia Western Australia 6443",
    "all_promos": [],
    "amenities": [
        {
            "location_id": 81189,
            "type": 1
        },
        {
            "location_id": 81189,
            "type": 2
        },
        {
            "location_id": 81189,
            "type": 4
        }
    ],
    "coming_soon": false,
    "confidence": 2,
    "cost": true,
    "cost_description": "$1 per kWh",
    "created_at": "2016-02-15T03:32:10Z",
    "custom_ports": "",
    "datasources": [],
    "description": "22kW CCS2 DC charger connected to 3 phase, crowd funded and owned by UWA. Get key from staff to turn on the charger, photograph kWh consumed before returning key.\nPlease log in your charging time to Plugshare to assist other EV owners travelling the Nullarbor.",
    "e164_phone_number": "+61890393453",
    "enabled": true,
    "entrance_latitude": null,
    "entrance_longitude": null,
    "formatted_phone_number": "+61 8 9039 3453",
    "has_dynamic_pricing": false,
    "hours": "6am - 8pm",
    "icon": "https://assets.plugshare.com/icons/Y.png",
    "icon_type": "Y",
    "id": 81189,
    "latitude": -32.351938,
    "locale": "AU",
    "locale_v2": "AU",
    "locked": false,
    "longitude": 123.617127,
    "majority_network_id": null,
    "meta_description": "Electric Car (EV) Charging Station at Eyre Highway Balladonia Western Australia 6443. Information provided by PlugShare, the world's most popular map for finding electric car (EV) charging stations.",
    "name": "Balladonia Hotel",
    "nissan_nctc": false,
    "ocpi_ids": [],
    "open247": false,
    "opened_at": null,
    "opening_date": null,
    "opening_times": null,
    "overhead_clearance_meters": null,
    "parking_attributes": [
        "PULL_THROUGH"
    ],
    "parking_level": null,
    "parking_type_name": "Pay",
    "payment_enabled": null,
    "phone": "0890393453",
    "photos": [
        {
            "caption": "",
            "created_at": "2023-01-26T04:51:36Z",
            "id": 1069516,
            "is_visible": true,
            "language": null,
            "order": null,
            "thumbnail": "https://photos.plugshare.com/thumb/1069516.png",
            "thumbnail2x": "https://photos.plugshare.com/thumb2x/1069516.png",
            "url": "https://photos.plugshare.com/photos/1069516.jpg",
            "user_id": 2432476
        },

...and so on.
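To get from this JSON to the plain text the question asks for, one could keep only the descriptive fields. The following is a rough sketch under stated assumptions: `extract_location_text` is a hypothetical helper, the key names are taken from the printed response above, and `sample` is a trimmed copy of that response used so the example runs offline.

```python
def extract_location_text(data):
    """Collect the human-readable fields of a location record into a dict."""
    # "photos" may be missing or null, so fall back to an empty list
    photos = data.get("photos") or []
    return {
        "name": data.get("name"),
        "address": data.get("address"),
        "cost": data.get("cost_description"),
        "hours": data.get("hours"),
        "phone": data.get("formatted_phone_number"),
        "photo_urls": [p["url"] for p in photos if p.get("url")],
    }


# Trimmed copy of the response printed above
sample = {
    "address": "Eyre Highway Balladonia Western Australia 6443",
    "cost_description": "$1 per kWh",
    "formatted_phone_number": "+61 8 9039 3453",
    "hours": "6am - 8pm",
    "name": "Balladonia Hotel",
    "photos": [
        {"id": 1069516, "url": "https://photos.plugshare.com/photos/1069516.jpg"}
    ],
}

info = extract_location_text(sample)
for key, value in info.items():
    print(f"{key}: {value}")
```

With the live response, the same call would be `extract_location_text(data)`, where `data` is the dictionary returned by `requests.get(api_url, headers=headers).json()` in the snippet above.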
