CSS: find all divs and scrape text from a span

wwtsj6pe, posted on 2023-06-07 in Other
<div class="NewsList_newsListContent__4UpiN">
<div>
<div>
<div class="NewsList_newsListItemWrap__XovMP">
<div style="display: flex;">
<div class="NewsList_newsListItem__yRAbe">
<a href="/flash-categories/Currency">
<div class="NewsList_newsListTag__TGHJ_">
<span>Currency</span>
</div></a></div></div>
<div class="NewsList_newsListContent__4UpiN">
<div class="NewsList_infoNewsListSubMobile__SPmAG">
<span>06 Jun 2023, 10:05 am </span>
</div>
<div class="NewsList_newsListText__hstO7">
<a href="/node/669947">
<span class="NewsList_newsListItemHead__dg7eK">Ringgit lower against US dollar in early session on June 6</span>
</a>
<a href="/node/669947">
<span class="NewsList_newsList__2fXyv">KUALA LUMPUR (June 6): The ringgit opened lower against   the US dollar in the early session on Tuesday (June 6), as investors remain cautious on the global outlook despite a slightly weaker greenback, an analyst said.&nbsp;At 9am, the local note fell to 4.5950/6000 versus the greenback, compared with Friday (June 2)’s closing of&nbsp;4.5745/5785.  </span>
</a>
</div>

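For example, I want to scrape the headline inside the span with class NewsList_newsListItemHead__dg7eK above: "Ringgit lower against US dollar in early session on June 6".
Here is my script: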

import requests
from bs4 import BeautifulSoup

url = "https://theedgemalaysia.com/categories/malaysia"

# Send a GET request to the URL
response = requests.get(url)

# Create a BeautifulSoup object to parse the HTML content
soup = BeautifulSoup(response.text, 'html.parser')

# Find all <div> elements with class "NewsList_newsListContent__4UpiN"
container_divs = soup.find_all('div', class_='NewsList_newsListContent__4UpiN')

# Iterate over the container divs
for container_div in container_divs:
    # Find all <div> elements with class "NewsList_newsListText__hstO7" within the container
    news_text_divs = container_div.find_all('div', class_='NewsList_newsListText__hstO7')

    # Iterate over the news text divs
    for news_text_div in news_text_divs:
        # Find the <span> element with class "NewsList_newsListItemHead__dg7eK" within the news text div
        headline_span = news_text_div.find('span', class_='NewsList_newsListItemHead__dg7eK')

        # Print the text of the headline
        if headline_span:
            print(headline_span.text)

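I have tried the script above, but it prints nothing and I can't find the error. Could anyone take a look and let me know where the problem is? Thanks a lot!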

Answer #1 (nnt7mjpx)

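The page is built by JavaScript from data that is already embedded in a script tag. requests cannot execute JavaScript, so it never sees those headlines the way you do when you open the page in a JS-enabled browser.
Here is one way to get those headlines: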
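As a rough illustration of the point above, the sketch below parses a made-up, heavily simplified stand-in for the raw response. The RAW_HTML string is an assumption for demonstration only, not the site's actual markup. It shows that searching for the rendered headline class finds nothing, while the same headline can be read from the JSON embedded in the script tag.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML that requests receives: an empty
# mount point plus the page data serialized as JSON in a script tag.
RAW_HTML = """
<html><body>
<div id="__next"></div>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"corporateData": [
  {"title": "Ringgit lower against US dollar in early session on June 6"}
]}}}
</script>
</body></html>
"""

soup = BeautifulSoup(RAW_HTML, "html.parser")

# The rendered headline markup is not present in this raw HTML.
headline_spans = soup.find_all("span", class_="NewsList_newsListItemHead__dg7eK")

# The headline is available in the embedded JSON instead.
script_tag = soup.find("script", id="__NEXT_DATA__")
payload = json.loads(script_tag.string)
titles = [item["title"] for item in payload["props"]["pageProps"]["corporateData"]]

print(len(headline_spans))
print(titles)
```

If the real response behaves the same way, this would explain why the loop in the question never prints anything: find_all simply returns an empty list.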

from bs4 import BeautifulSoup as bs
import requests
import pandas as pd
import json

# Send a browser-like User-Agent with the request
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36'
}

url = 'https://theedgemalaysia.com/categories/malaysia'
r = requests.get(url, headers=headers)
soup = bs(r.text, 'html.parser')

# The page data is embedded as JSON in the <script id="__NEXT_DATA__"> tag
data_script = soup.select_one('script[id="__NEXT_DATA__"]')
data = json.loads(data_script.string)

# Flatten the list of article records into a DataFrame
df = pd.json_normalize(data['props']['pageProps']['corporateData'])
print(df)

Terminal output:

nid     type    language    category    options     flash   tags    edited  title   created     updated     author  source  audio   audioflag   alias   video_url   img     caption     summary
0   669998  article     english     Corporate,Malaysia  Top Stories     Noon Market             Bursa stays in the red at midday    1686027040000   1686027040000   Bernama     Bernama         0   node/669998         https://assets.theedgemarkets.com/noon-market-...       Bursa Malaysia stayed in the red at midday due...
1   669997  article     english     Corporate,Malaysia                  Skyworld eyes 3Q Main Market listing, inks und...   1686026908000   1686026908000   Lam Jian Wyn    theedgemalaysia.com         0   node/669997         https://assets.theedgemarkets.com/SkyWorld-Dev...       KUALA LUMPUR (June 6): SkyWorld Development Bh...
2   669995  article     english     Malaysia    Top Stories,Politics & Government   Parliament          Investigation into Kedah MB over comments Pena...   1686026473000   1686026473000   Hailey Chung & Chester Tay  theedgemalaysia.com         0   node/669995         https://assets.theedgemarkets.com/Kedah Sanusi...   Kedah Menteri Besar Datuk Seri Muhammad Sanusi...   Police have started an investigation into Keda...
3   669984  article     english     Malaysia,Economy    Top Stories,Politics & Government   Parliament  mynewstv        Anwar defends BNM’s gradual approach to moneta...   1686025226000   1686025226000   Hailey Chung & Chester Tay  theedgemalaysia.com         0   node/669984         https://assets.theedgemarkets.com/Anwar 060620...       Higher borrowing costs and the sharp depreciat...
4   669980  article     english     Malaysia,World,Economy  Top Stories,Politics & Government   ESG             Global carbon markets face upheaval as nations...   1686024746000   1686024746000   Natasha White & Ewa Krukowska   Bloomberg       0   node/669980         https://assets.theedgemarkets.com/398972891-fo...       LONDON/BRUSSELS (June 6): The US$2 billion mar...
5   669961  article     english     Corporate,Malaysia              Isabelle Francis    CGS-CIMB starts coverage of Dayang Enterprise ...   1686022324000   1686022324000   Anis Hazim  theedgemalaysia.com         0   node/669961         https://assets.theedgemarkets.com/Dayang-Enter...       CGS-CIMB has initiated coverage of Dayang Ente...
6   669957  article     english     Malaysia    Politics & Government               Kit Siang expresses gratitude to Agong for 'Ta...   1686021406000   1686021406000   Bernama     Bernama         0   node/669957         https://assets.theedgemarkets.com/Lim-Kit-sian...       Veteran politician Tan Sri Lim Kit Siang expre...
7   669956  article     english     Corporate,Malaysia          mynewstv        1Q results came broadly below expectations, sa...   1686020951000   1686020951000   Isabelle Francis    theedgemalaysia.com         0   node/669956         https://assets.theedgemarkets.com/Bursa-Malays...       KUALA LUMPUR (June 6): Analysts said the first...
8   669954  article     english     Corporate,Management,Malaysia   Top Stories     ESG     mynewstv        24 public-listed companies still have no women...   1686020019000   1686020019000   Tan Zhai Yun    theedgemalaysia.com         0   node/669954         https://assets.theedgemarkets.com/Bursa-4_2023...       KUALA LUMPUR (June 6): As at June 1, 2023, 24 ...
9   669953  article     english     Corporate,Malaysia      Hot Stock   mynewstv    Lam Jian Wyn    Bumi Armada shares fall 20.47% on Kraken FPSO ...   1686019041000   1686019041000   Anis Hazim  theedgemalaysia.com         0   node/669953         https://assets.theedgemarkets.com/Bumi-Armada-...       KUALA LUMPUR (June 6): Shares of Bumi Armada B...
10  669951  article     english     Corporate,Malaysia,World        Global Markets          Asian stocks wobble as traders weigh Fed rate ...   1686018738000   1686018738000   Ankur Banerjee  Reuters         0   node/669951         https://assets.theedgemarkets.com/395135636-As...       SINGAPORE (June 6): Asian stock markets edged ...
11  669948  article     english     Malaysia,Court  Politics & Government           Lam Jian Wyn    High Court dismisses Zuraida’s leave applicati...   1686017713000   1686017713000   Tarani Palani   theedgemalaysia.com         0   node/669948         https://assets.theedgemarkets.com/Zuraida-Kama...   Ampang member of Parliament Datuk Zuraida Kama...   KUALA LUMPUR (June 6): The High Court has dism...
12  669947  article     english     Malaysia    Top Stories     Currency            Ringgit lower against US dollar in early sessi...   1686017126000   1686017126000   Bernama     Bernama         0   node/669947         https://assets.theedgemarkets.com/Ringgit-5_20...       KUALA LUMPUR (June 6): The ringgit opened lowe...
13  669945  article     english     Corporate,Malaysia      Market Open             Bursa Malaysia marginally higher in early sess...   1686016694000   1686016694000   Bernama     Bernama         0   node/669945         https://assets.theedgemarkets.com/opening-mark...       KUALA LUMPUR (June 6): Bursa Malaysia rebounde...
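Since the question only asks for the headlines, a minimal sketch of reducing such records to title, publication time, and link may help. The sample records copy two rows from the output above; the ringgit title is truncated there, so it is taken in full from the HTML in the question. It assumes that created is a Unix timestamp in milliseconds, that the site displays Malaysia time (UTC+8), and that alias is a path relative to the site root, as the href="/node/669947" in the question suggests. The function name summarize is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urljoin

BASE_URL = "https://theedgemalaysia.com/"
MYT = timezone(timedelta(hours=8))  # assumed display time zone (UTC+8)

# Two records taken from the output above, reduced to the fields used here.
sample_records = [
    {"title": "Bursa stays in the red at midday",
     "created": 1686027040000, "alias": "node/669998"},
    {"title": "Ringgit lower against US dollar in early session on June 6",
     "created": 1686017126000, "alias": "node/669947"},
]


def summarize(records, base_url=BASE_URL):
    """Return (title, local time string, absolute URL) for each titled record."""
    rows = []
    for rec in records:
        if not rec.get("title"):
            continue  # skip records without a headline
        # Assumption: "created" counts milliseconds since the Unix epoch.
        created = datetime.fromtimestamp(int(rec["created"]) / 1000, tz=MYT)
        rows.append((
            rec["title"],
            created.strftime("%d %b %Y, %H:%M"),
            urljoin(base_url, rec.get("alias", "")),
        ))
    return rows


rows = summarize(sample_records)
for title, when, link in rows:
    print(f"{when} | {title} | {link}")
```

With the real page, the same function could be called on data['props']['pageProps']['corporateData'] from the answer's script, provided the records keep the shape shown in the output.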

See the BeautifulSoup documentation and the pandas documentation for details.
