<div class="NewsList_newsListContent__4UpiN">
<div>
<div>
<div class="NewsList_newsListItemWrap__XovMP">
<div style="display: flex;">
<div class="NewsList_newsListItem__yRAbe">
<a href="/flash-categories/Currency">
<div class="NewsList_newsListTag__TGHJ_">
<span>Currency</span>
</div></a></div></div>
<div class="NewsList_newsListContent__4UpiN">
<div class="NewsList_infoNewsListSubMobile__SPmAG">
<span>06 Jun 2023, 10:05 am </span>
</div>
<div class="NewsList_newsListText__hstO7">
<a href="/node/669947">
<span class="NewsList_newsListItemHead__dg7eK">Ringgit lower against US dollar in early session on June 6</span>
</a>
<a href="/node/669947">
<span class="NewsList_newsList__2fXyv">KUALA LUMPUR (June 6): The ringgit opened lower against the US dollar in the early session on Tuesday (June 6), as investors remain cautious on the global outlook despite a slightly weaker greenback, an analyst said. At 9am, the local note fell to 4.5950/6000 versus the greenback, compared with Friday (June 2)’s closing of 4.5745/5785. </span>
</a>
</div>
For example, I want to scrape the headline above: Ringgit lower against US dollar in early session on June 6
Here is my script:
import requests
from bs4 import BeautifulSoup
url = "https://theedgemalaysia.com/categories/malaysia"
# Send a GET request to the URL
response = requests.get(url)
# Create a BeautifulSoup object to parse the HTML content
soup = BeautifulSoup(response.text, 'html.parser')
# Find all <div> elements with class "NewsList_newsListContent__4UpiN"
container_divs = soup.find_all('div', class_='NewsList_newsListContent__4UpiN')
# Iterate over the container divs
for container_div in container_divs:
    # Find all <div> elements with class "NewsList_newsListText__hstO7" within the container
    news_text_divs = container_div.find_all('div', class_='NewsList_newsListText__hstO7')
    # Iterate over the news text divs
    for news_text_div in news_text_divs:
        # Find the <span> element with class "NewsList_newsListItemHead__dg7eK" within the news text div
        headline_span = news_text_div.find('span', class_='NewsList_newsListItemHead__dg7eK')
        # Print the text of the headline
        if headline_span:
            print(headline_span.text)
I have tried the script above but couldn't get it to work. Could anyone here take a look and let me know where the problem is, please? Many thanks!
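One way to tell whether the selectors or the fetched HTML are at fault is to run the same lookups on the snippet quoted above and on a page whose content has not been rendered yet. The sketch below assumes BeautifulSoup with the built-in `html.parser`; `SNIPPET` (trimmed from the quoted HTML), `JS_SHELL`, and both function names are hypothetical, for illustration only:

```python
from bs4 import BeautifulSoup

# Hypothetical sample trimmed from the HTML quoted in the question.
# Note that one NewsList_newsListContent__4UpiN div is nested inside another.
SNIPPET = """
<div class="NewsList_newsListContent__4UpiN">
  <div class="NewsList_newsListContent__4UpiN">
    <div class="NewsList_newsListText__hstO7">
      <a href="/node/669947">
        <span class="NewsList_newsListItemHead__dg7eK">Ringgit lower against US dollar in early session on June 6</span>
      </a>
    </div>
  </div>
</div>
"""

# Hypothetical page shell, as a server might send it before any JavaScript runs.
JS_SHELL = '<html><body><div id="__next"></div><script src="/app.js"></script></body></html>'


def extract_headlines(html: str) -> list[str]:
    """Return each headline span's text once, using a single CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    selector = "div.NewsList_newsListText__hstO7 span.NewsList_newsListItemHead__dg7eK"
    return [span.get_text(strip=True) for span in soup.select(selector)]


def extract_headlines_nested(html: str) -> list[str]:
    """Same lookup as the question's two-level loop, collecting results instead of printing."""
    soup = BeautifulSoup(html, "html.parser")
    found = []
    for container in soup.find_all("div", class_="NewsList_newsListContent__4UpiN"):
        for text_div in container.find_all("div", class_="NewsList_newsListText__hstO7"):
            span = text_div.find("span", class_="NewsList_newsListItemHead__dg7eK")
            if span:
                found.append(span.get_text(strip=True))
    return found


print(extract_headlines(SNIPPET))  # ['Ringgit lower against US dollar in early session on June 6']
print(len(extract_headlines_nested(SNIPPET)))  # 2: same headline, reached via both containers
print(extract_headlines(JS_SHELL))  # []
```

In this sketch the selectors match the quoted snippet but find nothing in the unrendered shell, so the parsing logic is not what fails when the fetched HTML lacks the rendered markup. The snippet also nests one `NewsList_newsListContent__4UpiN` div inside another, so the script's two-level loop would visit the same headline twice; a single CSS selector passed to `soup.select` returns each matching span once.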
1 answer
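That page is built by JavaScript from some existing information in a `script` tag. Requests cannot execute JavaScript, so it won't see those headlines the way you see them when you visit the page in a JS-enabled browser. Here is one way of getting those headlines: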
Result in terminal:
See the BeautifulSoup documentation here, and for the pandas documentation, see here.
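Following the explanation above, here is a minimal sketch of reading data from a `script` tag rather than from rendered markup. It assumes, without having confirmed it for this site, that the data is JSON inside a Next.js-style `<script id="__NEXT_DATA__">` tag and that headlines sit under a key such as `title`; the script id, the key, and all function names here are hypothetical and should be checked against the actual page source:

```python
import json
from typing import Any, Iterator

from bs4 import BeautifulSoup

# Assumption: a Next.js-style JSON payload; not verified for this site.
SCRIPT_ID = "__NEXT_DATA__"


def load_embedded_json(html: str, script_id: str = SCRIPT_ID) -> Any:
    """Return the parsed JSON inside <script id=script_id>, or None if absent or empty."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("script", id=script_id)
    if tag is None or tag.string is None:
        return None
    return json.loads(tag.string)


def find_values(obj: Any, key: str) -> Iterator[Any]:
    """Recursively yield every value stored under `key` in nested dicts and lists."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            yield from find_values(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_values(item, key)


def fetch_headlines(url: str, key: str = "title") -> list:
    """Download `url` and return values under the (hypothetical) headline `key`."""
    import requests  # imported here so the helpers above work without requests installed

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    data = load_embedded_json(response.text)
    return [] if data is None else list(find_values(data, key))
```

A call such as `fetch_headlines("https://theedgemalaysia.com/categories/malaysia")` would return whatever values sit under the assumed key; an empty list suggests the script id or field name is different on the real page. The recursive `find_values` walk avoids hard-coding a JSON path that is not known here.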