I want to use BeautifulSoup and the select() method to get the text of the latest post.
import requests
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0'}
url = "https:// "
req = requests.get(url, headers=headers)
html = req.text
soup = BeautifulSoup(html, 'html.parser')
link = soup.select('#flagList > div.clear.ab-webzine > div > a')
title = soup.select('#flagList > div.clear.ab-webzine > div > div.wz-item-header > a > span')
latest_link = link[0]['href']  # URL of the latest post (the <a> tag's href attribute, not the tag itself)
latest_title = title[0].text # title of latest post
# to get the text of latest post
t_url = latest_link
t_req = requests.get(t_url, headers=headers)
t_html = t_req.text
t_soup = BeautifulSoup(t_html, 'html.parser')
maintext = t_soup.select('#flagArticle > div.document_1234567_0.rhymix_content.xe_content')
print(maintext)
It returns [].
I copied the selector #flagArticle > div.document_1234567_0.rhymix_content.xe_content from Chrome DevTools while viewing a post, so it contains that post's specific number, "1234567".
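But I want the text of whatever the latest post is, not of one particular post.
So I changed the selector to just #flagArticle, and it returns the following: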
[<article id="flagArticle">
<!--BeforeDocument(1234567,0)-->
<div class="document_1234567_0 rhymix_content xe_content"><p>TEXTTEXTTEXT 1</p>
<p>TEXTTEXTTEXT 2</p>
<p>TEXTTEXTTEXT 3</p></div><!--AfterDocument(1234567,0)-->
<!--
-- color class --
vb-white
vb-green
vb-blue
vb-skyblue
vb-orange
vb-red
-->
<div class="vote">
<button class="vb-btn vb-orange" onclick="vote_doVote('Up','1234567');return false;" type="button">
<span class="lang">
<i class="fas fa-star fa-spin fa-fw"></i>
recommended </span>
<span class="num" id="vm_v_count">
4 </span>
</button> <button class="vb-btn vb-skyblue" onclick="vote_doVote('Declare','1234567');return false;" type="button">
<span class="lang">
<i class="fa fa-times-circle"></i>
report </span>
<span class="num" id="vm_d_count">
</span>
</button></div> </article>]
But what I want is:
TEXTTEXTTEXT 1
TEXTTEXTTEXT 2
TEXTTEXTTEXT 3
What should I change?
(I can't share the URL because it's a private site.)
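A minimal sketch of the link-extraction step, run on a made-up list page shaped to match the question's selectors (the real markup is private, so the sample HTML, the example.com base URL, and the latest_post helper are all hypothetical). In the question, link[0] is a bs4 Tag, not a string; the post URL is its href attribute, and urljoin makes it absolute in case the site uses relative links.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def latest_post(list_html, base_url):
    """Return (url, title) of the first post on a list page.

    Hypothetical helper: the selectors are copied from the question,
    and base_url is assumed to be the page the list was fetched from.
    """
    soup = BeautifulSoup(list_html, "html.parser")
    a = soup.select_one("#flagList > div.clear.ab-webzine > div > a")
    span = soup.select_one(
        "#flagList > div.clear.ab-webzine > div > div.wz-item-header > a > span"
    )
    # a is a Tag; the URL is its href attribute, which may be relative
    url = urljoin(base_url, a["href"])
    return url, span.get_text(strip=True)


# Made-up markup (assumption) shaped so the selectors above match
sample = """
<div id="flagList">
  <div class="clear ab-webzine">
    <div>
      <a href="/board/1234567"></a>
      <div class="wz-item-header"><a href="/board/1234567"><span> Latest title </span></a></div>
    </div>
    <div>
      <a href="/board/1234566"></a>
      <div class="wz-item-header"><a href="/board/1234566"><span>Older title</span></a></div>
    </div>
  </div>
</div>
"""

url, title = latest_post(sample, "https://example.com/board")
print(url, title)  # https://example.com/board/1234567 Latest title
```

select_one() returns the first match in document order, which on a list sorted newest-first is the latest post; it returns None when nothing matches, so a wrong selector fails loudly at a["href"] instead of silently yielding [].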
Just take the first div inside #flagArticle.
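A minimal sketch of that answer, run on a trimmed copy of the #flagArticle markup printed in the question: select_one() returns the first matching div (the post body, which comes before div.vote), and the paragraph texts are then read from its <p> children. The class-based selector div.rhymix_content is an alternative that avoids both the hard-coded post number and reliance on position; it assumes every post body carries that class, as the one shown does.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the #flagArticle output shown in the question
html = """
<article id="flagArticle">
<!--BeforeDocument(1234567,0)-->
<div class="document_1234567_0 rhymix_content xe_content"><p>TEXTTEXTTEXT 1</p>
<p>TEXTTEXTTEXT 2</p>
<p>TEXTTEXTTEXT 3</p></div><!--AfterDocument(1234567,0)-->
<div class="vote">
<button class="vb-btn vb-orange" type="button"><span class="lang">recommended</span></button>
</div>
</article>
"""

soup = BeautifulSoup(html, "html.parser")

# First direct-child div of #flagArticle: the post body (div.vote comes later)
body = soup.select_one("#flagArticle > div")

# Alternative: match the class names the post body shares, independent of the post number
body_by_class = soup.select_one("#flagArticle div.rhymix_content")

# One string per <p>, with surrounding whitespace stripped
lines = [p.get_text(strip=True) for p in body.select("p")]
print("\n".join(lines))
```

Against the live site, the same selection would run on t_soup (built from t_req.text) instead of the sample soup.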