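When I run the scraper, the output on my Django home page looks fine, but the URLs return a 404 error message, and other articles show that I used the wrong tag: https://www.coindesk.com/news/tag/crypto-lending. The correct link URL is https://www.coindesk.com/news/tag/crypto-lending. The correct tag holding the link is the <a title= href= ...> tag. How do I select this tag?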
from bs4 import BeautifulSoup
import requests

crypto_headlines = []


def crypto_news():
    """Scrape headlines, summaries, and links from the CoinDesk news page."""
    # User-Agent header so the request looks like it comes from a regular browser
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.101 Safari/537.36'
    }
    base_url = 'https://www.coindesk.com/news'
    source = requests.get(base_url, headers=headers).text
    soup = BeautifulSoup(source, "html.parser")
    articles = soup.find_all(class_='text-content')
    #print(len(articles))
    #print(articles)
    for article in articles:
        try:
            headline = article.h4.text.strip()
            text = article.find(class_="card-text").text.strip()
            # article.a returns the first <a> inside the article
            link = base_url + article.a['href']
            #img_url = base_url + article.image_src['src']
            crypto_dict = {}
            crypto_dict['Headline'] = headline
            crypto_dict['Text'] = text
            crypto_dict['Link'] = link
            crypto_headlines.append(crypto_dict)
        except AttributeError as ex:
            print('Error:', ex)
    print(crypto_headlines)


crypto_news()
1 Answer
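You are selecting the wrong <a>: you scrape the link from the first <a> in each article, but the link you need is in the second <a>. This is the fix: changing just one line solves your problem!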
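The one-line change described above could look roughly like the following. This is only a minimal sketch: the SAMPLE_HTML markup is made up for illustration and is an assumption about how each article block is laid out (tag link first, article link second), and extract_links is a hypothetical helper, not part of the original scraper. It also uses urljoin from the standard library instead of string concatenation, since base_url already ends in /news and adding an href that starts with / would produce a malformed URL.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.coindesk.com/news"

# Hypothetical markup, invented for this example: the first <a> is the tag
# link, the second <a> (with title and href) is the article link.
SAMPLE_HTML = """
<div class="text-content">
  <a href="/news/tag/crypto-lending">Crypto Lending</a>
  <a title="Example headline" href="/markets/2021/06/15/example-article/">
    <h4>Example headline</h4>
  </a>
  <p class="card-text">Example summary.</p>
</div>
"""


def extract_links(html, base_url=BASE_URL):
    """Return the absolute URL of the second <a> in each article block."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for article in soup.find_all(class_="text-content"):
        # Collect every <a> that has an href, in document order.
        anchors = article.find_all("a", href=True)
        if len(anchors) < 2:
            # No second link in this block, so skip it instead of raising.
            continue
        # Index 1 is the second <a>; urljoin resolves the relative href
        # against the site root rather than appending it to /news.
        links.append(urljoin(base_url, anchors[1]["href"]))
    return links


print(extract_links(SAMPLE_HTML))
```

In the question's loop, the equivalent change would be along the lines of link = urljoin(base_url, article.find_all('a')[1]['href']), assuming the real page really does put the article link in the second <a>. Check the live HTML before relying on the index.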