I need help finding the correct HTML tag for the headline link URL. My scraper's purpose is to scrape headlines, stories, and links

Posted by mec1mxoz on 2021-08-20 in Java


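When I run the scraper, the output on the Django home page looks fine, but the URL returns a 404 error or leads to other articles, which shows I used the wrong tag: https://www.coindesk.com/news/tag/crypto-lending The correct link URL is https://www.coindesk.com/news/tag/crypto-lending. The correct tag containing the link is an <a> tag with title and href attributes. How do I write this tag?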
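
The scraper builds each link by plain string concatenation (base_url + article.a['href']), which is one possible source of a wrong or 404 URL, depending on what the href actually contains. As a minimal sketch, urllib.parse.urljoin from the standard library resolves an href against a base URL the way a browser does. The build_link helper and the hrefs below are hypothetical examples, not values taken from CoinDesk.

```python
from urllib.parse import urljoin

base_url = "https://www.coindesk.com/news"  # from the question


def build_link(base, href):
    """Hypothetical helper: resolve href against base like a browser would."""
    return urljoin(base, href)


# Hypothetical href that already contains the /news/ prefix:
concatenated = base_url + "/news/some-article"  # doubles the /news segment
joined = build_link(base_url, "/news/some-article")
absolute = build_link(base_url, "https://example.com/x")  # absolute hrefs pass through

print(concatenated)  # https://www.coindesk.com/news/news/some-article
print(joined)        # https://www.coindesk.com/news/some-article
print(absolute)      # https://example.com/x
```

A root-relative href (one starting with /) replaces the whole path of the base URL, so urljoin never duplicates a path segment the way concatenation can.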

from bs4 import BeautifulSoup
import requests

crypto_headlines = []

def crypto_news():
    """Scrape the headline, summary text, and link of each article on the CoinDesk news page."""

    # A browser-like User-Agent header; it must be a dict and must be passed to requests.get().
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.101 Safari/537.36'
    }

    base_url = 'https://www.coindesk.com/news'

    source = requests.get(base_url, headers=headers).text

    soup = BeautifulSoup(source, "html.parser")

    articles = soup.find_all(class_='text-content')

    #print(len(articles))
    #print(articles) 

    for article in articles:

        try:

            headline = article.h4.text.strip()
            text = article.find(class_="card-text").text.strip()
            link = base_url + article.a['href']
            #img_url = base_url + article.image_src['src']

            crypto_dict = {
                'Headline': headline,
                'Text': text,
                'Link': link,
            }

            crypto_headlines.append(crypto_dict)
        except AttributeError as ex:
            print('Error:', ex)

    print(crypto_headlines)

crypto_news()
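
To answer "How do I write this tag?" directly: BeautifulSoup's find() accepts attribute filters as keyword arguments, and passing True matches any tag that has that attribute, whatever its value. The sketch below assumes, as the question states, that the article link is the <a> that carries a title attribute. The SAMPLE_HTML markup and the headline_links helper are hypothetical, and CoinDesk's real markup may differ.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup (an assumption, not CoinDesk's real HTML): the first <a>
# is a tag link, the second <a> carries title and href and points at the story.
SAMPLE_HTML = """
<div class="text-content">
  <a href="/tag/crypto-lending">Crypto Lending</a>
  <a title="Example headline" href="/example-article"><h4>Example headline</h4></a>
  <p class="card-text">Example summary.</p>
</div>
"""

BASE_URL = "https://www.coindesk.com/news"


def headline_links(html, base_url):
    """Return a list of {'Headline', 'Link'} dicts, one per article card."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for article in soup.find_all(class_="text-content"):
        # title=True / href=True match any <a> that has those attributes,
        # so the tag link (no title attribute) is skipped.
        anchor = article.find("a", title=True, href=True)
        if anchor is None:
            continue
        results.append({
            "Headline": anchor["title"],
            "Link": urljoin(base_url, anchor["href"]),
        })
    return results


print(headline_links(SAMPLE_HTML, BASE_URL))
```

The CSS-selector form of the same filter is article.select_one('a[title][href]'). Selecting by attribute rather than by position keeps working if the number of <a> tags before the story link changes.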

Tags: python html beautifulsoup web-scraping hyperlink

Source: https://stackoverflow.com/questions/68328654/i-need-help-finding-the-correct-html-tag-for-headline-links-url-my-web-scraper

1 answer


Answer #1 by vmdwslir

You are picking the wrong <a>: you are scraping the first <a>, but the link you need is in the second <a>.
Here is the code:

link = base_url + article.find_all("a")[1]["href"]

Changing just this one line should solve your problem!
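
The one-line fix indexes the second <a> directly, so a card with fewer than two links raises IndexError, which the question's except AttributeError clause does not catch. A minimal defensive sketch of the same idea follows. The second_link helper and the sample markup are hypothetical; the sketch only counts <a> tags that have an href, and it swaps the answer's string concatenation for urllib.parse.urljoin.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup: the story link is the second <a>, as the answer
# describes; the second card deliberately has only one <a>.
SAMPLE_HTML = """
<div class="text-content"><a href="/tag/markets">Markets</a><a href="/story-one">Story one</a></div>
<div class="text-content"><a href="/tag/policy">Policy</a></div>
"""

BASE_URL = "https://www.coindesk.com/news"


def second_link(article, base_url):
    """Return the absolute URL of the second <a href> in article, or None."""
    anchors = article.find_all("a", href=True)
    if len(anchors) < 2:
        # article.find_all("a")[1] would raise IndexError here, and the
        # question's `except AttributeError` would not catch it.
        return None
    return urljoin(base_url, anchors[1]["href"])


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
links = [second_link(card, BASE_URL) for card in soup.find_all(class_="text-content")]
print(links)
```

Returning None lets the caller skip a card instead of aborting the whole loop on the first malformed one.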

Answered on 2021-08-20
