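I am trying to use newspaper3k.
My program throws a 503 exception. Can someone help me find the cause and fix it? To be precise, I am not looking to catch these exceptions; I want to understand why they occur and, if possible, prevent them.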
from newspaper import Article
dates = list()
titles = list()
urls = ['https://www.rbnz.govt.nz/research-and-publications/speeches/2021/speech2021-06-29',
'https://www.rbnz.govt.nz/research-and-publications/speeches/2021/speech2021-06-02',
'https://www.rbnz.govt.nz/research-and-publications/speeches/2021/fec-mps-hearing-may-21',
'https://www.rbnz.govt.nz/research-and-publications/speeches/2021/speech2021-05-06',
'https://www.rbnz.govt.nz/research-and-publications/speeches/2021/fec-fsr-hearing-may-21',
'https://www.rbnz.govt.nz/research-and-publications/speeches/2021/speech2021-03-04',
'https://www.rbnz.govt.nz/research-and-publications/speeches/2021/fec-2019-20-reserve-bank-annual-review',
'https://www.rbnz.govt.nz/research-and-publications/speeches/2020/speech2020-12-02',
'https://www.rbnz.govt.nz/research-and-publications/speeches/2020/speech2020-10-28',
'https://www.rbnz.govt.nz/research-and-publications/speeches/2020/speech2020-10-22',
'https://www.rbnz.govt.nz/research-and-publications/speeches/2020/speech2020-10-19',
'https://www.rbnz.govt.nz/research-and-publications/speeches/2020/speech2020-09-14']
for url in urls:
    speech = Article(url)
    speech.download()
    speech.parse()
    dates.append(speech.publish_date)
    titles.append(speech.title)
Here is my traceback:
---------------------------------------------------------------------------
ArticleException Traceback (most recent call last)
<ipython-input-5-217a6cafe26a> in <module>
20 speech = Article(url)
21 speech.download()
---> 22 speech.parse()
23 dates.append(speech.publish_date)
24 titles.append(speech.title)
/opt/anaconda3/lib/python3.8/site-packages/newspaper/article.py in parse(self)
189
190 def parse(self):
--> 191 self.throw_if_not_downloaded_verbose()
192
193 self.doc = self.config.get_parser().fromstring(self.html)
/opt/anaconda3/lib/python3.8/site-packages/newspaper/article.py in throw_if_not_downloaded_verbose(self)
529 raise ArticleException('You must `download()` an article first!')
530 elif self.download_state == ArticleDownloadState.FAILED_RESPONSE:
--> 531 raise ArticleException('Article `download()` failed with %s on URL %s' %
532 (self.download_exception_msg, self.url))
533
ArticleException: Article `download()` failed with 503 Server Error: Service Temporarily Unavailable
for url: https://www.rbnz.govt.nz/research-and-publications/speeches/2021/speech2021-06-29
on URL https://www.rbnz.govt.nz/research-and-publications/speeches/2021/speech2021-06-29
1 answer
Here is how to troubleshoot the error
503 Server Error: Service Temporarily Unavailable
with the Python package requests.
Why does the 503 Server Error occur? Let's look at the content the server returns.
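A minimal sketch of that inspection step follows. It uses the standard library's urllib rather than requests so that it runs without extra dependencies; with requests, the same values are available as response.status_code, response.headers, and response.text. The helper names fetch_raw and summarize, the timeout, and the 500-character preview length are illustrative choices, not part of newspaper3k or of the original post.

```python
import urllib.error
import urllib.request


def fetch_raw(url, timeout=10):
    """Return (status_code, headers, body_text), even for error statuses such as 503."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status, dict(response.headers.items()), body
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx, but the exception still carries the
        # status code, the headers, and the response body.
        body = error.read().decode("utf-8", errors="replace")
        return error.code, dict(error.headers.items()), body


def summarize(status_code, headers, body, max_chars=500):
    """Build a short, printable summary of a response for manual inspection."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return "\n".join([
        f"status: {status_code}",
        f"server: {lowered.get('server', '<not sent>')}",
        f"body preview: {body[:max_chars]}",
    ])
```

Calling print(summarize(*fetch_raw(url))) with one of the URLs from the question should show the status code and the start of whatever HTML the server sent back instead of the speech.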
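If we look at the returned text, we can see that the website asks your browser to complete a challenge-form. If you look at other data points in the text (e.g. cf-content), you can see that the website is protected by Cloudflare.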
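The check described above can be sketched as a small heuristic that works on a status code, headers, and body, however you fetched them. The two body markers are the ones named in this answer; the function names and the "Server: cloudflare" header check are assumptions added for illustration, and a real challenge page may use different markup.

```python
# Marker strings taken from the answer above; real challenge pages may differ.
BODY_MARKERS = ("challenge-form", "cf-content")


def cloudflare_evidence(status_code, headers, body):
    """Collect hints that a response is a Cloudflare challenge page, not the article."""
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    evidence = []
    if status_code == 503:
        evidence.append("status 503")
    # Assumption: Cloudflare-fronted sites usually send "Server: cloudflare".
    if "cloudflare" in lowered.get("server", ""):
        evidence.append("server header names cloudflare")
    for marker in BODY_MARKERS:
        if marker in body:
            evidence.append(f"body contains {marker!r}")
    return evidence


def looks_like_challenge(status_code, headers, body):
    """Heuristic: a 503 whose body carries at least one known marker."""
    return status_code == 503 and any(marker in body for marker in BODY_MARKERS)
```

If looks_like_challenge returns True, the 503 most likely comes from the protection layer in front of the site rather than from a bug in your code or in newspaper3k.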
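Bypassing this protection is extremely difficult. Here is one of my recent answers on the complexities of bypassing it: "Unable to scrape product title from a webpage".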