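I have a column of URLs in a DataFrame. I'm trying to send a request to each of these servers and extract elements of the page content. The problem occurs every time I run the script, and always on the 7th request. If I use k+=5, the URL that raised this error in the previous run now succeeds, but Python raises the same error again at the 7th URL counting from the new starting point (5):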
连接错误:('Connection aborted.',RemoteDisconnected('Remote end closed connection without response'))
I wish the error message were more specific; as it is, I can't tell why this happens.
If you want to check this on your own system, here is a list of URLs you can use:
https://www.teraz.sk/ekonomika/pred-200-rokmi-sa-narodil-wilhelm-siemen/705340-clanok.html
https://www.marketscreener.com/quote/stock/SIEMENS-AG-56358595/news/Siemens-Industrial-Operations-X-brings-cutting-edge-IT-and-AI-into-industrial-automation-43481650/
https://www.prnewswire.com/news-releases/general-motors-names-siemens-a-2022-supplier-of-the-year-301795897.html
https://www.dha.com.tr/ekonomi/siemens-turkiye-siemens-xceleratoru-hayata-gecirdi-2227176
https://www.investegate.co.uk/siemens-healthineers-ag/eqs/release-of-a-capital-market-information/20230403140007ECKGU/
https://www.prnewswire.com/news-releases/siemens-and-microsoft-drive-industrial-productivity-with-generative-artificial-intelligence-301795367.html
https://zdnet.co.kr/view/?no=20230424173326
https://finance.sina.com.cn/tech/roll/2023-03-17/doc-imymcrfc4327961.shtml
https://www.turkiyegazetesi.com.tr/ekonomi/healthineers-istanbulu-tercih-etti-958021
https://zdnet.co.kr/view/?no=20230327164412
https://www.marketscreener.com/quote/stock/SIEMENS-ENERGY-AG-113013151/news/CMS-Siemens-Energy-AG-Release-of-a-capital-market-information-43463180/
https://www.prnewswire.com/news-releases/daimler-truck-collaborates-with-siemens-to-build-an-integrated-digital-engineering-platform-301780181.html
https://finance.yahoo.com/news/daimler-truck-collaborates-siemens-build-080000265.html
https://technews.tw/2023/05/12/siemens-eda-tsmc/
Here is my code:
```python
import requests
from bs4 import BeautifulSoup

# df is an existing pandas DataFrame with a 'url' column

# Tags to skip (not used further in this snippet)
blocklist = [
    'style',
    'script',
    'meta',
    'head',
    # other elements,
]

# Object-dtype column so that each cell can hold a list of paragraph strings
df['text'] = None

for k, i in df['url'].items():  # k: row label, i: URL
    #k+=5
    website_text = list()
    print(df.at[k, 'url'])
    response = requests.get(i)
    soup = BeautifulSoup(response.content, 'html.parser')
    # Collect the text of every <p> element on the page
    for data in soup.find_all("p"):
        website_text.append(data.get_text())
    df.at[k, 'text'] = website_text

df.head()
```
Here is the full error message:
```
---------------------------------------------------------------------------
RemoteDisconnected                        Traceback (most recent call last)
File c:\Users\user\anaconda3\envs\GDELT\Lib\site-packages\urllib3\connectionpool.py:790, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, preload_content, decode_content, **response_kw)
    789 # Make the request on the HTTPConnection object
--> 790 response = self._make_request(
    791     conn,
    792     method,
    793     url,
    794     timeout=timeout_obj,
    795     body=body,
    796     headers=headers,
    797     chunked=chunked,
    798     retries=retries,
    799     response_conn=response_conn,
    800     preload_content=preload_content,
    801     decode_content=decode_content,
    802     **response_kw,
    803 )
    805 # Everything went great!

File c:\Users\user\anaconda3\envs\GDELT\Lib\site-packages\urllib3\connectionpool.py:536, in HTTPConnectionPool._make_request(self, conn, method, url, body, headers, retries, timeout, chunked, response_conn, preload_content, decode_content, enforce_content_length)
    535 try:
--> 536     response = conn.getresponse()
    537 except (BaseSSLError, OSError) as e:

File c:\Users\user\anaconda3\envs\GDELT\Lib\site-packages\urllib3\connection.py:454, in HTTPConnection.getresponse(self)
    453 # Get the response from http.client.HTTPConnection
--> 454 httplib_response = super().getresponse()
    456 try:

File c:\Users\user\anaconda3\envs\GDELT\Lib\http\client.py:1375, in HTTPConnection.getresponse(self)
   1374 try:
-> 1375     response.begin()
   1376 except ConnectionError:

File c:\Users\user\anaconda3\envs\GDELT\Lib\http\client.py:318, in HTTPResponse.begin(self)
    317 while True:
--> 318     version, status, reason = self._read_status()
    319     if status != CONTINUE:

File c:\Users\user\anaconda3\envs\GDELT\Lib\http\client.py:287, in HTTPResponse._read_status(self)
    284 if not line:
    285     # Presumably, the server closed the connection before
    286     # sending a valid response.
--> 287     raise RemoteDisconnected("Remote end closed connection without"
...
    503 except MaxRetryError as e:
    504     if isinstance(e.reason, ConnectTimeoutError):
    505         # TODO: Remove this in 3.0.0: see #2811

ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
```
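The bottom frame of the traceback (`HTTPResponse._read_status` in `http/client.py`) raises when the status line read from the socket is empty: the server accepted the connection and the request, then closed it without sending anything back. A minimal sketch using only the standard library reproduces that same root exception against a throwaway local server. `serve_once_and_hang_up` and `reproduce_remote_disconnected` are hypothetical names for illustration, and the local server only imitates what the remote site appears to be doing:

```python
import http.client
import socket
import threading


def serve_once_and_hang_up(server_sock: socket.socket) -> None:
    """Accept one connection, read the request headers, then close without replying."""
    conn, _ = server_sock.accept()
    with conn:
        data = b""
        # Read up to the end of the HTTP request headers so that closing the
        # socket sends a clean FIN rather than a reset.
        while b"\r\n\r\n" not in data:
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
    # Leaving the `with` block closes the socket without sending a status line.


def reproduce_remote_disconnected() -> str:
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server_sock.listen(1)
    server_sock.settimeout(5)  # don't block forever if no client ever connects
    port = server_sock.getsockname()[1]
    thread = threading.Thread(target=serve_once_and_hang_up, args=(server_sock,))
    thread.start()
    try:
        client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        client.request("GET", "/")
        try:
            client.getresponse()
        except http.client.RemoteDisconnected as exc:
            return f"{type(exc).__name__}: {exc}"
        finally:
            client.close()
        return "server answered"
    finally:
        thread.join()
        server_sock.close()


if __name__ == "__main__":
    print(reproduce_remote_disconnected())
```

In the question, urllib3 wraps this exception in a `ProtocolError('Connection aborted.', ...)`, which `requests` re-raises as `requests.exceptions.ConnectionError`. The message cannot be more specific because the server sent nothing to report.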
1 answer
I found the answer in another post, although that post was about a different error message.
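The problem is that the website filters out requests that don't carry a proper User-Agent, so it is enough to pick any User-Agent string from MDN: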
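A minimal sketch of this fix with `requests`, assuming the site only checks the `User-Agent` header: set a browser-like value on a `requests.Session` so every request carries it. The UA string below is an illustrative example, not a value taken from this thread, and `make_session`/`fetch_all` are hypothetical helper names. `fetch_all` also adds a timeout and per-URL exception handling (an addition beyond the answer) so a single blocked URL no longer aborts the whole loop:

```python
import requests

# Assumed example of a desktop-browser User-Agent string; any current browser
# string (for instance one of the examples on MDN's User-Agent page) should do.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36"
)


def make_session(user_agent=BROWSER_UA):
    """Return a Session that sends the given User-Agent on every request.

    Without this, requests identifies itself as "python-requests/<version>".
    """
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch_all(urls, session=None, timeout=15):
    """Hypothetical helper: fetch every URL, recording failures instead of stopping.

    Returns (pages, errors): pages maps URL -> response body (bytes) or None,
    errors maps URL -> repr of the exception for the URLs that failed.
    """
    if session is None:
        session = make_session()
    pages, errors = {}, {}
    for url in urls:
        try:
            response = session.get(url, timeout=timeout)
            response.raise_for_status()  # also treat 4xx/5xx answers as failures
            pages[url] = response.content
        except requests.exceptions.RequestException as exc:
            # ConnectionError (which wraps RemoteDisconnected), Timeout and
            # HTTPError are all subclasses of RequestException.
            pages[url] = None
            errors[url] = repr(exc)
    return pages, errors
```

In the question's loop, the smallest change is to create `session = make_session()` once before the loop and replace `requests.get(i)` with `session.get(i, timeout=15)`; alternatively, `fetch_all(df['url'])` returns the raw pages to parse with BeautifulSoup as before, plus a dict of every URL that still fails.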