python-3.x ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

nuypyhwy  posted on 2023-05-30  in  Python

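I have a column named url in a DataFrame. I am trying to send a request to each of these servers and extract the <p> elements from the response. The problem occurs every time I run my script, and always on the 7th request. If I use k+=5, the URL that raised the error in the previous run succeeds, but Python raises the error again at the 7th URL counting from position 5:
ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
I wish the error message were more precise; I don't know why this happens.
Here is a list of URLs in case you want to check on your own system.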

https://www.teraz.sk/ekonomika/pred-200-rokmi-sa-narodil-wilhelm-siemen/705340-clanok.html
https://www.marketscreener.com/quote/stock/SIEMENS-AG-56358595/news/Siemens-Industrial-Operations-X-brings-cutting-edge-IT-and-AI-into-industrial-automation-43481650/
https://www.prnewswire.com/news-releases/general-motors-names-siemens-a-2022-supplier-of-the-year-301795897.html
https://www.dha.com.tr/ekonomi/siemens-turkiye-siemens-xceleratoru-hayata-gecirdi-2227176
https://www.investegate.co.uk/siemens-healthineers-ag/eqs/release-of-a-capital-market-information/20230403140007ECKGU/
https://www.prnewswire.com/news-releases/siemens-and-microsoft-drive-industrial-productivity-with-generative-artificial-intelligence-301795367.html
https://zdnet.co.kr/view/?no=20230424173326
https://finance.sina.com.cn/tech/roll/2023-03-17/doc-imymcrfc4327961.shtml
https://www.turkiyegazetesi.com.tr/ekonomi/healthineers-istanbulu-tercih-etti-958021
https://zdnet.co.kr/view/?no=20230327164412
https://www.marketscreener.com/quote/stock/SIEMENS-ENERGY-AG-113013151/news/CMS-Siemens-Energy-AG-Release-of-a-capital-market-information-43463180/
https://www.prnewswire.com/news-releases/daimler-truck-collaborates-with-siemens-to-build-an-integrated-digital-engineering-platform-301780181.html
https://finance.yahoo.com/news/daimler-truck-collaborates-siemens-build-080000265.html
https://technews.tw/2023/05/12/siemens-eda-tsmc/

Here is my code:

import requests
from bs4 import BeautifulSoup

# Tags to ignore (defined here but not used in the loop below)
blocklist = [
    'style',
    'script',
    'meta',
    'head',
    # other elements
]

# df is a pandas DataFrame with a 'url' column
for k, i in enumerate(df['url']):
    # k += 5
    website_text = list()
    print(df.at[k, 'url'])
    response = requests.get(i)
    soup = BeautifulSoup(response.content, 'html.parser')
    if soup.find_all('p'):
        # Collect the text of every <p> element on the page
        for data in soup.find_all('p'):
            website_text.append(data.get_text())
        df.at[k, 'text'] = website_text

df.head()

Here is the full error message:

---------------------------------------------------------------------------
RemoteDisconnected                        Traceback (most recent call last)
File c:\Users\user\anaconda3\envs\GDELT\Lib\site-packages\urllib3\connectionpool.py:790, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, preload_content, decode_content, **response_kw)
    789 # Make the request on the HTTPConnection object
--> 790 response = self._make_request(
    791     conn,
    792     method,
    793     url,
    794     timeout=timeout_obj,
    795     body=body,
    796     headers=headers,
    797     chunked=chunked,
    798     retries=retries,
    799     response_conn=response_conn,
    800     preload_content=preload_content,
    801     decode_content=decode_content,
    802     **response_kw,
    803 )
    805 # Everything went great!

File c:\Users\user\anaconda3\envs\GDELT\Lib\site-packages\urllib3\connectionpool.py:536, in HTTPConnectionPool._make_request(self, conn, method, url, body, headers, retries, timeout, chunked, response_conn, preload_content, decode_content, enforce_content_length)
    535 try:
--> 536     response = conn.getresponse()
    537 except (BaseSSLError, OSError) as e:

File c:\Users\user\anaconda3\envs\GDELT\Lib\site-packages\urllib3\connection.py:454, in HTTPConnection.getresponse(self)
    453 # Get the response from http.client.HTTPConnection
--> 454 httplib_response = super().getresponse()
    456 try:

File c:\Users\user\anaconda3\envs\GDELT\Lib\http\client.py:1375, in HTTPConnection.getresponse(self)
   1374 try:
-> 1375     response.begin()
   1376 except ConnectionError:

File c:\Users\user\anaconda3\envs\GDELT\Lib\http\client.py:318, in HTTPResponse.begin(self)
    317 while True:
--> 318     version, status, reason = self._read_status()
    319     if status != CONTINUE:

File c:\Users\user\anaconda3\envs\GDELT\Lib\http\client.py:287, in HTTPResponse._read_status(self)
    284 if not line:
    285     # Presumably, the server closed the connection before
    286     # sending a valid response.
--> 287     raise RemoteDisconnected("Remote end closed connection without"
...
    503 except MaxRetryError as e:
    504     if isinstance(e.reason, ConnectTimeoutError):
    505         # TODO: Remove this in 3.0.0: see #2811

ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

edqdpe6u1#

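I found the answer in another post, although that one had a different error message.
The problem is that the website filters out requests that do not carry a suitable User-Agent, so you only need to pick one from MDN: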

import requests

requests.get("https://apis.digital.gob.cl/fl/feriados/2020", headers={
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
})
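
To apply the same idea to the loop in the question, one option is to set the header once on a requests.Session and wrap each request so that a failure reports the offending URL instead of stopping the run. This is only a sketch: the helper names make_session and fetch_html and the 10-second timeout are assumptions for illustration, and it rests on the answer's premise that the server is rejecting requests based on their User-Agent, which may not hold for every site in the list.

```python
import requests

# Assumption: any realistic browser User-Agent string should work; this is
# the same string used in the answer above.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
)


def make_session(user_agent=USER_AGENT):
    """Return a Session that sends the given User-Agent on every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch_html(session, url, timeout=10):
    """Return the response body as bytes, or None if the request fails."""
    try:
        response = session.get(url, timeout=timeout)
        response.raise_for_status()
    except requests.exceptions.RequestException as exc:
        # Report which URL failed instead of aborting the whole loop.
        print(f"Request failed for {url}: {exc!r}")
        return None
    return response.content
```

In the question's loop, response=requests.get(i) would become html = fetch_html(session, i), with session = make_session() created once before the loop and the row skipped when html is None. Printing the URL together with the exception is also a small step toward the more precise error message the question asks for.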
