Python: how can I pass a URL containing √ through untouched?

dced5bon  posted on 2022-12-25  in  Python
Follows (0) | Answers (2) | Views (98)

Is it possible to have Python pass the √ character through untouched, or am I asking too much?

import urllib.request
path = 'html'
links = 'links'
with open(links, 'r', encoding='UTF-8') as links:
    for link in links: #for each link in the file
        print(link)
        with urllib.request.urlopen(link) as linker: #get the html
            print(linker)
            with open(path, 'ab') as f: #append the html to html
                f.write(linker.read())

Link

https://myanimelist.net/anime/27899/Tokyo_Ghoul_√A

Output

Traceback (most recent call last):
  File "PYdown.py", line 7, in <module>
    with urllib.request.urlopen(link) as linker:
  File "/usr/lib64/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib64/python3.6/urllib/request.py", line 526, in open
    response = self._open(req, data)
  File "/usr/lib64/python3.6/urllib/request.py", line 544, in _open
    '_open', req)
  File "/usr/lib64/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/lib64/python3.6/urllib/request.py", line 1392, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/usr/lib64/python3.6/urllib/request.py", line 1349, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "/usr/lib64/python3.6/http/client.py", line 1254, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib64/python3.6/http/client.py", line 1265, in _send_request
    self.putrequest(method, url, **skips)
  File "/usr/lib64/python3.6/http/client.py", line 1132, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode character '\u221a' in position 29: ordinal not in range(128)
p4tfgftt #1

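You need to quote the Unicode characters in the URL. You have a file with a list of URLs to open, so for each URL you need to split it (using urllib.parse.urlsplit()), encode the host with IDNA, percent-quote every part of the path (using urllib.parse.quote(); to split the path into parts you can use pathlib.PurePosixPath.parts), and then assemble the URL again (using urllib.parse.urlunsplit()).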

from pathlib import PurePosixPath
from urllib.parse import urlsplit, urlunsplit, quote, urlencode, parse_qsl

def normalize_url(url):
    splitted = urlsplit(url)  # split link
    path = PurePosixPath(splitted.path)  # initialize path
    parts = iter(path.parts)  # first element is "/" unless the path is empty
    quoted_path = PurePosixPath(next(parts, "/"))  # fall back to "/" for an empty path
    for part in parts:
        quoted_path /= quote(part)  # quote each part
    return urlunsplit((
        splitted.scheme,
        splitted.netloc.encode("idna").decode(),  # idna
        str(quoted_path),
        urlencode(parse_qsl(splitted.query)),  # force encode query
        splitted.fragment
    ))

Usage:

links = (
    "https://myanimelist.net/anime/27899/Tokyo_Ghoul_√A",
    "https://stackoverflow.com/",
    "https://www.google.com/search?q=√2&client=firefox-b-d",
    "http://pfarmerü.com/"
)

print(*(normalize_url(link) for link in links), sep="\n")

Output:

https://myanimelist.net/anime/27899/Tokyo_Ghoul_%E2%88%9AA
https://stackoverflow.com/
https://www.google.com/search?q=%E2%88%9A2&client=firefox-b-d
http://xn--pfarmer-t2a.com/
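
For the loop in the question, a shorter variant may be enough. The following is only a sketch under a few assumptions: quote_url and download_all are hypothetical names, the file names "links" and "html" are taken from the question, and the netloc is assumed to contain no non-ASCII user name or password. It keeps "%" in the safe set so that URLs which are already percent-encoded are not encoded a second time, and it strips the trailing newline that each line read from the file carries.

```python
import urllib.request
from urllib.parse import quote, urlsplit, urlunsplit


def quote_url(url):
    """Percent-encode non-ASCII characters in a URL (hypothetical helper)."""
    parts = urlsplit(url.strip())  # strip() drops the newline read from the file
    return urlunsplit((
        parts.scheme,
        parts.netloc.encode("idna").decode("ascii"),  # host name as IDNA
        quote(parts.path, safe="/%"),     # keep separators and existing escapes
        quote(parts.query, safe="=&%+"),  # keep query delimiters
        quote(parts.fragment, safe="%"),
    ))


def download_all(links_file="links", out_file="html"):
    """Append the body of every URL listed in links_file to out_file."""
    with open(links_file, encoding="UTF-8") as links, open(out_file, "ab") as out:
        for line in links:
            if not line.strip():
                continue  # skip blank lines
            with urllib.request.urlopen(quote_url(line)) as response:
                out.write(response.read())
```

Calling download_all() would then replace the body of the original script. Unlike the pathlib-based version, this one leaves trailing slashes and repeated slashes in the path as they are.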

gopyfrb3 #2

To get Python to produce output, I had to convert √ to %E2%88%9A myself instead of having Python read √ as is.
Credit to @Olvin Right
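
As a small illustration of where %E2%88%9A comes from (a sketch, not part of the original answer): urllib.parse.quote() percent-encodes the UTF-8 bytes of the character, and urllib.parse.unquote() reverses it, so the conversion does not have to be done by hand.

```python
from urllib.parse import quote, unquote

# "√" is U+221A; its UTF-8 encoding is the three bytes E2 88 9A.
encoded = quote("√")
print(encoded)           # %E2%88%9A
print(unquote(encoded))  # √
```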
