curl: Downloading a text file from a presigned S3 URL with wget returns binary data

cuxqih21 posted on 2022-11-13 in Other

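I am trying to download a presigned S3 URL programmatically. I know that the file I am downloading is an ASCII text file. When I download the URL by copy-pasting it into Chrome, the file is what I expect (see below). However, the file downloaded with wget is binary.
Looking through previous posts, I unfortunately did not find anything useful. They suggest putting quotes around the URL, but my URL does not contain special characters. Among others, I looked at "Amazon AWS S3 signed URL via Wget" and https://superuser.com/questions/1311516/curl-can-not-download-file-but-browser-can. (I did double-check with both double and single quotes, and neither worked in my case.)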

➜  wget --no-check-certificate --no-proxy  "https://s3.eu-central-1.amazonaws.com/.../text_file.txt"
--2022-07-28 10:49:57--  https://s3.eu-central-1.amazonaws.com/.../text_file.txt
Resolving s3.eu-central-1.amazonaws.com (s3.eu-central-1.amazonaws.com)... 52.219.75.159
Connecting to s3.eu-central-1.amazonaws.com (s3.eu-central-1.amazonaws.com)|52.219.75.159|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 21110 (21K) [binary/octet-stream]
Saving to: ‘text_file.txt’

text_file.txt                                     100%[===========================================================================================================>]  20.62K  --.-KB/s    in 0.004s  

2022-07-28 10:49:57 (5.61 MB/s) - ‘text_file.txt’ saved [21110/21110]

➜  file text_file.txt                                                                                                                                           
text_file.txt: data
➜  cat text_file.txt | head -n 1
[78!???ÊBz?j????X?????x>??_uߩi??a?Qqax?W?ϴ??_c????H???u?c??}???U??5?M?|A?-9?H?Y??\?՟??B?l
2ɯL????:?JZF㽬???,2?gn????Y~vU?l4?O`?!???r                                               ?h?1?]??f???
                                          ?MIUM??_??q?u?dC???v?MbcI>?R??oV???&?
# Following lines are for a file downloaded by copy-paste of the URL to a Chrome window
➜  file text_file\ \(1\).txt 
text_file (1).txt: ASCII text
➜  cat text_file\ \(1\).txt| head -n 1 
# Header of file
Answer #1 (jckbn6z7)

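What you have is most likely content that is stored compressed in S3. When a file is compressed with a common format such as GZip, Brotli, LZW, or Zlib and tagged with the matching Content-Encoding, most browsers decompress it on the fly for display or download.
For example, suppose we upload a simple HTML file, but compress it first: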

$ cat example_file.html | brotli | \
    aws s3 cp - s3://example-bucket/example_html_br.html \
    --acl=public-read --content-encoding br

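We can then view the content in a browser, because the browser engine decompresses the file transparently.

But downloading the same file with Wget shows the compressed content: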

$ wget -qO- https://example-bucket.s3.amazonaws.com/example_html_br.html | hexdump -C
00000000  1f 6e 00 00 1d 07 ee be  1d 1b 46 77 12 aa 15 78  |.n........Fw...x|
00000010  a8 dc d4 d4 5b 83 cc a0  a5 81 96 1c b0 b7 d5 6d  |....[..........m|
00000020  29 46 f6 fa 6e 63 eb 29  ea aa 82 c8 25 a8 42 91  |)F..nc.)....%.B.|
00000030  ce 1d 07 f6 06 e1 52 0f  f4 4a a9 d6 87 17 76 ff  |......R..J....v.|
00000040  e1 da 01                                          |...|

You can verify this by looking at the HTTP headers:

$ wget -S https://example-bucket.s3.amazonaws.com/example_html_br.html
--2022-08-01 14:10:40--  https://example-bucket.s3.amazonaws.com/example_html_br.html
Resolving example-bucket.s3.amazonaws.com (example-bucket.s3.amazonaws.com)... 52.218.178.75
  [...]
  HTTP/1.1 200 OK
  Content-Encoding: br
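To make the header check above scriptable, here is a minimal sketch of a helper that pulls the Content-Encoding value out of a header dump. The function name `content_encoding` is hypothetical, and falling back to `identity` when the header is absent is an assumption of this sketch, not something the server reports.

```shell
# Hypothetical helper: read raw HTTP response headers on stdin and print the
# Content-Encoding value in lowercase. Prints "identity" (an assumed default)
# when no such header is present.
content_encoding() {
  awk -F': *' '
    # Header names are case-insensitive; wget -S indents them with spaces.
    tolower($1) ~ /^[ \t]*content-encoding$/ {
      value = tolower($2)
      sub(/[ \t\r]+$/, "", value)   # drop trailing whitespace and CR
      print value
      found = 1
      exit
    }
    END { if (!found) print "identity" }
  '
}
```

It could be fed from wget, which writes headers to stderr, for example `wget -S -O /dev/null "$url" 2>&1 | content_encoding`, where `$url` is a placeholder for your own URL. A plain GET is used here rather than `--spider`, since a URL presigned for GET is generally not valid for a HEAD request.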

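The response carries the Content-Encoding header that browsers act on. Either make sure that the component placing the content in S3 does not compress it, or, if you want to download the content as stored, decompress it yourself the way a browser would: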

$ wget -qO- https://example-bucket.s3.amazonaws.com/example_html_br.html | brotli -df
<html>
<head>
<title>Example</title>
[...]

The same premise applies if you are using a presigned URL.
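If the encoding is not known in advance, the decompression step can be chosen from the Content-Encoding value. The following is a minimal sketch, not a complete implementation: the function name `decode_body` is hypothetical, only `gzip`, `br`, and unencoded bodies are handled, and the `brotli` command-line tool is assumed to be installed for the `br` case.

```shell
# Hypothetical helper: decode a response body read from stdin according to
# the Content-Encoding value passed as the first argument.
decode_body() {
  case "$1" in
    gzip|x-gzip)  gzip -dc ;;     # gzip-encoded body
    br)           brotli -dc ;;   # Brotli-encoded body (needs the brotli CLI)
    identity|"")  cat ;;          # no encoding: pass the bytes through
    *)
      # Other encodings are deliberately left out of this sketch.
      echo "unsupported content encoding: $1" >&2
      return 1
      ;;
  esac
}
```

For the file in the example this would be used as `wget -qO- "$url" | decode_body br > example.html`, with `$url` standing in for the real (presigned) URL. As an alternative, curl's `--compressed` option asks curl to decode the response itself, for whichever encodings that curl build supports.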
