How to read a CSV file from an S3 bucket using pandas in Python

Posted by f0ofjuux on 2023-01-15 in Python

I am trying to read a CSV file from an AWS S3 bucket into memory as a pandas DataFrame using the following code:

import pandas as pd
import boto

data = pd.read_csv('s3:/example_bucket.s3-website-ap-southeast-2.amazonaws.com/data_1.csv')

To grant full access, I set the following bucket policy on the S3 bucket:

{
    "Version": "2012-10-17",
    "Id": "statement1",
    "Statement": [
        {
            "Sid": "statement1",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": "arn:aws:s3:::example_bucket"
        }
    ]
}

Unfortunately, I still get the following error in Python:

boto.exception.S3ResponseError: S3ResponseError: 405 Method Not Allowed

Can anyone explain how to set the permissions correctly in AWS S3, or how to configure pandas correctly to import the file? Thanks!

pandas

Source: https://stackoverflow.com/questions/30818341/how-to-read-a-csv-file-from-an-s3-bucket-using-pandas-in-python

5 Answers

#1 rwqw0loc

Using pandas 0.20.3:

import boto3
import pandas as pd
import sys

if sys.version_info[0] < 3: 
    from StringIO import StringIO # Python 2.x
else:
    from io import StringIO # Python 3.x

client = boto3.client('s3')

bucket_name = 'my_bucket'

object_key = 'my_file.csv'
csv_obj = client.get_object(Bucket=bucket_name, Key=object_key)
body = csv_obj['Body']
csv_string = body.read().decode('utf-8')

df = pd.read_csv(StringIO(csv_string))
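
As a possible variation on the code above, the object body can be handed to pandas as bytes, which avoids assuming UTF-8 up front and leaves the decoding to read_csv (its encoding parameter can be passed through). This is only a sketch: frame_from_body is a made-up helper name, and the get_object call in the trailing comment reuses the placeholder bucket and key from this answer.

```python
import io

import pandas as pd


def frame_from_body(body, **read_csv_kwargs):
    """Build a DataFrame from any object whose .read() returns CSV bytes.

    Hypothetical helper name; it is not part of boto3 or pandas.
    """
    raw = body.read()  # the whole object is held in memory, as in the answer
    return pd.read_csv(io.BytesIO(raw), **read_csv_kwargs)


# With boto3 (needs AWS credentials and network access, so left as a comment):
#   client = boto3.client('s3')
#   obj = client.get_object(Bucket='my_bucket', Key='my_file.csv')
#   df = frame_from_body(obj['Body'])
```

Because the helper only relies on .read(), it can be tried locally with io.BytesIO(b"a,b\n1,2\n") before pointing it at S3.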

Answered 2023-01-15

#2 a8jjtwal

Based on this answer, which suggests using smart_open to read from S3, this is how I used it with pandas:

import os
import pandas as pd
from smart_open import smart_open

aws_key = os.environ['AWS_ACCESS_KEY']
aws_secret = os.environ['AWS_SECRET_ACCESS_KEY']

bucket_name = 'my_bucket'
object_key = 'my_file.csv'

path = 's3://{}:{}@{}/{}'.format(aws_key, aws_secret, bucket_name, object_key)

df = pd.read_csv(smart_open(path))

Answered 2023-01-15

#3 y3bcpkx1

You don't need pandas; you can just use Python's built-in csv library.

import csv

import boto.s3
from boto.s3.key import Key


def read_file(bucket_name, region, remote_file_name, aws_access_key_id, aws_secret_access_key):
    # reads a csv from AWS

    # first you establish a connection using your credentials and region id

    conn = boto.s3.connect_to_region(
        region,
        aws_access_key_id=aws_access_key_id,
        aws_secret_access_key=aws_secret_access_key)

    # next you obtain the key of the csv you want to read
    # you will need the bucket name and the csv file name

    bucket = conn.get_bucket(bucket_name, validate=False)
    key = Key(bucket)
    key.key = remote_file_name
    data = key.get_contents_as_string()
    key.close()

    # on Python 3 the contents come back as bytes; decode them first
    # (UTF-8 is an assumption about the file's encoding)
    if not isinstance(data, str):
        data = data.decode('utf-8')

    # you store it into a string, therefore you will need to split it
    # usually the split characters are '\r\n'; if not, just read the file normally
    # and find out what they are

    reader = csv.reader(data.split('\r\n'))
    data = []
    header = next(reader)  # skip the header row
    for row in reader:
        data.append(row)

    return data

Hope this solves your problem. Good luck! :)
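
Splitting on '\r\n' assumes one particular line ending and breaks on quoted fields that contain newlines. A sketch of an alternative, assuming the object has already been downloaded as bytes (a literal stands in for the S3 download here) and that the file is UTF-8: wrap the decoded text in io.StringIO with newline='' and let csv.reader find the row boundaries itself. parse_csv_bytes is a hypothetical name used only for this illustration.

```python
import csv
import io


def parse_csv_bytes(raw, encoding="utf-8"):
    """Return (header, rows) parsed from CSV bytes."""
    text = raw.decode(encoding)
    # newline='' leaves line endings untouched so csv.reader can handle
    # '\n', '\r\n', and newlines embedded in quoted fields.
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader, [])  # empty input gives an empty header
    rows = list(reader)
    return header, rows


# Sample data standing in for the bytes downloaded from S3.
sample = b'name,note\r\nalice,"line one\nline two"\r\nbob,ok\r\n'
header, rows = parse_csv_bytes(sample)
print(header)
print(rows)
```

Unlike the function above, this version returns the header alongside the rows instead of discarding it.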

Answered 2023-01-15

#4 okxuctiv

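I eventually realized that you also need to set permissions on each individual object in the bucket before it can be retrieved, using the following code: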

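from boto.s3.key import Key

# `bucket` is an existing boto Bucket object for example_bucket
k = Key(bucket)
k.key = 'data_1.csv'
k.set_canned_acl('public-read')  # make this one object publicly readable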
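
A possible alternative to setting an ACL on every object: the policy in the question names only the bucket ARN (arn:aws:s3:::example_bucket) as its Resource, whereas object-level actions such as s3:GetObject are matched against object ARNs of the form arn:aws:s3:::example_bucket/*. The sketch below builds such a policy as JSON. It allows only s3:GetObject rather than s3:*, and keeping "Principal": "*" still makes every object publicly readable, so it assumes public read access is really what is wanted. The function name and the Sid value are made up for illustration.

```python
import json


def public_read_policy(bucket_name):
    """Return a bucket policy (as a dict) allowing anonymous reads of all objects.

    Hypothetical helper; "PublicReadObjects" is an arbitrary statement id.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadObjects",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                # The trailing /* makes the statement apply to the objects
                # in the bucket rather than to the bucket itself.
                "Resource": "arn:aws:s3:::{}/*".format(bucket_name),
            }
        ],
    }


policy_json = json.dumps(public_read_policy("example_bucket"), indent=4)
print(policy_json)
```

The printed JSON can be pasted into the bucket policy editor, or passed as the Policy argument of boto3's put_bucket_policy.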

I also had to modify the address of the bucket in the pd.read_csv command as follows:

data = pd.read_csv('https://s3-ap-southeast-2.amazonaws.com/example_bucket/data_1.csv')

Answered 2023-01-15

#5 hrysbysz

You can use AWS SDK for pandas, a library that extends pandas to work smoothly with AWS data stores such as S3.

import awswrangler as wr
df = wr.s3.read_csv("s3://bucket/file.csv")
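
For comparison, pandas itself can also read s3:// URLs when the optional s3fs package is installed. The sketch below builds the URL in the s3://bucket/key form; note that the path in the question used a single slash after s3: and a website endpoint hostname in place of the bucket name. The helper names s3_uri and read_csv_from_s3 are hypothetical, and the bucket and key are the placeholders from the question.

```python
def s3_uri(bucket_name, object_key):
    """Build an s3:// URL of the form s3://<bucket>/<key>."""
    return "s3://{}/{}".format(bucket_name, object_key.lstrip("/"))


def read_csv_from_s3(bucket_name, object_key, **read_csv_kwargs):
    """Read a CSV object into a DataFrame.

    Assumes pandas and s3fs are installed and that AWS credentials
    are available in the environment.
    """
    import pandas as pd  # imported here so s3_uri works without pandas

    return pd.read_csv(s3_uri(bucket_name, object_key), **read_csv_kwargs)


print(s3_uri("example_bucket", "data_1.csv"))
# df = read_csv_from_s3("example_bucket", "data_1.csv")  # needs network access
```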

Answered 2023-01-15
