python-3.x: Read CSV from Azure blob storage and store in a DataFrame

ztyzrc3y  posted on 2023-03-04 in Python
Follows (0) | Answers (6) | Views (168)

I am trying to read multiple CSV files from blob storage using Python.
The code I am using is:

blob_service_client = BlobServiceClient.from_connection_string(connection_str)
container_client = blob_service_client.get_container_client(container)
blobs_list = container_client.list_blobs(folder_root)
for blob in blobs_list:
    # pass the blob's actual name, not the literal string "blob.name"
    blob_client = blob_service_client.get_blob_client(container=container, blob=blob.name)
    stream = blob_client.download_blob().content_as_text()

I don't know the right way to store the CSV files I read in a pandas DataFrame.
I tried using:

df = df.append(pd.read_csv(StringIO(stream)))

But this raises an error.
Any idea how I can do this?
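The question does not include the traceback, so the cause is a guess. Two common culprits are that `df` is never created before the loop, and that `DataFrame.append` was removed in pandas 2.0. A minimal sketch of the usual alternative follows: collect one DataFrame per file in a list and combine them once with `pd.concat`. The `csv_texts` list is a hypothetical stand-in for the text downloaded from each blob.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-ins for the text returned by each blob download
csv_texts = [
    "a,b\n1,2\n",
    "a,b\n3,4\n5,6\n",
]

# Parse each CSV text into its own DataFrame
frames = [pd.read_csv(StringIO(text)) for text in csv_texts]

# Combine all frames in one call instead of appending inside the loop
df = pd.concat(frames, ignore_index=True)
```

Concatenating once at the end also avoids copying the growing DataFrame on every iteration.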

j91ykkif #1

You can download the file from blob storage and then read the data from the downloaded file into a pandas DataFrame.

# Note: BlockBlobService belongs to the legacy SDK (azure-storage-blob 2.1 and earlier)
import time

import pandas as pd
from azure.storage.blob import BlockBlobService

# Replace the placeholders with your own values
STORAGEACCOUNTNAME = "<storage_account_name>"
STORAGEACCOUNTKEY = "<storage_account_key>"
LOCALFILENAME = "<local_file_name>"
CONTAINERNAME = "<container_name>"
BLOBNAME = "<blob_name>"

# download from blob
t1 = time.time()
blob_service = BlockBlobService(account_name=STORAGEACCOUNTNAME, account_key=STORAGEACCOUNTKEY)
blob_service.get_blob_to_path(CONTAINERNAME, BLOBNAME, LOCALFILENAME)
t2 = time.time()
print("It takes %s seconds to download %s" % (t2 - t1, BLOBNAME))

# LOCALFILENAME is the file path
dataframe_blobdata = pd.read_csv(LOCALFILENAME)

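For more details, see here.
If you want to convert directly, the following code is useful. You get the content from the blob object, and get_blob_to_text does not need a local file name.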

from io import StringIO
blobstring = blob_service.get_blob_to_text(CONTAINERNAME,BLOBNAME).content
df = pd.read_csv(StringIO(blobstring))

yftpprvb #2

import pandas as pd
data = pd.read_csv('blob_sas_url')

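You can find the Blob SAS URL by right-clicking the blob file you want to import in the Azure portal and selecting Generate SAS. Then click the Generate SAS token and URL button and copy the SAS URL into the code above, replacing blob_sas_url.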
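If you have only the SAS token rather than the full URL, the URL can be assembled by hand. This is a rough sketch that assumes the public Azure endpoint suffix `blob.core.windows.net`. The helper name `build_blob_sas_url` and all argument values are hypothetical.

```python
from urllib.parse import quote


def build_blob_sas_url(account, container, blob_name, sas_token):
    """Compose https://<account>.blob.core.windows.net/<container>/<blob>?<sas>."""
    # Tokens are sometimes copied with a leading "?"; drop it to avoid "??"
    token = sas_token.lstrip("?")
    # Percent-encode spaces and similar characters but keep "/" separators
    path = quote(f"{container}/{blob_name}")
    return f"https://{account}.blob.core.windows.net/{path}?{token}"


# Hypothetical values, for illustration only
url = build_blob_sas_url("acct", "data", "dir/my file.csv", "?sv=x&sig=y")
```

The resulting string can then be passed to `pd.read_csv(url)` in the same way as a URL copied from the portal.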

owfi6suc #3

BlockBlobService, as part of azure-storage, is deprecated. Use the following instead:

# Install the SDK first: pip install azure-storage-blob
from azure.storage.blob import BlobServiceClient
import pandas as pd

# Replace the placeholders with your own values
STORAGEACCOUNTURL = "<storage_account_url>"
STORAGEACCOUNTKEY = "<storage_account_key>"
LOCALFILENAME = "<local_file_name>"
CONTAINERNAME = "<container_name>"
BLOBNAME = "<blob_name>"

#download from blob
blob_service_client_instance=BlobServiceClient(account_url=STORAGEACCOUNTURL, credential=STORAGEACCOUNTKEY)
blob_client_instance = blob_service_client_instance.get_blob_client(CONTAINERNAME, BLOBNAME, snapshot=None)
with open(LOCALFILENAME, "wb") as my_blob:
    blob_data = blob_client_instance.download_blob()
    blob_data.readinto(my_blob)

#import blob to dataframe
df = pd.read_csv(LOCALFILENAME)

LOCALFILENAME is the same as BLOBNAME.

agyaoht7 #4

You can now read data directly from Blob Storage into a pandas DataFrame:

import os
import pandas as pd

# Requires the adlfs package (pip install adlfs) for abfs:// paths
mydata = pd.read_csv(
    f"abfs://{blob_path}",
    storage_options={"connection_string": os.environ["STORAGE_CONNECTION"]},
)

where blob_path is the path to the file, given as {container-name}/{blob-prefix.csv}

lyfkaqu1 #5

BlockBlobService is indeed deprecated. However, @Deepak's answer did not work for me. The following does:

import pandas as pd
from io import BytesIO
from azure.storage.blob import BlobServiceClient

# Replace the placeholders with your own values
CONNECTION_STRING = "<connection_string>"
CONTAINERNAME = "<container_name>"
BLOBNAME = "<blob_name>"

blob_service_client = BlobServiceClient.from_connection_string(CONNECTION_STRING)
container_client = blob_service_client.get_container_client(CONTAINERNAME)
blob_client = container_client.get_blob_client(BLOBNAME)

with BytesIO() as input_blob:
    blob_client.download_blob().download_to_stream(input_blob)
    input_blob.seek(0)
    df = pd.read_csv(input_blob)
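The `seek(0)` call in the snippet above matters: after the download is written, the buffer position sits at the end, and pandas would find nothing left to parse. The sketch below reproduces that step without Azure. The `raw` bytes are a hypothetical stand-in for the downloaded blob content.

```python
from io import BytesIO

import pandas as pd

# Hypothetical stand-in for the bytes the SDK would write into the buffer
raw = b"id,name\n1,alpha\n2,beta\n"

with BytesIO() as buffer:
    buffer.write(raw)   # position is now at the end of the buffer
    buffer.seek(0)      # rewind so read_csv starts from the first byte
    df = pd.read_csv(buffer)
```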

ldxq2e6h #6

You can use blob_client to read the file as text and pass that text as input to the pandas read_csv() method.

import pandas as pd
from io import StringIO
from azure.identity import InteractiveBrowserCredential
from azure.storage.blob import BlobServiceClient, ContainerClient

# name of the file 
file_name = 'sample_file.csv' 
# Note:- include folders if you have a folder structure in the blob 
# container ex: -> main/child/sample.csv

# storage account URL
STORAGE_ACCOUNT_URL = 'https://sampleblob.blob.core.windows.net'
# name of the container that holds your CSV file
BLOB_STORAGE_CONTAINER_NAME = "sample-storage-container"
# Here I am using the interactive credential, you may use any other credential
CREDENTIAL = InteractiveBrowserCredential()

# Create the BlobServiceClient object
blob_service_client = BlobServiceClient(STORAGE_ACCOUNT_URL, credential=CREDENTIAL)
container_client = blob_service_client.get_container_client(container=BLOB_STORAGE_CONTAINER_NAME)
blob_client = container_client.get_blob_client(file_name)
if blob_client.exists():  # check if blob exists
    download_stream = blob_client.download_blob() # read file
    df = pd.read_csv(StringIO(download_stream.content_as_text())) # use text as input to pandas
    print(f"Shape of File {file_name} is {df.shape}")
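The answers in this thread each read a single blob, while the question asks about several CSV files under one folder. One possible way to put the pieces together is sketched below, assuming the v12 `azure-storage-blob` SDK. The helper name `read_csv_blobs` is hypothetical. It takes any object that behaves like a `ContainerClient`, which keeps the function easy to try out without an Azure account.

```python
from io import BytesIO

import pandas as pd


def read_csv_blobs(container_client, prefix=""):
    """Download every CSV blob under `prefix` and return one combined DataFrame.

    `container_client` is assumed to behave like
    azure.storage.blob.ContainerClient (list_blobs and download_blob).
    """
    frames = []
    for blob in container_client.list_blobs(name_starts_with=prefix):
        if not blob.name.lower().endswith(".csv"):
            continue  # skip blobs that are not CSV files
        # readall() returns the blob content as bytes
        data = container_client.download_blob(blob.name).readall()
        frames.append(pd.read_csv(BytesIO(data)))
    if not frames:
        # pd.concat raises on an empty list, so return an empty frame instead
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

With the names from the question, this would be called as `read_csv_blobs(blob_service_client.get_container_client(container), folder_root)`.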
