django - How to upload a folder to Google Cloud Storage using the Python API

yacmzcpb · asked on 2022-11-26

I have successfully uploaded a single text file to Google Cloud Storage. But when I try to upload a whole folder, I get a permission denied error.

filename = "d:/foldername"   #here test1 is the folder.

Error:
Traceback (most recent call last):
  File "test1.py", line 142, in <module>
    upload()
  File "test1.py", line 106, in upload
    media = MediaFileUpload(filename, chunksize=CHUNKSIZE, resumable=True)
  File "D:\jatin\Project\GAE_django\GCS_test\oauth2client\util.py", line 132, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "D:\jatin\Project\GAE_django\GCS_test\apiclient\http.py", line 422, in __init__
    fd = open(self._filename, 'rb')
IOError: [Errno 13] Permission denied: 'd:/foldername'

vhmi4jdf · Answer 1

This worked for me. It copies everything in a local directory to a specific bucket-name/full-path in Google Cloud Storage, recursively:

import glob
import os

from google.cloud import storage

def upload_local_directory_to_gcs(local_path, bucket, gcs_path):
    """Recursively upload the contents of local_path to gcs_path in the given bucket."""
    assert os.path.isdir(local_path)
    for local_file in glob.glob(local_path + '/**'):
        if not os.path.isfile(local_file):
            # recurse into sub-directories, mirroring them under gcs_path
            upload_local_directory_to_gcs(local_file, bucket, gcs_path + "/" + os.path.basename(local_file))
        else:
            remote_path = os.path.join(gcs_path, local_file[1 + len(local_path):])
            blob = bucket.blob(remote_path)
            blob.upload_from_filename(local_file)

upload_local_directory_to_gcs(local_path, bucket, BUCKET_FOLDER_DIR)
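
The bucket argument above must be a google.cloud.storage Bucket object; a hypothetical way to obtain it, with the bucket name used purely as a placeholder:

client = storage.Client()
bucket = client.get_bucket("my-bucket")   # placeholder bucket name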

wwodge7n · Answer 2

A version without a recursive function, and one that also handles "top-level files" (unlike the top answer):

import glob
import os 
from google.cloud import storage

GCS_CLIENT = storage.Client()

def upload_from_directory(directory_path: str, dest_bucket_name: str, dest_blob_name: str):
    """Upload every file under directory_path to dest_bucket_name under the dest_blob_name prefix."""
    rel_paths = glob.glob(directory_path + '/**', recursive=True)
    bucket = GCS_CLIENT.get_bucket(dest_bucket_name)
    for local_file in rel_paths:
        # drop the first path component (the directory itself) when building the blob name
        remote_path = f'{dest_blob_name}/{"/".join(local_file.split(os.sep)[1:])}'
        if os.path.isfile(local_file):
            blob = bucket.blob(remote_path)
            blob.upload_from_filename(local_file)
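
A hypothetical call, with placeholder names: it uploads everything under my_folder (relative to the current working directory) to gs://my-bucket/backups/. Since the helper drops only the first path component when building the blob name, it works best when directory_path is a single top-level folder name.

upload_from_directory('my_folder', 'my-bucket', 'backups')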

new9mtju · Answer 3

A folder is a cataloguing structure holding references to files and directories. The library will not accept a folder as an argument.
As far as I understand, your use case is to upload to GCS while preserving the local folder structure. To achieve this you can use the os Python module and write a recursive function (e.g. process_folder) that takes a path as an argument. The logic can be as follows (a minimal sketch of these steps is given after the reference links below):
1. Use the os.listdir() method to get the list of objects in the source path (it returns both files and folders).
2. Iterate over the list from step 1 and separate files from folders with the os.path.isdir() method.
3. Iterate over the files and upload them with an adjusted path (e.g. path + "/" + file_name).
4. Iterate over the folders, making a recursive call (e.g. process_folder(path + folder_name)).
It is necessary to work with two paths:
1. The real file-system path used with the os module (e.g. "/Users/User/.../upload_folder/folder_name").
2. The virtual path for the GCS file upload (e.g. "upload" + "/" + folder_name + "/" + file_name).
Don't forget to implement the exponential backoff referenced in [1] to deal with HTTP 500 errors. You can use the Drive SDK example in [2] as a reference.
[1] - https://developers.google.com/storage/docs/json_api/v1/how-tos/upload#exp-backoff
[2] - https://developers.google.com/drive/web/handle-errors
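
A minimal sketch of that recursion, assuming the google-cloud-storage client library; the bucket name, local path, and the process_folder name itself are illustrative placeholders, and exponential backoff is omitted for brevity:

import os
from google.cloud import storage

def process_folder(bucket, real_path, gcs_path):
    # step 1: list both files and folders in the real file-system path
    for name in os.listdir(real_path):
        full_path = os.path.join(real_path, name)
        # step 2: separate folders from files
        if os.path.isdir(full_path):
            # step 4: recurse, adjusting both the real and the virtual GCS path
            process_folder(bucket, full_path, gcs_path + "/" + name)
        else:
            # step 3: upload the file under the virtual GCS path
            blob = bucket.blob(gcs_path + "/" + name)
            blob.upload_from_filename(full_path)

# hypothetical usage; "my-bucket" and the local path are placeholders
client = storage.Client()
bucket = client.get_bucket("my-bucket")
process_folder(bucket, "/Users/User/upload_folder", "upload")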

gstyhher · Answer 4

I assume that a plain filename = "D:\foldername" simply isn't enough information about the source. I'm also not sure this is even possible: through the web interface you can likewise only upload files, or create folders and then upload files into them.
You could save the folder name, create it remotely (I have never used google-app-engine, but I guess it should be possible), and then upload the contents into the new folder.


xriantvc · Answer 5

Reference - https://hackersandslackers.com/manage-files-in-google-cloud-storage-with-python/

from os import listdir
from os.path import isfile, join

...

def upload_files(bucketName):
    """Upload files to GCP bucket."""
    files = [f for f in listdir(localFolder) if isfile(join(localFolder, f))]
    for file in files:
        localFile = localFolder + file
        blob = bucket.blob(bucketFolder + file)
        blob.upload_from_filename(localFile)
    return f'Uploaded {files} to "{bucketName}" bucket.'
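
For context, the setup the snippet elides with "..." roughly amounts to the following; the bucket name and folder values here are placeholders rather than the article's actual configuration:

from google.cloud import storage

bucketName = 'my-bucket'        # placeholder bucket name
localFolder = '/tmp/uploads/'   # local source folder, with a trailing slash
bucketFolder = 'uploads/'       # destination prefix inside the bucket, with a trailing slash

storage_client = storage.Client()
bucket = storage_client.get_bucket(bucketName)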

bbmckpt7 · Answer 6

This solution also works on Windows. Just provide the folder name to upload and the destination bucket name. It handles any level of sub-directories inside the folder as well.

import os
from google.cloud import storage

storage_client = storage.Client()

def upload_files(bucketName, folderName):
    """Upload files to GCP bucket."""
    bucket = storage_client.get_bucket(bucketName)
    for path, subdirs, files in os.walk(folderName):
        for name in files:
            path_local = os.path.join(path, name)
            blob_path = path_local.replace('\\', '/')
            blob = bucket.blob(blob_path)
            blob.upload_from_filename(path_local)
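
A hypothetical call, with placeholder names; note that the resulting blob paths keep folderName as a prefix inside the bucket:

upload_files('my-gcs-bucket', 'my_local_folder')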

zqdjd7g9 · Answer 7

Below is my recursive implementation. We need to create a file named gdrive_utils.py with the following code (note that this answer uploads to Google Drive rather than Cloud Storage):

from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
from apiclient.http import MediaFileUpload, MediaIoBaseDownload
import pickle
import glob
import os

# The following scopes are required for access to google drive.
# If modifying these scopes, delete the file token.pickle.
SCOPES = ['https://www.googleapis.com/auth/drive.metadata.readonly',
          'https://www.googleapis.com/auth/drive.metadata',
          'https://www.googleapis.com/auth/drive',
          'https://www.googleapis.com/auth/drive.file',
          'https://www.googleapis.com/auth/drive.appdata']

def get_gdrive_service():
    """
    Tries to authenticate using a token. If token expires or not present creates one.
    :return: Returns authenticated service object
    :rtype: object
    """
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'keys/client-secret.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)
    # return Google Drive API service
    return build('drive', 'v3', credentials=creds)

def createRemoteFolder(drive_service, folderName, parent_id):
    # Create a folder on Drive; returns the newly created folder's ID
    body = {
        'name': folderName,
        'mimeType': "application/vnd.google-apps.folder",
        'parents': [parent_id]
    }

    root_folder = drive_service.files().create(body = body, supportsAllDrives=True, fields='id').execute()
    return root_folder['id']

def upload_file(drive_service, file_location, parent_id):
    # Upload a file to Drive; returns the newly created file's ID
    body = {
        'name': os.path.split(file_location)[1],
        'parents': [parent_id]
    }

    media = MediaFileUpload(file_location,
                            resumable=True)

    file_details = drive_service.files().create(body = body,
                                                media_body=media,
                                                supportsAllDrives=True,
                                                fields='id').execute()
    return file_details['id']

def upload_file_recursively(g_drive_service, root, folder_id):

    files_list = glob.glob(root)
    if files_list:
        for file_contents in files_list:
            if os.path.isdir(file_contents):
                # create new _folder
                new_folder_id = createRemoteFolder(g_drive_service, os.path.split(file_contents)[1],
                                                   folder_id)
                upload_file_recursively(g_drive_service, os.path.join(file_contents, '*'), new_folder_id)
            else:
                # upload to given folder id
                upload_file(g_drive_service, file_contents, folder_id)

Then use the following:

import os

from gdrive_utils import createRemoteFolder, upload_file_recursively, get_gdrive_service

g_drive_service = get_gdrive_service()
FOLDER_ID_FOR_UPLOAD = "<replace with folder id where you want upload>"
main_folder_id = createRemoteFolder(g_drive_service, '<name_of_main_folder>', FOLDER_ID_FOR_UPLOAD)

Finally, call this:

upload_file_recursively(g_drive_service, os.path.join("<your_path_>", '*'), main_folder_id)

xqk2d5yq · Answer 8

I just came across the gcsfs library, which also seems to offer a nicer interface.
You can copy an entire directory to a GCS location like this:

import gcsfs

def upload_to_gcs(src_dir: str, gcs_dst: str):
    # copy the whole local directory tree to the GCS destination
    fs = gcsfs.GCSFileSystem()
    fs.put(src_dir, gcs_dst, recursive=True)
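
A hypothetical call, assuming gcsfs is installed and default Google credentials are available; both paths are placeholders:

upload_to_gcs('./my/local/directory', 'my_gcp_bucket/foo/bar')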

gcxthw6b · Answer 9

Another option is to use gsutil, the command-line tool for interacting with Google Cloud:

gsutil cp -r ./my/local/directory gs://my_gcp_bucket/foo/bar

The -r flag tells gsutil to copy recursively. See the gsutil documentation for details.
Calling gsutil from Python can be done like this:

import subprocess

# pass the command as a list of arguments (a plain string would need shell=True)
subprocess.check_call(['gsutil', 'cp', '-r', './my/local/directory', 'gs://my_gcp_bucket/foo/bar'])
