Bloated log output in the Azure Functions log stream when a Python function accesses data from Azure Blob

Posted by bihw5rsg on 2023-06-06 in Python

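My HTTP-triggered Azure Function has a workflow consisting of three steps:
1. It receives an API call with some parameters.
2. It reads data from Azure Blob using the following function: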

import io

import pandas as pd
from azure.storage.blob import BlobServiceClient


def read_dataframe_from_blob(account_name, account_key, container_name, blob_name):
    # Create a connection string to the Azure Blob storage account
    connect_str = f"DefaultEndpointsProtocol=https;AccountName={account_name};AccountKey={account_key};EndpointSuffix=core.windows.net"

    # Create a BlobServiceClient object using the connection string
    blob_service_client = BlobServiceClient.from_connection_string(connect_str)

    # Get a reference to the Parquet blob
    blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_name)

    # Download the blob data as a stream
    blob_data = blob_client.download_blob()

    # Read the Parquet data from the stream into a pandas DataFrame
    df = pd.read_parquet(io.BytesIO(blob_data.readall()))

    return df

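3. It preprocesses the data from step 2 and returns some output.
I previously built a very similar workflow, and its function log stream was very clean: it contained only the entries I defined through logging. However, now that I read data from the blob, the logs in the Azure Functions log stream (local, of course) start with the following: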

2023-06-05T07:35:42Z   [Information]   Request URL: 'https://myaccount.blob.core.windows.net/mycontainer/my.parquet'
Request method: 'GET'
Request headers:
    'x-ms-range': 'REDACTED'
    'x-ms-version': 'REDACTED'
    'Accept': 'application/xml'
    'User-Agent': 'azsdk-python-storage-blob/12.16.0 Python/3.10.11 (Linux-5.10.164.1-1.cm1-x86_64-with-glibc2.31)'
    'x-ms-date': 'REDACTED'
    'x-ms-client-request-id': '932afd88-0373-11ee-8724-1270efe16c2d'
    'Authorization': 'REDACTED'
No body was attached to the request
2023-06-05T07:35:42Z   [Information]   Response status: 206
Response headers:
    'Content-Length': '33554432'
    'Content-Type': 'application/octet-stream'
    'Content-Range': 'REDACTED'
    'Last-Modified': 'Thu, 01 Jun 2023 08:00:30 GMT'
    'Accept-Ranges': 'REDACTED'
    'ETag': '"0x8DB627644CFEA3E"'
    'Server': 'Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0'
    'x-ms-request-id': '08843836-f01e-0019-6780-974298000000'
    'x-ms-client-request-id': '932afd88-0373-11ee-8724-1270efe16c2d'
    'x-ms-version': 'REDACTED'
    'x-ms-creation-time': 'REDACTED'
    'x-ms-blob-content-md5': 'REDACTED'
    'x-ms-lease-status': 'REDACTED'
    'x-ms-lease-state': 'REDACTED'
    'x-ms-blob-type': 'REDACTED'
    'Content-Disposition': 'REDACTED'
    'x-ms-server-encrypted': 'REDACTED'
    'Date': 'Mon, 05 Jun 2023 07:35:42 GMT'

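...repeated many times. Only after that do I get the messages from my own logging.
What causes this behavior? Is there a clean way to adjust the code or otherwise avoid these bloated logs?
Edit: I found a similar discussion here, but I am not sure how to replicate it in a Python application.
Edit 2: This is not a solution, but I found a GitHub bug report here.
Still, I would appreciate any workaround.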

d4so4syb1#

import logging

# Set the desired log level (e.g., INFO, DEBUG, ERROR)
logging.basicConfig(level=logging.INFO)

def main(req):
    # Your code to access data from Azure Blob

    # Example logging statements
    logging.info("Accessing data from Azure Blob")
    logging.debug("Debug message")
    logging.error("Error message")

    # Rest of your function code

    return "Function executed successfully"

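In this code, logging.basicConfig() sets up the basic logging configuration, including the desired log level. You can adjust the level to control how verbose the logs are (for example, logging.INFO, logging.DEBUG, or logging.ERROR).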
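A global level set through logging.basicConfig() applies to every logger, so it cannot keep your own INFO messages while dropping the SDK's. A possible workaround is to raise the level of only the logger that emits the request and response dumps. The sketch below assumes those entries come from the azure-core HTTP logging policy, whose logger is named azure.core.pipeline.policies.http_logging_policy; the helper name quiet_azure_http_logging is made up for illustration and is not part of any library.

```python
import logging

# Assumption: the "Request URL / Response status" entries in the question are
# emitted at INFO level by azure-core's HTTP logging policy under this name.
AZURE_HTTP_LOGGER = "azure.core.pipeline.policies.http_logging_policy"


def quiet_azure_http_logging(level: int = logging.WARNING) -> None:
    """Hypothetical helper: raise the threshold of the SDK's HTTP logger only."""
    logging.getLogger(AZURE_HTTP_LOGGER).setLevel(level)


# Call once at module import time, before any blob client is used.
quiet_azure_http_logging()
```

With this in place, logging.info(...) calls made through your own loggers are unaffected, because only the named SDK logger has its threshold raised. Whether this removes every noisy entry depends on which loggers your installed SDK version actually uses.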
