pyspark: How to format a Spark DataFrame as an Excel file with data-based color coding, then write it to Azure Storage

8xiog9wr  posted on 2023-01-20 in Spark

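Problem statement: the data is a structured table in Spark. It needs to be queried, converted, and written to an xlsx file that is color-coded: mandatory columns in orange, optional columns in yellow, and rows with missing content in red.
There are several approaches, but none of them work, because the styling is lost when the file is written.
I tried converting the Spark DataFrame, applying conditional formatting, and writing the result with BlockBlobService create_blob_from_text, but that did not succeed.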

szqfcxe2 #1

    from io import BytesIO

    import pandas as pd
    # BlockBlobService ships with the legacy azure-storage-blob package (2.x and earlier).
    from azure.storage.blob import BlockBlobService

    blobService = BlockBlobService(account_name="storageaccountname", account_key="Storage Key", protocol='https')
    # sample = pd.DataFrame(sample_dict)
    # pd_data_df is the pandas DataFrame converted from the Spark DataFrame (e.g. via toPandas()).
    sample = pd_data_df

    # Create a Pandas Excel writer that uses XlsxWriter as the engine and writes to memory.
    output = BytesIO()
    writer = pd.ExcelWriter(output, engine='xlsxwriter')

    # Convert the dataframe to an XlsxWriter Excel object (the index goes into column 0).
    sample.to_excel(writer, sheet_name='Sheet1')

    # Get the xlsxwriter workbook and worksheet objects.
    workbook = writer.book
    worksheet = writer.sheets['Sheet1']

    # Add a format with a red background.
    format1 = workbook.add_format({'bg_color': 'red'})

    # Get the dimensions of the dataframe.
    (max_row, max_col) = sample.shape

    # Apply a conditional format to the data range (row 0 is the header, column 0 the index):
    # every blank cell is filled red.
    worksheet.conditional_format(1, 1, max_row, max_col,
                                 {'type': 'blanks',
                                  'format': format1})

    # Close the Pandas Excel writer so the workbook is written into `output`.
    # close() replaces save(), which was removed in pandas 2.0.
    writer.close()

    xlsx_data = output.getvalue()

    # Upload the in-memory workbook bytes to Blob Storage.
    # container_name and folder_path_with_file_name must be defined by the caller.
    blobService.create_blob_from_bytes(container_name, folder_path_with_file_name, xlsx_data)
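
The code above only turns blank cells red. The other two requirements from the question (orange headers for mandatory columns, yellow for optional ones, and whole rows shaded when something is missing) could be sketched roughly as follows. This is a minimal sketch, not a tested solution: `colour_sheet` and `col_letter` are hypothetical helper names, the `mandatory` and `optional` sets are assumed to be supplied by the caller, and `first_col=1` assumes the DataFrame index was written to column A, as in the `to_excel` call above.

```python
def col_letter(idx):
    """Convert a zero-based column index to an Excel column letter (0 -> 'A')."""
    letters = ""
    idx += 1
    while idx:
        idx, rem = divmod(idx - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def colour_sheet(workbook, worksheet, columns, n_rows, mandatory, optional, first_col=1):
    """Colour header cells by column role and shade rows that contain blanks.

    workbook / worksheet are the XlsxWriter objects taken from pd.ExcelWriter.
    mandatory / optional are sets of column names (assumed inputs).
    """
    orange = workbook.add_format({"bg_color": "orange", "bold": True})
    yellow = workbook.add_format({"bg_color": "yellow", "bold": True})
    red = workbook.add_format({"bg_color": "red"})

    # Rewrite each header cell with the format that matches its role.
    for offset, name in enumerate(columns):
        if name in mandatory:
            worksheet.write(0, first_col + offset, name, orange)
        elif name in optional:
            worksheet.write(0, first_col + offset, name, yellow)

    if n_rows == 0 or len(columns) == 0:
        return

    last_col = first_col + len(columns) - 1
    # Row 2 in A1 notation is the first data row. The column anchors are
    # absolute, so every cell in a row evaluates the same row-wide range.
    formula = f"=COUNTBLANK(${col_letter(first_col)}2:${col_letter(last_col)}2)>0"
    worksheet.conditional_format(
        1, first_col, n_rows, last_col,
        {"type": "formula", "criteria": formula, "format": red},
    )
```

It would be called before the writer is closed, for example `colour_sheet(workbook, worksheet, list(sample.columns), len(sample), {"id"}, {"note"})`, where `"id"` and `"note"` are placeholder column names. Because the styling is stored in the workbook bytes themselves, it should survive the upload as long as the blob is written as bytes rather than text.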
