Python: how to insert the result of a DataFrame into the read_block function

Posted by nxagd54h on 2023-11-15 in Python

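I'm trying to take the DataFrame generated from Azure Blob Storage and feed it into the next step (which extracts data in a particular way).
I have tested both parts (generating the data from Azure Blob Storage and extracting data with a regex), and each works on its own, but my challenge now is putting the two pieces of code together.
Here is the first part (getting the DataFrame from Azure Blob Storage):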

import re 
from io import StringIO
import pandas as pd
from azure.storage.blob import BlobClient

blob = BlobClient(account_url="https://test.blob.core.windows.net",
              container_name="xxxx",
              blob_name="Text.csv",
              credential="xxxx")

data = blob.download_blob()
df = pd.read_csv(data)

Here is the second part (extracting only part of the content from the CSV file):

def read_block(names, igidx=True):
    with open("Test.csv") as f:   ###<<<This is where I would like to modify<<<###              
        pat = r"(\w+),+$\n[^,]+.+?\n,+\n(.+?)(?=\n,{2,})"
        return pd.concat([
            pd.read_csv(StringIO(m.group(2)), skipinitialspace=True)
                .iloc[:, 1:].dropna(how="all") for m in re.finditer(
                    pat, f.read(), flags=re.M|re.S) if m.group(1) in names # optional
        ], keys=names, ignore_index=igidx)

df2 = read_block(names=["Admissions", "Readmissions"],igidx=False).droplevel(1).reset_index(names="Admission")

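So what I'm trying to do is take df from the first piece of code and use it as the input in the second piece, where it says with open("Test.csv") as f.
How can I modify the second part of the code to take the data result from the first part?
Or, if that doesn't work, is there a way to use the file path ID (data) generated from Azure, like the one below?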

<azure.storage.blob._download.StorageStreamDownloader object at 0x00000xxxxxxx>

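Update:

I modified the code as shown below, and now I get a concat error.
I'm not sure whether this is because there is no longer any loop (since my change removed with open("Test.csv") as f:).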

...

data = blob.download_blob()
df = pd.read_csv(data)
df1 = df.to_csv(index=False, header=False)

def read_block(names, igidx=True):    
    pat = r"(\w+),+$\n[^,]+.+?\n,+\n(.+?)(?=\n,{2,})"
    return pd.concat([
        pd.read_csv(StringIO(m.group(2)), skipinitialspace=True)
            .iloc[:, 1:].dropna(how="all") for m in re.finditer(
                pat, df1, flags=re.M|re.S) if m.group(1) in names 
    ], keys=names, ignore_index=igidx)

df2 = read_block(names=["Admissions", "Readmissions"], igidx=False).droplevel(1).reset_index(names="Admission")   
print(df2)


This is df1:

Not Started: 12,Sent: 3,Completed: 3,,,
,,,,,
Division,Community,Resident Name,Date,Document Status,Last Update
,Test Station,Jane Doe ,9/12/2023,Sent,9/12/2023
,Test Station 2,John Doe,9/12/2023,Not Started,
,Alibaba Fizgerald,Super Man,9/12/2023,Not Started,
,Iceland Kingdom,Super Woman,9/12/2023,Not Started,
,,,,,
,,,,,
Readmissions,,,,,
Not Started: 1,Sent: 0,Completed: 1,,,
,,,,,
Division,Community,Resident Name,Date,Document Status,Last Update
,Station Kingdom,Pretty Woman ,9/12/2023,Not Started,
,My Goodness,Ugly Man,7/21/2023,Completed,7/26/2023
,,,,,
,,,,,
Discharge,,,,,
,,,,,
Division,Community,Resident Name,Date,,
,Station Kingdom1 ,Pretty Woman2 ,8/22/2023,,
,My Goodness1 ,Ugly Man1,4/8/2023,,
,Landmark2,Nice Guys,9/12/2023,,
,Iceland Kingdom2,Mr. Heroshi2,7/14/2023,,
,More Kingdom 2,King Kong ,8/31/2023,,
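
Note that the dump above begins at the "Not Started: 12" line, with no "Admissions" line before it. A minimal sketch of one likely reason, assuming the raw file starts with a line such as Admissions,,, (the raw file is not shown here, so the `raw` string below is hypothetical): pd.read_csv consumes the first line as the header, and to_csv(header=False) then drops it. Keeping the header does not restore the original line either, because blank header cells come back as "Unnamed: N".

```python
from io import StringIO

import pandas as pd

# Hypothetical first lines of the raw file (assumption: the real file is not shown).
raw = "Admissions,,,\nNot Started: 12,Sent: 3,Completed: 3,\n,,,\n"

# read_csv treats the first line as the header row.
df = pd.read_csv(StringIO(raw))

# header=False drops the line that read_csv consumed as the header.
without_header = df.to_csv(index=False, header=False).splitlines()

# Keeping the header writes a first line again, but blank header cells
# are now named "Unnamed: N" instead of being empty.
with_header = df.to_csv(index=False).splitlines()

print(without_header[0])
print(with_header[0])
```

Either way, the text produced by the DataFrame round trip is no longer identical to what was stored in the blob, which matters for a regex that relies on the exact layout of the block-name line.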

[Image: the CSV file from which df1 was generated]

[Image: the latest error message]

This is my latest code (11/13/2023-1):

import re 
from io import StringIO
import pandas as pd
from azure.storage.blob import BlobClient
blob = BlobClient(account_url="https://xxxx.blob.core.windows.net",
              container_name="xxxx",
              blob_name="SampleSafe.csv",
              credential="xxxx")

data = blob.download_blob()
df = pd.read_csv(data)
df1 = df.to_csv(index=False)

def read_block(names, igidx=True):    
    pat = r"(\w+),+$\n[^,]+.+?\n,+\n(.+?)(?=\n,{2,})"
    return pd.concat([
        pd.read_csv(StringIO(m.group(2)), skipinitialspace=True)
            .iloc[:, 1:].dropna(how="all") for m in re.finditer(
                pat, data.readall(), flags=re.M|re.S)
               if m.group(1) in names], keys=names, ignore_index=igidx)

df2 = read_block(names=["Admissions", "Readmissions"], igidx=False).droplevel(1).reset_index(names="block")
print(df2)

[Image: detailed error message (updated 11/13/2023-1)]

Source: https://stackoverflow.com/questions/77449311/python-how-to-insert-result-of-data-frame-into-read-block-function

1 Answer

Answer #1, by zpjtge22

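"I modified the code. Now I get a concat error!"

IIUC, that's because the regex fails to match the required blocks. You need to remove header=False when creating the buffer df1 = df.to_csv(index=False). Or simply call readall on the downloaded blob to get a string, which avoids reading the input CSV as a DataFrame at all: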

data = blob.download_blob(max_concurrency=1, encoding="UTF-8")

def read_block(names, igidx=True):    
    pat = r"(\w+),+$\n[^,]+.+?\n,+\n(.+?)(?=\n,{2,})"
    return pd.concat([
        pd.read_csv(StringIO(m.group(2)), skipinitialspace=True)
            .iloc[:, 1:].dropna(how="all") for m in re.finditer(
                pat, data.readall(), flags=re.M|re.S)
               if m.group(1) in names], keys=names, ignore_index=igidx)
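
To try the same idea without an Azure connection, here is a minimal sketch that passes the text in as an argument. The `SAMPLE` string is a hypothetical stand-in for what data.readall() returns when download_blob is given an encoding; its first line (Admissions,,,,,) is an assumption, since the raw file is not shown. The sketch also swaps keys=names for a dict keyed by the matched block name, a variation on the code above, so that each label comes from the match itself rather than from the position in names.

```python
import re
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the blob text (assumption: shortened sample,
# with an "Admissions" line added as the assumed first line of the raw file).
SAMPLE = """\
Admissions,,,,,
Not Started: 12,Sent: 3,Completed: 3,,,
,,,,,
Division,Community,Resident Name,Date,Document Status,Last Update
,Test Station,Jane Doe ,9/12/2023,Sent,9/12/2023
,Test Station 2,John Doe,9/12/2023,Not Started,
,,,,,
,,,,,
Readmissions,,,,,
Not Started: 1,Sent: 0,Completed: 1,,,
,,,,,
Division,Community,Resident Name,Date,Document Status,Last Update
,Station Kingdom,Pretty Woman ,9/12/2023,Not Started,
,My Goodness,Ugly Man,7/21/2023,Completed,7/26/2023
,,,,,
,,,,,
Discharge,,,,,
,,,,,
Division,Community,Resident Name,Date,,
,Landmark2,Nice Guys,9/12/2023,,
"""

# Same pattern as in the question: block name, summary line, separator line,
# then the table up to the next line that starts with two or more commas.
PAT = r"(\w+),+$\n[^,]+.+?\n,+\n(.+?)(?=\n,{2,})"


def read_block(text, names):
    """Parse the named blocks out of `text` and stack them into one frame."""
    blocks = {}
    for m in re.finditer(PAT, text, flags=re.M | re.S):
        if m.group(1) in names:
            blocks[m.group(1)] = (
                pd.read_csv(StringIO(m.group(2)), skipinitialspace=True)
                .iloc[:, 1:]          # drop the empty "Division" column
                .dropna(how="all")    # drop rows that are entirely empty
            )
    # Dict keys become the outer index level of the result.
    return pd.concat(blocks)


df2 = (
    read_block(SAMPLE, names=["Admissions", "Readmissions"])
    .droplevel(1)
    .rename_axis("block")
    .reset_index()
)
print(df2)
```

With the real blob, the call would be read_block(data.readall(), names=[...]) after downloading with an encoding as shown above. If the regex matches none of the requested blocks, pd.concat receives nothing to concatenate and should raise a ValueError, so a concat error is a hint to inspect the text being matched.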

Answered 2023-11-15
