How can I read multiple text files and save each one separately as a Pandas DataFrame?

Posted by tquggr8v on 2022-11-27 in Other

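I have multiple txt files, and I want to convert each of them into a DataFrame, creating new columns from the values in the header. My data looks like this: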

Person:?,?;F dob. ?  MT: ? Z:C NewYork Mon.:S St.?

144 cm/35 Kg/5 YearsOld




45,34,22,26,0
78,74,82,11,0

I use the code below to create a DataFrame from a single text file.

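import re
import pandas as pd

with open('file_directory', 'r') as f:
    heading_rows = [next(f) for _ in range(3)]  # the first three lines hold the header


# First word on the first line that has a space on both sides
city = re.findall(pattern=r' \w+ ', string=heading_rows[0])[0].strip()
# Digits from the first header line that mentions both "cm" and "kg"
numbers_list = [re.findall(pattern=r'\d+', string=row) for row in heading_rows if 'cm' in row.lower() and 'kg' in row.lower()][0]

height, weight, age = [int(numbers_list[i]) for i in range(3)]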
    
df = pd.read_csv('file_directory', sep=r'\s+|;|,', engine='python', skiprows=8, comment='cm', index_col=None, names=list('ABCDEF'))
df = df.rename(columns={'A': 'SBP', 'B': 'MAP', 'C': 'DBP', 'D': 'HR', 'E': 'HOUR', 'F': 'MINUTE'})
# df.dropna(inplace=True)
df['HEIGHT'] = height
df['WEIGHT'] = weight
df['AGE'] = age
df['CENTER'] = city

I tried putting the code above inside a for loop so that I could read every text file in a folder, convert each one into its own Pandas DataFrame, and save it as a csv file.

lst = []
for name in glob.glob('my_directory/*'):

    with open(name, 'r') as f:
        heading_rows = [next(f) for _ in range(1)]
        lst.append(heading_rows)

However, I get a StopIteration error at the next(f) part of the code. How can I get the DataFrame below?
I expect a DataFrame of the following form:

A, B, C, D, E, height, weight, age, city
45,34,22,26,0, 144,   35,      5,   NewYork 
78,74,82,11,0, 144,   35,      5,   NewYork

Answer #1 (eivgtgni)

Try this:

import re
import pandas as pd

text = """\
Person:?,?;F dob. ?  MT: ? Z:C NewYork Mon.:S St.?

144 cm/35 Kg/5 YearsOld

45,34,22,26,0
78,74,82,11,0
"""

# The lazy `.*?` before the height group stops at the first run of digits
# followed by "cm", so the whole height ("144") is captured.
pat = re.compile(
    r"(?sim)Z:C (\S+).*?(\d+)\s*cm\D+(\d+)\s*kg\D+(\d+).*?((?:^[\d,]+\n)+)"
)

all_data = []
m = pat.search(text)
if m:
    city, height, weight, age, data = m.groups()
    # The header values are repeated on every data row
    extra = [int(height), int(weight), int(age), city]
    for row in data.splitlines():
        all_data.append(list(map(int, row.split(","))) + extra)

df = pd.DataFrame(
    all_data,
    columns=["A", "B", "C", "D", "E", "height", "weight", "age", "city"],
)
print(df)

Prints:

    A   B   C   D  E  height  weight  age     city
0  45  34  22  26  0     144      35    5  NewYork
1  78  74  82  11  0     144      35    5  NewYork
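
The question also asks how to do this for every file in a folder and save each result as its own csv file. The following is a minimal sketch of one way to do that, not a definitive implementation. It assumes that all files share the layout of the sample, that every data row holds exactly five comma-separated integers, and that the input files end in `.txt`. The names `parse_text`, `convert_folder`, `src_dir` and `dst_dir` are hypothetical helpers introduced here for illustration. The pattern uses a lazy `.*?` before the height group so that all of its digits are kept. Reading each file with `f.read()` rather than `next(f)` means an empty or short file yields a string that simply fails to match, instead of raising StopIteration.

```python
import glob
import os
import re

import pandas as pd

# Pattern for the sample layout; assumes "Z:C <city>" on the header line and a
# "<height> cm/<weight> Kg/<age> ..." line before the comma-separated rows.
PAT = re.compile(
    r"(?sim)Z:C (\S+).*?(\d+)\s*cm\D+(\d+)\s*kg\D+(\d+).*?((?:^[\d,]+\n)+)"
)
COLUMNS = ["A", "B", "C", "D", "E", "height", "weight", "age", "city"]


def parse_text(text):
    """Return a DataFrame for one file's text, or None if the layout differs."""
    # The pattern needs a newline after the last data row, so add one if missing.
    if not text.endswith("\n"):
        text += "\n"
    m = PAT.search(text)
    if m is None:
        return None
    city, height, weight, age, data = m.groups()
    extra = [int(height), int(weight), int(age), city]
    rows = [list(map(int, line.split(","))) + extra for line in data.splitlines()]
    return pd.DataFrame(rows, columns=COLUMNS)


def convert_folder(src_dir, dst_dir):
    """Write one csv per parsable .txt file in src_dir; return DataFrames by name."""
    os.makedirs(dst_dir, exist_ok=True)
    frames = {}
    for path in sorted(glob.glob(os.path.join(src_dir, "*.txt"))):
        with open(path, "r") as f:
            text = f.read()  # an empty file gives "" here, no exception
        df = parse_text(text)
        if df is None:
            print(f"skipped {path}: header or data rows not found")
            continue
        name = os.path.splitext(os.path.basename(path))[0]
        df.to_csv(os.path.join(dst_dir, name + ".csv"), index=False)
        frames[name] = df
    return frames
```

Calling `convert_folder("my_directory", "csv_output")` (the second folder name is only an example) would write `csv_output/<file name>.csv` for each input file that matches and return the DataFrames in a dict keyed by file name. Files that do not match are reported and skipped rather than stopping the loop.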
