Converting nested arrays to a CSV file

Posted by cigdeys3 11 months ago in Other


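I am currently working with a dataset in .mat format. However, I have run into a problem: the dataset contains nested arrays, and I need the data in CSV format.
I am looking for guidance on the most efficient way to convert this nested .mat dataset to CSV. Any expertise would be greatly appreciated. Link to my dataset: https://ora.ox.ac.uk/objects/uuid:03ba4b01-cfed-46d3-9b1a-7d4a7bdf6fac/files/m5ac36a1e2073852e4f1f7dee647909a7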

import numpy as np
import pandas as pd
import scipy.io as sio
mat = sio.loadmat('Oxford_Battery_Degradation_Dataset_1.mat')
mat

My output:

{'__header__': b'MATLAB 5.0 MAT-file, Platform: PCWIN64, Created on: Mon Jun 05 11:16:25 2017',
 '__version__': '1.0',
 '__globals__': [],
 'Cell1': array([[(array([[(array([[(array([[735954.85896553],
                                [735954.8589771 ],
                                [735954.85898867],
                                ...,
                                [735954.8995558 ],


dtype=[('t', 'O'), ('v', 'O'), ('q', 'O'), ('T', 'O')]))                                   ]],
               dtype=[('C1ch', 'O'), ('C1dc', 'O'), ('OCVch', 'O'), ('OCVdc', 'O')]))                                            ]],
       dtype=[('cyc0000', 'O'), ('cyc0100', 'O'), ('cyc0300', 'O'), ('cyc0400', 'O'), ('cyc0500', 'O'), ('cyc0600', 'O'), ('cyc0700', 'O'), ('cyc0800', 'O'), ('cyc0900', 'O'), ('cyc1000', 'O'), ('cyc1100', 'O'), ('cyc1200', 'O'), ('cyc1300', 'O'), ('cyc1400', 'O'), ('cyc1600', 'O'), ('cyc1800', 'O'), ('cyc1900', 'O'), ('cyc2000', 'O'), ('cyc2100', 'O'), ('cyc2200', 'O'), ('cyc2300', 'O'), ('cyc2400', 'O'), ('cyc2500', 'O'), ('cyc2600', 'O'), ('cyc2700', 'O'), ('cyc2800', 'O'), ('cyc2900', 'O'), ('cyc3000', 'O'), ('cyc3100', 'O'), ('cyc3200', 'O'), ('cyc3300', 'O'), ('cyc3500', 'O'), ('cyc3600', 'O'), ('cyc3700', 'O'), ('cyc3800', 'O'), ('cyc3900', 'O'), ('cyc4000', 'O'), ('cyc4100', 'O'), ('cyc4200', 'O'), ('cyc4300', 'O'), ('cyc4400', 'O'), ('cyc4500', 'O'), ('cyc4600', 'O'), ('cyc4800', 'O'), ('cyc5000', 'O'), ('cyc5100', 'O'), ('cyc5200', 'O'), ('cyc5300', 'O'), ('cyc5400', 'O'), ('cyc5500', 'O'), ('cyc5600', 'O'), ('cyc5700', 'O'), ('cyc5800', 'O'), ('cyc5900', 'O'), ('cyc6000', 'O'), ('cyc6100', 'O'), ('cyc6200', 'O'), ('cyc6300', 'O'), ('cyc6400', 'O'), ('cyc6500', 'O'), ('cyc6600', 'O'), ('cyc6700', 'O'), ('cyc6800', 'O'), ('cyc6900', 'O'), ('cyc7000', 'O'), ('cyc7100', 'O'), ('cyc7200', 'O'), ('cyc7300', 'O'), ('cyc7400', 'O'), ('cyc7500', 'O'), ('cyc7600', 'O'), ('cyc7700', 'O'), ('cyc7800', 'O'), ('cyc7900', 'O'), ('cyc8000', 'O'), ('cyc8100', 'O')])}


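In fact, I should end up with eight datasets in this format, one per cell, with columns corresponding to 't', 'v', 'q' and 'T' in the arrays. Here is an example of the expected result for one cell's dataset: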

cell8= pd.DataFrame(columns=['Time','Voltage','Capacity','Temperature'])
cell8

tcomlyy6 (answer #1)

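I'm not sure you realize how much data you have here. I have code that extracts the data, but there are over 61 million data items. Printed as a CSV file, it comes to about 2.5 gigabytes.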

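import numpy as np
import scipy.io as sio

mat = sio.loadmat('Oxford_Battery_Degradation_Dataset_1.mat')

def dive(names, cell):
    # Recursively walk the nested struct arrays, printing one CSV line per value.
    if len(cell) > 1000:
        # Leaf: an (N, 1) column of measurements. This test relies on every
        # leaf column having more than 1000 samples.
        for n in cell:
            print(','.join(names + [str(n[0])]))
    elif len(cell) > 1:
        # Struct record: descend into each named field.
        for n, c in zip(cell.dtype.fields, cell):
            dive(names + [n], c)
    else:
        # Singleton wrapper added by loadmat: unwrap it and continue.
        dive(names, cell[0])

for cno in range(8):
    name = f'Cell{cno+1}'
    dive([name], mat[name])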

The beginning of that file looks like:

Cell1,cyc0000,C1ch,t,735954.8589655256
Cell1,cyc0000,C1ch,t,735954.8589770996
Cell1,cyc0000,C1ch,t,735954.8589886738
Cell1,cyc0000,C1ch,t,735954.8590002478
Cell1,cyc0000,C1ch,t,735954.8590118219
Cell1,cyc0000,C1ch,t,735954.859023396
Cell1,cyc0000,C1ch,t,735954.85903497
Cell1,cyc0000,C1ch,t,735954.8590465442
Cell1,cyc0000,C1ch,t,735954.8590581182
Cell1,cyc0000,C1ch,t,735954.8590696923
Cell1,cyc0000,C1ch,t,735954.8590812663
Cell1,cyc0000,C1ch,t,735954.8590928405
Cell1,cyc0000,C1ch,t,735954.8591044145
Cell1,cyc0000,C1ch,t,735954.8591159886
Cell1,cyc0000,C1ch,t,735954.8591275626
Cell1,cyc0000,C1ch,t,735954.8591391367
Cell1,cyc0000,C1ch,t,735954.8591507107
Cell1,cyc0000,C1ch,t,735954.8591622849
Cell1,cyc0000,C1ch,t,735954.8591738589
Cell1,cyc0000,C1ch,t,735954.859185433
Cell1,cyc0000,C1ch,t,735954.8591970071
Cell1,cyc0000,C1ch,t,735954.8592085812
Cell1,cyc0000,C1ch,t,735954.8592201553
Cell1,cyc0000,C1ch,t,735954.8592317293
Cell1,cyc0000,C1ch,t,735954.8592433034
Cell1,cyc0000,C1ch,t,735954.8592548774
Cell1,cyc0000,C1ch,t,735954.8592664516
Cell1,cyc0000,C1ch,t,735954.8592780256
Cell1,cyc0000,C1ch,t,735954.8592895997
Cell1,cyc0000,C1ch,t,735954.8593011737
Cell1,cyc0000,C1ch,t,735954.8593127478
Cell1,cyc0000,C1ch,t,735954.8593243218
Cell1,cyc0000,C1ch,t,735954.859335896
Cell1,cyc0000,C1ch,t,735954.8593474701
Cell1,cyc0000,C1ch,t,735954.8593590441
Cell1,cyc0000,C1ch,t,735954.8593706182
Cell1,cyc0000,C1ch,t,735954.8593821923
...


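The first column runs from Cell1 to Cell8. The second column has 70 to 80 entries: cyc0000, cyc0100, and so on. The third column has 4 entries: C1ch, C1dc, OCVch, OCVdc. The fourth column has 4 entries: t, v, q, T. You can't treat this as one regular rectangular array, because the size of the last dimension varies widely, from 2,500 to 10,000 entries.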
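The long format above can still be regrouped into the Time/Voltage/Capacity/Temperature layout the question asks for, one (cell, cycle, step) group at a time. The following is only a sketch: `long_to_wide` is a hypothetical helper, it assumes the rows of each group are contiguous (as in the printed output) and that t, v, q and T have the same length within a group, and the sample values are made up rather than taken from the dataset.

```python
import csv
import io
from itertools import groupby


def long_to_wide(long_rows):
    """Turn (cell, cycle, step, field, value) rows into
    (cell, cycle, step, t, v, q, T) rows, one group at a time.

    Assumes each (cell, cycle, step) group is contiguous in the input and
    that its four fields hold the same number of samples.
    """
    for key, group in groupby(long_rows, key=lambda r: tuple(r[:3])):
        columns = {}
        for _cell, _cycle, _step, field, value in group:
            columns.setdefault(field, []).append(value)
        # Pair up the i-th sample of every field (zip stops at the shortest).
        for t, v, q, temp in zip(columns["t"], columns["v"],
                                 columns["q"], columns["T"]):
            yield (*key, t, v, q, temp)


# Made-up sample values in the long format shown above (not real measurements).
sample = io.StringIO(
    "Cell1,cyc0000,C1ch,t,1.0\n"
    "Cell1,cyc0000,C1ch,t,2.0\n"
    "Cell1,cyc0000,C1ch,v,3.5\n"
    "Cell1,cyc0000,C1ch,v,3.6\n"
    "Cell1,cyc0000,C1ch,q,0.0\n"
    "Cell1,cyc0000,C1ch,q,0.1\n"
    "Cell1,cyc0000,C1ch,T,40.0\n"
    "Cell1,cyc0000,C1ch,T,40.1\n"
)
wide_rows = list(long_to_wide(csv.reader(sample)))
```

Because only one group is held in memory at a time, the same function could be fed a `csv.reader` over the full file on disk instead of the in-memory sample.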

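Followup

Here is code that converts the mat file into a set of nested dictionaries. You can see how to access it in the last line. Perhaps that will be useful for your purposes.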

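import numpy as np
import scipy.io as sio

mat = sio.loadmat('Oxford_Battery_Degradation_Dataset_1.mat')

def dive(cell):
    # Recursively convert the nested struct arrays into nested dicts.
    if len(cell) > 1000:
        # Leaf: an (N, 1) column of measurements; report its length and keep it.
        print(len(cell))
        return cell
    elif len(cell) > 1:
        # Struct record: build a dict keyed by field name.
        data = {}
        for n, c in zip(cell.dtype.fields, cell):
            data[n] = dive(c)
        return data
    else:
        # Singleton wrapper added by loadmat: unwrap it and continue.
        return dive(cell[0])

master = {}

for cno in range(8):
    name = f'Cell{cno+1}'
    master[name] = dive(mat[name])

# Access pattern: cell -> cycle -> step -> field
print(master['Cell3']['cyc2100']['OCVch']['v'])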
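From a nested dictionary like `master`, one way to get a CSV per cell with the columns from the question might look like the sketch below. `write_cell_csv` and the `Cycle`/`Step` columns are my own additions for illustration, and the code assumes every step holds the four fields t, v, q and T as (N, 1) columns of equal length. The demo uses made-up values in place of the real arrays.

```python
import csv
import io

# Field name in the .mat file -> column name used in the question.
COLUMNS = {"t": "Time", "v": "Voltage", "q": "Capacity", "T": "Temperature"}


def write_cell_csv(cell_data, out):
    """Write one cell's nested dict {cycle: {step: {field: column}}} as CSV.

    Each column is expected to be an (N, 1) array-like, so every sample is
    a length-1 row and its value is read with sample[0].
    """
    writer = csv.writer(out)
    writer.writerow(["Cycle", "Step", *COLUMNS.values()])
    for cycle, steps in cell_data.items():
        for step, fields in steps.items():
            series = [fields[name] for name in COLUMNS]
            for samples in zip(*series):
                writer.writerow([cycle, step, *(s[0] for s in samples)])


# Made-up stand-in for master['Cell8'] (nested lists mimic (N, 1) arrays).
fake_cell = {
    "cyc0000": {
        "C1ch": {
            "t": [[1.0], [2.0]],
            "v": [[3.5], [3.6]],
            "q": [[0.0], [0.1]],
            "T": [[40.0], [40.1]],
        }
    }
}
buf = io.StringIO()
write_cell_csv(fake_cell, buf)
csv_lines = buf.getvalue().splitlines()
```

With the real data, the call would be along the lines of `write_cell_csv(master['Cell8'], f)` with `f` opened via `open('Cell8.csv', 'w', newline='')`, repeated for each of the eight cells.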
