csv: How do I load a dataset from an ARFF file into Python with pandas?

twh00eeo posted on 2023-04-03 in Python

I want to load ALOI.arff into Python with pandas so that the result looks like data in this format:

outlier,att1,att2,att3,att4,att5,att6,att7,att8,att9,att10,att11,att12,att13,att14,att15,att16,att17,att18,att19,att20,att21,att22,att23,att24,att25,att26,att27,id 
'yes',0.8728117766203703,4.521122685185185E-6,0.0,3.616898148148148E-5,0.0,0.0,0.0,0.0,0.0,0.05032687717013889,4.521122685185185E-6,0.0,0.005631058304398148,0.004163953993055556,0.0,2.2605613425925925E-6,2.0345052083333332E-5,0.0,0.01421214916087963,1.0398582175925926E-4,0.0,0.025490089699074073,0.004937065972222222,1.1302806712962962E-5,5.425347222222222E-5,0.006804289641203704,0.015385380497685185,1.0
'yes',0.9752061631944444,0.0,0.0,6.510416666666666E-4,0.0,0.0,0.0,0.0,0.0,0.007039388020833333,0.0,0.0,0.009996202256944444,4.7019675925925923E-4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.004853425202546296,0.001582392939814815,0.0,0.0,2.0118995949074074E-4,0.0,2.0
'yes',0.9637767650462963,0.0,0.0,0.00200511791087963,0.0,0.0,0.0,0.0,0.0,0.006641529224537037,2.2605613425925925E-6,0.0,0.012351707175925927,6.7138671875E-4,0.0,0.0,6.781684027777777E-6,0.0,0.0,0.0,0.0,0.007828323929398149,0.0025227864583333335,0.0,3.933376736111111E-4,0.003800003616898148,0.0,3.0

I tried loading it like this:

import pandas as pd

data = pd.read_csv('ALOI.arff')

I want to load it exactly the same way, but my ALOI.arff file is not in this format. It looks like this:

@RELATION 'ALOI'

@ATTRIBUTE 'outlier' {'yes','no'}
@ATTRIBUTE 'att1' real
@ATTRIBUTE 'att2' real
@ATTRIBUTE 'att3' real
@ATTRIBUTE 'att4' real
@ATTRIBUTE 'att5' real
@ATTRIBUTE 'att6' real
@ATTRIBUTE 'att7' real
@ATTRIBUTE 'att8' real
@ATTRIBUTE 'att9' real
@ATTRIBUTE 'att10' real
@ATTRIBUTE 'att11' real
@ATTRIBUTE 'att12' real
@ATTRIBUTE 'att13' real
@ATTRIBUTE 'att14' real
@ATTRIBUTE 'att15' real
@ATTRIBUTE 'att16' real
@ATTRIBUTE 'att17' real
@ATTRIBUTE 'att18' real
@ATTRIBUTE 'att19' real
@ATTRIBUTE 'att20' real
@ATTRIBUTE 'att21' real
@ATTRIBUTE 'att22' real
@ATTRIBUTE 'att23' real
@ATTRIBUTE 'att24' real
@ATTRIBUTE 'att25' real
@ATTRIBUTE 'att26' real
@ATTRIBUTE 'att27' real
@ATTRIBUTE 'id' real

@DATA
'yes',0.8728117766203703,4.521122685185185E-6,0.0,3.616898148148148E-5,0.0,0.0,0.0,0.0,0.0,0.05032687717013889,4.521122685185185E-6,0.0,0.005631058304398148,0.004163953993055556,0.0,2.2605613425925925E-6,2.0345052083333332E-5,0.0,0.01421214916087963,1.0398582175925926E-4,0.0,0.025490089699074073,0.004937065972222222,1.1302806712962962E-5,5.425347222222222E-5,0.006804289641203704,0.015385380497685185,1.0
'yes',0.9752061631944444,0.0,0.0,6.510416666666666E-4,0.0,0.0,0.0,0.0,0.0,0.007039388020833333,0.0,0.0,0.009996202256944444,4.7019675925925923E-4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.004853425202546296,0.001582392939814815,0.0,0.0,2.0118995949074074E-4,0.0,2.0
'yes',0.9637767650462963,0.0,0.0,0.00200511791087963,0.0,0.0,0.0,0.0,0.0,0.006641529224537037,2.2605613425925925E-6,0.0,0.012351707175925927,6.7138671875E-4,0.0,0.0,6.781684027777777E-6,0.0,0.0,0.0,0.0,0.007828323929398149,0.0025227864583333335,0.0,3.933376736111111E-4,0.003800003616898148,0.0,3.0
'yes',0.9732462565104166,0.0,0.0,5.560980902777778E-4,0.0,0.0,0.0,0.0,0.0,0.008978949652777778,2.2605613425925925E-6,0.0,0.012433087384259259,2.147533275462963E-4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.004392270688657407,1.6954210069444444E-4,0.0,0.0,6.781684027777777E-6,0.0,4.0
'yes',0.9607204861111112,0.0,0.0,6.555627893518518E-4,0.0,0.0,0.0,0.0,0.0,0.013319227430555556,0.0,0.0,0.01389114945023148,2.0571108217592592E-4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010299117476851851,5.60619212962963E-4,0.0,8.364076967592593E-5,2.644856770833333E-4,0.0,5.0
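In a dense ARFF file like this one, everything after `@DATA` is already comma-separated; only the header differs from CSV, and its `@ATTRIBUTE` lines carry the column names. The sketch below is one possible approach using only pandas and the standard library, not an official ARFF reader. `read_simple_arff` is a hypothetical helper, and the sample text is a shortened, made-up excerpt in the same layout. It assumes one `@ATTRIBUTE` per line, attribute names without spaces, dense (non-sparse) rows, and no `%` comments inside the data section.

```python
import io

import pandas as pd

# Shortened, hypothetical sample in the same layout as ALOI.arff
ARFF_TEXT = """@RELATION 'ALOI'

@ATTRIBUTE 'outlier' {'yes','no'}
@ATTRIBUTE 'att1' real
@ATTRIBUTE 'id' real

@DATA
'yes',0.8728117766203703,1.0
'no',0.9752061631944444,2.0
"""


def read_simple_arff(text):
    """Hypothetical helper: parse a simple dense ARFF file with pandas.

    Assumes one @ATTRIBUTE per line, names without spaces, dense rows,
    and no '%' comment lines after @DATA.
    """
    lines = text.splitlines()
    columns = []
    data_start = None
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.upper().startswith("@ATTRIBUTE"):
            # The second token is the attribute name; drop surrounding quotes
            name = stripped.split(None, 2)[1].strip("'\"")
            columns.append(name)
        elif stripped.upper().startswith("@DATA"):
            data_start = i + 1
            break
    if data_start is None:
        raise ValueError("no @DATA section found")
    body = "\n".join(lines[data_start:])
    # quotechar="'" strips the single quotes around nominal values like 'yes'
    return pd.read_csv(io.StringIO(body), header=None, names=columns, quotechar="'")


df = read_simple_arff(ARFF_TEXT)
print(df)
```

For the real file, something like `with open('ALOI.arff') as f: df = read_simple_arff(f.read())` should work under the same assumptions; a dedicated ARFF reader is the safer choice for files that use sparse rows or comments.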

How can I do this in Python? I am very new to this. Can you spot the problem? I tried loading it the same way, but it didn't work, and I got this error:

File "D:\projects\work\machineLearning\main.py", line 8, in <module>
    df = pd.DataFrame(data['data'], columns=[attr[0] for attr in data['attributes']])
                      ~~~~^^^^^^^^
TypeError: 'generator' object is not subscriptable

My code:

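import arff  # assumes liac-arff (pip install liac-arff), whose arff.load() returns a dict
import pandas as pd

with open('ALOI.arff') as f:
    data = arff.load(f)

# 'attributes' is a list of (name, type) pairs; 'data' is a list of rows
df = pd.DataFrame(data['data'], columns=[attr[0] for attr in data['attributes']])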
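The traceback says that `arff.load(f)` returned a generator, which yields rows one at a time and cannot be indexed with `['data']`. A likely cause (an assumption, not confirmed in the question) is that the installed module named `arff` comes from a different PyPI package than liac-arff, whose `arff.load()` returns the dict this code expects. The plain-Python sketch below reproduces the same `TypeError` with a hypothetical `load_rows` generator and shows that materialising it with `list()` allows indexing.

```python
def load_rows():
    """Hypothetical stand-in for a loader that yields rows one at a time."""
    yield ["yes", 0.87, 1.0]
    yield ["yes", 0.98, 2.0]


rows = load_rows()
try:
    rows["data"]  # generators do not support indexing
except TypeError as exc:
    message = str(exc)
    print(message)  # 'generator' object is not subscriptable

# Materialise the generator first if you need random access to the rows
rows = list(load_rows())
print(rows[0])
```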
Answer by zbq4xfa0:

Use scipy.io.arff.loadarff to read the ARFF file:

import pandas as pd
from scipy.io import arff

# data is a NumPy record array; meta describes the attributes
data, meta = arff.loadarff('ALOI.arff')
df = pd.DataFrame(data)
print(df)
outlier      att1      att2  att3      att4  att5  att6  att7  att8  att9  \
0  b'yes'  0.872812  0.000005   0.0  0.000036   0.0   0.0   0.0   0.0   0.0   
1  b'yes'  0.975206  0.000000   0.0  0.000651   0.0   0.0   0.0   0.0   0.0   
2  b'yes'  0.963777  0.000000   0.0  0.002005   0.0   0.0   0.0   0.0   0.0   
3  b'yes'  0.973246  0.000000   0.0  0.000556   0.0   0.0   0.0   0.0   0.0   
4  b'yes'  0.960720  0.000000   0.0  0.000656   0.0   0.0   0.0   0.0   0.0   

      att10     att11  att12     att13     att14  att15     att16     att17  \
0  0.050327  0.000005    0.0  0.005631  0.004164    0.0  0.000002  0.000020   
1  0.007039  0.000000    0.0  0.009996  0.000470    0.0  0.000000  0.000000   
2  0.006642  0.000002    0.0  0.012352  0.000671    0.0  0.000000  0.000007   
3  0.008979  0.000002    0.0  0.012433  0.000215    0.0  0.000000  0.000000   
4  0.013319  0.000000    0.0  0.013891  0.000206    0.0  0.000000  0.000000   

   att18     att19     att20  att21     att22     att23     att24     att25  \
0    0.0  0.014212  0.000104    0.0  0.025490  0.004937  0.000011  0.000054   
1    0.0  0.000000  0.000000    0.0  0.004853  0.001582  0.000000  0.000000   
2    0.0  0.000000  0.000000    0.0  0.007828  0.002523  0.000000  0.000393   
3    0.0  0.000000  0.000000    0.0  0.004392  0.000170  0.000000  0.000000   
4    0.0  0.000000  0.000000    0.0  0.010299  0.000561  0.000000  0.000084   

      att26     att27   id  
0  0.006804  0.015385  1.0  
1  0.000201  0.000000  2.0  
2  0.003800  0.000000  3.0  
3  0.000007  0.000000  4.0  
4  0.000264  0.000000  5.0
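In the output above, the nominal `outlier` column holds bytes (`b'yes'`) rather than strings. A possible follow-up step, sketched below, uses the returned `meta` object to find nominal attributes and decodes them with pandas' `Series.str.decode`. The sample text is a shortened, made-up excerpt in the same layout as ALOI.arff, passed through `io.StringIO` because `loadarff` also accepts file-like objects; for the real file, pass `'ALOI.arff'` as in the answer.

```python
import io

import pandas as pd
from scipy.io import arff

# Shortened, hypothetical sample in the same layout as ALOI.arff
ARFF_TEXT = """@RELATION 'ALOI'

@ATTRIBUTE 'outlier' {'yes','no'}
@ATTRIBUTE 'att1' real
@ATTRIBUTE 'id' real

@DATA
'yes',0.8728117766203703,1.0
'no',0.9752061631944444,2.0
"""

data, meta = arff.loadarff(io.StringIO(ARFF_TEXT))
df = pd.DataFrame(data)

# Nominal attributes come back as bytes (b'yes'); decode them to str
for name, kind in zip(meta.names(), meta.types()):
    if kind == "nominal":
        df[name] = df[name].str.decode("utf-8")

print(df)
```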
