I want to load ALOI.arff in Python with pandas so that the result looks like this:
outlier,att1,att2,att3,att4,att5,att6,att7,att8,att9,att10,att11,att12,att13,att14,att15,att16,att17,att18,att19,att20,att21,att22,att23,att24,att25,att26,att27,id
'yes',0.8728117766203703,4.521122685185185E-6,0.0,3.616898148148148E-5,0.0,0.0,0.0,0.0,0.0,0.05032687717013889,4.521122685185185E-6,0.0,0.005631058304398148,0.004163953993055556,0.0,2.2605613425925925E-6,2.0345052083333332E-5,0.0,0.01421214916087963,1.0398582175925926E-4,0.0,0.025490089699074073,0.004937065972222222,1.1302806712962962E-5,5.425347222222222E-5,0.006804289641203704,0.015385380497685185,1.0
'yes',0.9752061631944444,0.0,0.0,6.510416666666666E-4,0.0,0.0,0.0,0.0,0.0,0.007039388020833333,0.0,0.0,0.009996202256944444,4.7019675925925923E-4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.004853425202546296,0.001582392939814815,0.0,0.0,2.0118995949074074E-4,0.0,2.0
'yes',0.9637767650462963,0.0,0.0,0.00200511791087963,0.0,0.0,0.0,0.0,0.0,0.006641529224537037,2.2605613425925925E-6,0.0,0.012351707175925927,6.7138671875E-4,0.0,0.0,6.781684027777777E-6,0.0,0.0,0.0,0.0,0.007828323929398149,0.0025227864583333335,0.0,3.933376736111111E-4,0.003800003616898148,0.0,3.0
I tried loading it like this:
data = pd.read_csv('ALOI.arff')
I want it loaded exactly like that, but my ALOI.arff file is not in that format. It looks like this:
@RELATION 'ALOI'
@ATTRIBUTE 'outlier' {'yes','no'}
@ATTRIBUTE 'att1' real
@ATTRIBUTE 'att2' real
@ATTRIBUTE 'att3' real
@ATTRIBUTE 'att4' real
@ATTRIBUTE 'att5' real
@ATTRIBUTE 'att6' real
@ATTRIBUTE 'att7' real
@ATTRIBUTE 'att8' real
@ATTRIBUTE 'att9' real
@ATTRIBUTE 'att10' real
@ATTRIBUTE 'att11' real
@ATTRIBUTE 'att12' real
@ATTRIBUTE 'att13' real
@ATTRIBUTE 'att14' real
@ATTRIBUTE 'att15' real
@ATTRIBUTE 'att16' real
@ATTRIBUTE 'att17' real
@ATTRIBUTE 'att18' real
@ATTRIBUTE 'att19' real
@ATTRIBUTE 'att20' real
@ATTRIBUTE 'att21' real
@ATTRIBUTE 'att22' real
@ATTRIBUTE 'att23' real
@ATTRIBUTE 'att24' real
@ATTRIBUTE 'att25' real
@ATTRIBUTE 'att26' real
@ATTRIBUTE 'att27' real
@ATTRIBUTE 'id' real
@DATA
'yes',0.8728117766203703,4.521122685185185E-6,0.0,3.616898148148148E-5,0.0,0.0,0.0,0.0,0.0,0.05032687717013889,4.521122685185185E-6,0.0,0.005631058304398148,0.004163953993055556,0.0,2.2605613425925925E-6,2.0345052083333332E-5,0.0,0.01421214916087963,1.0398582175925926E-4,0.0,0.025490089699074073,0.004937065972222222,1.1302806712962962E-5,5.425347222222222E-5,0.006804289641203704,0.015385380497685185,1.0
'yes',0.9752061631944444,0.0,0.0,6.510416666666666E-4,0.0,0.0,0.0,0.0,0.0,0.007039388020833333,0.0,0.0,0.009996202256944444,4.7019675925925923E-4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.004853425202546296,0.001582392939814815,0.0,0.0,2.0118995949074074E-4,0.0,2.0
'yes',0.9637767650462963,0.0,0.0,0.00200511791087963,0.0,0.0,0.0,0.0,0.0,0.006641529224537037,2.2605613425925925E-6,0.0,0.012351707175925927,6.7138671875E-4,0.0,0.0,6.781684027777777E-6,0.0,0.0,0.0,0.0,0.007828323929398149,0.0025227864583333335,0.0,3.933376736111111E-4,0.003800003616898148,0.0,3.0
'yes',0.9732462565104166,0.0,0.0,5.560980902777778E-4,0.0,0.0,0.0,0.0,0.0,0.008978949652777778,2.2605613425925925E-6,0.0,0.012433087384259259,2.147533275462963E-4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.004392270688657407,1.6954210069444444E-4,0.0,0.0,6.781684027777777E-6,0.0,4.0
'yes',0.9607204861111112,0.0,0.0,6.555627893518518E-4,0.0,0.0,0.0,0.0,0.0,0.013319227430555556,0.0,0.0,0.01389114945023148,2.0571108217592592E-4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010299117476851851,5.60619212962963E-4,0.0,8.364076967592593E-5,2.644856770833333E-4,0.0,5.0
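How can I do this in Python? I am very new to this. Can you spot the problem? I tried loading it with the code shown after the traceback, but it doesn't work and I get this error: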
  File "D:\projects\work\machineLearning\main.py", line 8, in <module>
    df = pd.DataFrame(data['data'], columns=[attr[0] for attr in data['attributes']])
                      ~~~~^^^^^^^^
TypeError: 'generator' object is not subscriptable
My code:
import arff
import pandas as pd

with open('ALOI.arff') as f:
    data = arff.load(f)
data = pd.DataFrame(data['data'], columns=[attr[0] for attr in data['attributes']])
1 Answer
Use scipy.io.arff.loadarff to read the ARFF file:
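A minimal sketch of that approach, assuming SciPy and pandas are installed. The `SAMPLE` string is a shortened, made-up stand-in for ALOI.arff (one attribute plus `id` instead of 27) so the snippet runs on its own. `loadarff` returns a NumPy structured array together with the header metadata, and nominal columns such as `outlier` come back as bytes, so they are decoded to plain strings here.

```python
import io

import pandas as pd
from scipy.io import arff

# Hypothetical, shortened stand-in for ALOI.arff so the example is self-contained.
SAMPLE = """@RELATION 'ALOI'
@ATTRIBUTE 'outlier' {'yes','no'}
@ATTRIBUTE 'att1' real
@ATTRIBUTE 'id' real
@DATA
'yes',0.8728117766203703,1.0
'no',0.9752061631944444,2.0
"""

# loadarff accepts a file name or a file-like object and returns
# (records, metadata): a structured array plus the parsed header.
records, meta = arff.loadarff(io.StringIO(SAMPLE))

# The structured array's field names become the DataFrame columns.
df = pd.DataFrame(records)

# Nominal attributes are stored as bytes (b'yes'); decode them to str.
df["outlier"] = df["outlier"].str.decode("utf-8")

print(df)
```

For the real file, passing the path directly should work the same way: `records, meta = arff.loadarff('ALOI.arff')`. As for the `TypeError` in the question, one possible cause is that the `arff` package from PyPI is installed rather than `liac-arff`; both are imported as `arff`, but only the latter's `load` returns a dict with `'data'` and `'attributes'` keys.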