I want to create an ARFF file based on this CSV from Kaggle:
https://www.kaggle.com/c/titanic/download/train.csv
Below is part of the ARFF file I created:
@relation titanic
@attribute PassengerId numeric
@attribute Survived {0,1}
@attribute Pclass {1,2,3}
@attribute Name string
@attribute Sex {male,female}
@attribute Age numeric
@attribute SibSp numeric
@attribute Parch numeric
@attribute Ticket string
@attribute Fare numeric
@attribute Cabin string
@attribute Embarked {C,Q,S}
@data
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S
But when I load it in Weka, it returns the following error:
nominal value not declared in header, read Token[C85], line 18 % the second line of my data
Is there something wrong with my declarations?
1 answer
sirbozc51 #1
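The problem is that the name
"Cumings, Mrs. John Bradley (Florence Briggs Thayer)"
contains a comma. Weka parses it as two fields despite the double quotes. You could try removing such commas (that is, commas inside double quotes) with a regular expression.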
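A minimal sketch of that regular-expression cleanup in Python, applied to one data row from the question. The function name `strip_quoted_commas` is made up for illustration, and the pattern assumes the quoted fields contain no escaped double quotes:

```python
import re

# Matches one double-quoted field (assumes no escaped quotes inside it).
QUOTED = re.compile(r'"[^"]*"')

def strip_quoted_commas(line: str) -> str:
    """Remove commas that occur inside double-quoted fields of a single line."""
    return QUOTED.sub(lambda m: m.group(0).replace(",", ""), line)

row = '2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C'
cleaned = strip_quoted_commas(row)
print(cleaned)
# 2,1,1,"Cumings Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
```

Commas outside the quotes are left alone, so the field separators stay intact. If the error persists after this cleanup, it may be worth checking the other fields as well: the ARFF format expects values that contain spaces (such as `PC 17599`) to be quoted and missing values to be written as `?` rather than left empty.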