I'm new to Apache Spark. Is it possible to apply PCA to a dataset before training a model with the NaiveBayes algorithm in Apache Spark?
Here is my code so far:
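import org.apache.spark.ml.Pipeline;
import org.apache.spark.ml.PipelineModel;
import org.apache.spark.ml.PipelineStage;
import org.apache.spark.ml.classification.NaiveBayes;
import org.apache.spark.ml.feature.PCA;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

Dataset<Row>[] splits = data.randomSplit(new double[] {0.7, 0.3}, 54321);
Dataset<Row> trainingData = splits[0];
Dataset<Row> testingData = splits[1];
/* I want the 30 output features of the following stage ... */
PCA pca = new PCA().setInputCol("features").setOutputCol("pcaFeatures").setK(30);
/* ... to become the input for this classifier. */
/* Assumption: Spark 3.0+, where the "gaussian" model type accepts the negative values PCA can produce; the default multinomial type requires non-negative features. */
NaiveBayes bayes = new NaiveBayes().setLabelCol("label").setFeaturesCol("pcaFeatures").setModelType("gaussian");
/* Pass both unfitted estimators as stages; pipeline.fit() then fits PCA and NaiveBayes on the training split only. */
Pipeline pipeline = new Pipeline().setStages(new PipelineStage[] {pca, bayes});
PipelineModel modelx = pipeline.fit(trainingData);
Dataset<Row> predTraining = modelx.transform(trainingData);
Dataset<Row> predTest = modelx.transform(testingData);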