Can PCA output features be used as input to the Naive Bayes algorithm?

Posted by pjngdqdw on 2021-05-29 in Spark

I am new to Apache Spark. Is it possible in Apache Spark to apply PCA to a dataset before training a model with the NaiveBayes algorithm?
Here is my code so far:

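import org.apache.spark.ml.Pipeline;
import org.apache.spark.ml.PipelineModel;
import org.apache.spark.ml.PipelineStage;
import org.apache.spark.ml.classification.NaiveBayes;
import org.apache.spark.ml.feature.PCA;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

Dataset<Row>[] splits = data.randomSplit(new double[] {0.7, 0.3}, 54321);
Dataset<Row> trainingData = splits[0];
Dataset<Row> testingData = splits[1];

/* I want the 30 output features of this stage ("pcaFeatures") */
PCA pca = new PCA().setInputCol("features").setOutputCol("pcaFeatures").setK(30);

/* to become the input for this classifier.
   Caveat: the default multinomial model type rejects negative feature values,
   which PCA output can contain. */
NaiveBayes bayes = new NaiveBayes().setLabelCol("label").setFeaturesCol("pcaFeatures");

/* Stages are the unfitted estimators; pipeline.fit() trains PCA first and then
   NaiveBayes, using the training split only. */
Pipeline pipeline = new Pipeline().setStages(new PipelineStage[] {pca, bayes});

PipelineModel modelx = pipeline.fit(trainingData);

Dataset<Row> predTraining = modelx.transform(trainingData);
Dataset<Row> predTest = modelx.transform(testingData);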
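
One thing worth checking before feeding "pcaFeatures" into NaiveBayes: Spark's default (multinomial) Naive Bayes expects non-negative feature values, while principal-component scores are often negative even when every original feature is non-negative. The following is a minimal plain-Java sketch of that effect on a toy 2-D dataset. It is not Spark's implementation: the class name PcaSignDemo, the helper methods, the sample data, and the choice to center the rows before projecting are all assumptions made for illustration.

```java
import java.util.Arrays;

public class PcaSignDemo {

    // Projects the centered 2-D rows onto the leading eigenvector of their
    // sample covariance matrix (i.e. computes first-principal-component scores).
    static double[] firstComponentScores(double[][] rows) {
        int n = rows.length;
        double meanX = 0, meanY = 0;
        for (double[] r : rows) { meanX += r[0]; meanY += r[1]; }
        meanX /= n;
        meanY /= n;

        // Entries of the symmetric 2x2 sample covariance matrix.
        double sxx = 0, sxy = 0, syy = 0;
        for (double[] r : rows) {
            double dx = r[0] - meanX, dy = r[1] - meanY;
            sxx += dx * dx;
            sxy += dx * dy;
            syy += dy * dy;
        }
        sxx /= (n - 1);
        sxy /= (n - 1);
        syy /= (n - 1);

        // Largest eigenvalue of [[sxx, sxy], [sxy, syy]] in closed form.
        double trace = sxx + syy;
        double det = sxx * syy - sxy * sxy;
        double lambda = trace / 2 + Math.sqrt(trace * trace / 4 - det);

        // Matching eigenvector; fall back to an axis when the matrix is diagonal.
        double vx = sxy, vy = lambda - sxx;
        if (Math.abs(sxy) < 1e-12) {
            vx = sxx >= syy ? 1 : 0;
            vy = sxx >= syy ? 0 : 1;
        }
        double norm = Math.hypot(vx, vy);
        vx /= norm;
        vy /= norm;

        double[] scores = new double[n];
        for (int i = 0; i < n; i++) {
            scores[i] = (rows[i][0] - meanX) * vx + (rows[i][1] - meanY) * vy;
        }
        return scores;
    }

    // True if any score is below zero.
    static boolean hasNegative(double[] values) {
        for (double v : values) {
            if (v < 0) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical data: every raw feature value is non-negative.
        double[][] rows = {{1, 2}, {2, 3}, {3, 5}, {4, 4}, {5, 6}};
        double[] scores = firstComponentScores(rows);
        System.out.println("scores = " + Arrays.toString(scores));
        System.out.println("contains negative values: " + hasNegative(scores));
    }
}
```

If the projected features do turn out to contain negative values, two options to consider are rescaling "pcaFeatures" into a non-negative range before the classifier (for example with a MinMaxScaler stage), or, on Spark 3.0 and later, trying setModelType("gaussian") on NaiveBayes, which is intended for continuous features. Whether either suits this dataset is something to verify rather than assume.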
