PySpark ML: metadata dictionary is empty

Posted by p4rjhz4m on 2021-06-10 in Cassandra


I implemented a text classifier in PySpark as shown below:

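from pyspark.ml import Pipeline
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.feature import (CountVectorizer, IndexToString, RegexTokenizer,
                                StopWordsRemover, StringIndexer)

tokenizer = RegexTokenizer(inputCol="documents", outputCol="tokens", pattern='\\W+')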

remover = StopWordsRemover(inputCol='tokens', outputCol='nostops')

vectorizer = CountVectorizer(inputCol='nostops', outputCol='features', vocabSize=1000)

labelIndexer = StringIndexer(inputCol="label", outputCol="indexedLabel", handleInvalid='skip')
labelIndexer_model = labelIndexer.fit(countModel_df)

convertor = IndexToString(inputCol='prediction', outputCol='predictedLabel', labels=labelIndexer_model.labels)

rfc = RandomForestClassifier(featuresCol='features', labelCol='indexedLabel', numTrees=30)

evaluator = BinaryClassificationEvaluator(labelCol='indexedLabel', rawPredictionCol='prediction')

pipe_rfc = Pipeline(stages=[tokenizer, remover, labelIndexer, vectorizer, rfc, convertor])

train_df, test_df = df.randomSplit((0.8, 0.2), seed=42)

model = pipe_rfc.fit(train_df)

prediction_rfc_df = model.transform(test_df)

The code runs and the predictions come out as expected. But when I inspect the metadata, the metadata dictionary is empty, as shown below:

prediction_rfc_df.schema['features'].metadata

Output: {}

prediction_rfc_df.schema['label'].metadata

Output: {}

Do you know why the metadata is missing from the DataFrame?
I read the data from a Cassandra table as follows:

df = spark.read \
     .format("org.apache.spark.sql.cassandra") \
     .options(table='table_name', keyspace='key_space_name') \
     .load()
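
A possible way to narrow down the metadata check: Spark ML stages such as StringIndexer attach their attribute metadata to the output column (here `indexedLabel`), not to the input column (`label`), so an empty dictionary on `label` would be expected. Whether `features` carries metadata may depend on the stage and the Spark version. The sketch below lists the metadata of every column so the populated ones are easy to spot. The helper names `describe_metadata` and `nominal_labels` are hypothetical, and the demo uses plain stand-in objects instead of a real DataFrame schema; it only assumes the schema exposes `.fields` with `.name` and `.metadata`, as a PySpark `StructType` does, and that label values sit under the `"ml_attr"` / `"vals"` keys.

```python
from types import SimpleNamespace


def describe_metadata(schema):
    """Map each column name to a copy of its metadata dict.

    `schema` is assumed to look like a PySpark StructType: an object with a
    `.fields` list whose items expose `.name` and `.metadata`.
    """
    return {field.name: dict(field.metadata or {}) for field in schema.fields}


def nominal_labels(metadata):
    """Return the label values stored under "ml_attr", or None if absent."""
    return metadata.get("ml_attr", {}).get("vals")


# Stand-in schema for illustration only (not real Spark output):
# `label` has no metadata, `indexedLabel` mimics a StringIndexer output column.
demo_schema = SimpleNamespace(fields=[
    SimpleNamespace(name="label", metadata={}),
    SimpleNamespace(
        name="indexedLabel",
        metadata={"ml_attr": {"type": "nominal", "vals": ["ham", "spam"]}},
    ),
])

meta = describe_metadata(demo_schema)
for column, column_meta in meta.items():
    print(column, column_meta)
```

Against the DataFrame from the question, the same helper would be called as `describe_metadata(prediction_rfc_df.schema)`, and `nominal_labels(...)` applied to the `indexedLabel` entry.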

cassandra apache-spark pyspark python-3.x apache-spark-ml

Source: https://stackoverflow.com/questions/54642353/pyspark-spark-ml-metadata-dictionary-is-empty
