How do I add the MLlib library to Spark?

Posted by 6mw9ycah on 2021-05-16 in Spark

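I have been assigned to run some Python code with Apache Spark and show the results. I installed the Apache Spark server by following the steps at https://phoenixnap.com/kb/install-spark-on-windows-10 and tried my code; everything worked fine. Now I have been given another assignment that requires MLlib linear regression. We were provided with some code that should run as-is, and we are then supposed to add extra code to it. When I try to run the code, I get some errors and warnings. Some of them also appeared in the previous assignment, but that code still worked. I think the problem is that something related to the MLlib library needs to be added for the code to run correctly. Does anyone know which files should be added to Spark so that it can run MLlib-related code? I am using Windows 10 and spark-3.0.1-bin-hadoop2.7.
Here is my code: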

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext
from pyspark.ml.regression import LinearRegression
from pyspark.ml.feature import StandardScaler

conf = SparkConf().setMaster("local").setAppName("LinearRegression")
sc = SparkContext(conf = conf)
sqlContext = SQLContext(sc)

# Load training data
df = sqlContext.read.format("libsvm").option("numFeatures", 13).load("boston_housing.txt")

# Data needs to be scaled for better results and interpretation
# Initialize the `standardScaler`
standardScaler = StandardScaler(inputCol="features", outputCol="features_scaled")

# Fit the DataFrame to the scaler
scaler = standardScaler.fit(df)

# Transform the data in `df` with the scaler
scaled_df = scaler.transform(df)

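# Initialize the linear regression model.
# featuresCol points at the scaled column; without it, the default "features"
# (unscaled) column would be used and the scaling step above would have no effect.
lr = LinearRegression(featuresCol="features_scaled", labelCol="label", maxIter=10, regParam=0.3, elasticNetParam=0.8)

# Fit the data to the model
linearModel = lr.fit(scaled_df)

# Print the coefficients for the model
print("Coefficients: %s" % str(linearModel.coefficients))
print("Intercept: %s" % str(linearModel.intercept))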
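
MLlib ships inside the Spark distribution and the `pyspark` package, so it normally needs no extra files. On the Python side, however, `pyspark.ml` depends on NumPy, so a missing NumPy install is one possible cause worth ruling out before adding anything to Spark. The following is only a diagnostic sketch, not a confirmed fix: the helper names `is_importable` and `environment_report` are hypothetical, and the environment variable names are an assumption based on what Windows install guides for Spark typically set.

```python
import importlib.util
import os


def is_importable(module_name):
    """Return True if the import system can locate `module_name`."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. `pyspark`) is itself missing
        # or when a module has no usable import spec.
        return False


def environment_report(modules=("numpy", "pyspark", "pyspark.ml"),
                       variables=("SPARK_HOME", "HADOOP_HOME", "JAVA_HOME")):
    """Collect module importability and environment variable values.

    The variable names are assumptions about a typical Windows setup;
    a value of None means the variable is not set in this process.
    """
    report = {name: is_importable(name) for name in modules}
    report.update({var: os.environ.get(var) for var in variables})
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If `numpy` is reported as `False`, installing it into the same Python interpreter that Spark uses would be the first thing to try. If everything is reported as available, the errors in the screenshot are more likely to come from somewhere else.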

Here is a screenshot of what I get when I run the code: