Spark parallel collections

ux6nzvsh · published 2021-05-27 in Spark

I am very new to Spark and cannot get a parallelized collection to run. This is my code:

```
from pyspark import SparkContext as sc

words = [
    'Apache', 'Spark', 'is', 'an', 'open-source', 'cluster-computing',
    'framework', 'Apache', 'Spark', 'open-source', 'Spark'
]

# Creates an RDD from a list of words
distributed_words = sc.parallelize(words)
distributed_words.count()
```

I get:

```
TypeError: parallelize() missing 1 required positional argument: 'c'
```

Why?

eivgtgni1#

You need to initialize a Spark context first. Since Spark 2 you can obtain it from a SparkSession and then parallelize the collection. Example:

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("test").master("local").getOrCreate()
sc = spark.sparkContext

words = [
    'Apache', 'Spark', 'is', 'an', 'open-source', 'cluster-computing',
    'framework', 'Apache', 'Spark', 'open-source', 'Spark'
]
distributed_words = sc.parallelize(words)
distributed_words.count()
```

Output:

```
11
```
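As for the error itself: `from pyspark import SparkContext as sc` aliases the *class* (not an instance) to `sc`, so `sc.parallelize(words)` is an unbound call in which `words` binds to `self` and the actual data argument `c` is reported missing. A minimal sketch of the same pitfall with a plain, hypothetical Python class (no Spark needed):

```python
class Box:
    def fill(self, c):
        # Instance method: `self` must be a Box instance, `c` is the payload.
        self.contents = c
        return len(c)

BoxAlias = Box  # mimics `from pyspark import SparkContext as sc`

try:
    # Calling through the class: the list binds to `self`, `c` is missing.
    BoxAlias.fill(["a", "b"])
except TypeError as e:
    print(e)  # message notes the missing positional argument 'c'

# Calling on an instance works as intended.
print(Box().fill(["a", "b"]))  # 2
```

This is why the accepted answer's version works: `spark.sparkContext` is an actual `SparkContext` instance, so `parallelize` receives the collection as `c`.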
