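I am trying to set up the Spark history server locally. I am on Windows and use PyCharm for PySpark programming. I can view the Spark web UI at localhost:4040. Here is what I have done so far:
spark-defaults.conf (I added the last three lines):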
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Default system properties included when running spark-submit.
# This is useful for setting default environmental settings.
# Example:
# spark.master spark://master:7077
# spark.eventLog.enabled true
# spark.eventLog.dir hdfs://namenode:8021/directory
# spark.serializer org.apache.spark.serializer.KryoSerializer
# spark.driver.memory 5g
# spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
spark.jars.packages com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.1
spark.eventLog.enabled true
spark.history.fs.logDirectory file:///D:///tmp///spark-events
Running the history server:
C:\Users\hp\spark>bin\spark-class.cmd org.apache.spark.deploy.history.HistoryServer
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/08/09 08:58:04 INFO HistoryServer: Started daemon with process name: 13476@DESKTOP-B9KRC6O
20/08/09 08:58:23 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/08/09 08:58:23 INFO SecurityManager: Changing view acls to: hp
20/08/09 08:58:23 INFO SecurityManager: Changing modify acls to: hp
20/08/09 08:58:23 INFO SecurityManager: Changing view acls groups to:
20/08/09 08:58:23 INFO SecurityManager: Changing modify acls groups to:
20/08/09 08:58:23 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hp); groups with view permissions: Set(); users with modify permissions: Set(hp); groups with modify permissions: Set()
20/08/09 08:58:24 INFO FsHistoryProvider: History server ui acls disabled; users with admin permissions: ; groups with admin permissions
20/08/09 08:58:26 INFO Utils: Successfully started service on port 18080.
20/08/09 08:58:26 INFO HistoryServer: Bound HistoryServer to 0.0.0.0, and started at http://DESKTOP-B9KRC6O:18080
After my PySpark program runs successfully, I cannot see its job details in the Spark history server web UI, even though the server is up.
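One way to see what the history server has actually picked up, independent of the web page, is its REST endpoint `/api/v1/applications`. The sketch below uses only the Python standard library; the helper names `fetch_applications` and `summarize` are illustrative, and the base URL assumes the default port 18080 shown in the log above.

```python
import json
from urllib.request import urlopen


def fetch_applications(base_url="http://localhost:18080"):
    """Return the application list reported by the history server's REST API."""
    with urlopen(base_url + "/api/v1/applications") as response:
        return json.loads(response.read().decode("utf-8"))


def summarize(applications):
    """Reduce the REST payload to (id, name, completed) tuples."""
    rows = []
    for app in applications:
        attempts = app.get("attempts", [])
        # An application counts as completed only if every attempt finished.
        completed = bool(attempts) and all(a.get("completed", False) for a in attempts)
        rows.append((app.get("id"), app.get("name"), completed))
    return rows


# Usage, with the history server running:
#   for row in summarize(fetch_applications()):
#       print(row)
```

An empty list here would suggest the server found no event logs in the directory it is reading, rather than a rendering problem in the UI.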
References I have already consulted:
Windows: Apache Spark History Server Config
How to run Spark History Server on Windows
The code I am using is as follows:
from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession

conf = SparkConf().setAppName("madhu").setMaster("local")
sc = SparkContext(conf=conf)
# getOrCreate() reuses the SparkContext created above
spark = SparkSession.builder.getOrCreate()

def readtable(dbname, table):
    # Read one MySQL table into a DataFrame over JDBC
    hostname = "localhost"
    jdbcPort = 3306
    username = "root"
    password = "madhu"
    jdbc_url = "jdbc:mysql://{0}:{1}/{2}?user={3}&password={4}".format(hostname, jdbcPort, dbname, username, password)
    dataframe = spark.read.format('jdbc').options(driver='com.mysql.jdbc.Driver', url=jdbc_url, dbtable=table).load()
    return dataframe

t1 = readtable("db", "table1")
t2 = readtable("db2", "table2")
t2.show()  # show() prints the rows itself and returns None
spark.stop()
Please help me get this working. I can provide any further information that is needed.
I have also tried the following directory path:
spark.eventLog.enabled true
spark.history.fs.logDirectory file:///D:/tmp/spark-events
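For context, Spark uses two separate settings here: a running application writes its event log to `spark.eventLog.dir`, while the history server reads from `spark.history.fs.logDirectory`. The configuration shown sets only the second, so the first would fall back to its documented default. Whether that is the cause in this case is not confirmed; the sketch below is only a quick standard-library check of a spark-defaults.conf for such a mismatch. The helper names are illustrative, the parser handles only whitespace-separated `key value` lines, and the comparison is a plain string comparison.

```python
DEFAULT_LOG_DIR = "file:///tmp/spark-events"  # Spark's documented default location


def parse_spark_defaults(text):
    """Parse spark-defaults.conf text into a dict, skipping comments and blank lines."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # key, then the rest of the line as the value
        settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return settings


def log_dirs(settings):
    """Return (directory the application writes to, directory the history server reads)."""
    written = settings.get("spark.eventLog.dir", DEFAULT_LOG_DIR)
    read = settings.get("spark.history.fs.logDirectory", DEFAULT_LOG_DIR)
    return written, read


# The two lines from the question; spark.eventLog.dir is not set.
conf_text = """
spark.eventLog.enabled true
spark.history.fs.logDirectory file:///D:/tmp/spark-events
"""

written, read = log_dirs(parse_spark_defaults(conf_text))
print("application writes to:", written)
print("history server reads: ", read)
print("same location:", written == read)
```

If the two locations differ, adding a `spark.eventLog.dir` line that points at the same directory as `spark.history.fs.logDirectory` is the usual way to line them up.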
1 Answer
#1 piztneat
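You have to provide the correct master URL in your application and run the application with spark-submit.
You can find the master URL in the Spark UI at localhost:4040.
In the example below, the master URL is spark://XXXX:7077. Your application should be: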
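A minimal sketch of an application along these lines follows. It is an illustration under stated assumptions, not the answerer's original example: the master URL keeps the `spark://XXXX:7077` placeholder, the event log directory is taken from the question, and the helpers `event_log_settings` and `build_session` are hypothetical names. `pyspark` is imported lazily so that the settings can be inspected without a Spark installation.

```python
EVENT_LOG_DIR = "file:///D:/tmp/spark-events"  # directory taken from the question


def event_log_settings(master_url, log_dir=EVENT_LOG_DIR):
    """Collect the Spark properties relevant to the history server."""
    return {
        "spark.master": master_url,
        "spark.eventLog.enabled": "true",
        "spark.eventLog.dir": log_dir,  # where the application writes its event log
    }


def build_session(app_name, settings):
    """Create a SparkSession with the given properties applied."""
    from pyspark.sql import SparkSession  # imported here so the module loads without pyspark

    builder = SparkSession.builder.appName(app_name)
    for key, value in settings.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Usage (replace XXXX with the host shown for your master):
#   spark = build_session("madhu", event_log_settings("spark://XXXX:7077"))
#   spark.range(5).show()
#   spark.stop()  # stopping the session lets the event log be finalized
```

Saved as a script (say `app.py`, a hypothetical file name), this would be launched with `bin\spark-submit.cmd app.py` from the Spark directory. Setting the properties in code is one option; passing them with `--conf` or keeping them in spark-defaults.conf are others.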