我正在Kubernetes上使用spark-submit cli运行Spark 3.0.0和Hadoop 2.7,如下所示:
./spark-submit \
--master=k8s://https://api.k8s.my-domain.com \
--deploy-mode cluster \
--name sparkle \
--num-executors 2 \
--executor-cores 2 \
--executor-memory 2g \
--driver-memory 2g \
--class com.myorg.sparkle.Sparkle \
--conf spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/opt/spark/conf/log4j.properties \
--conf spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/opt/spark/conf/log4j.properties \
--conf spark.kubernetes.submission.waitAppCompletion=false \
--conf spark.kubernetes.allocation.batch.delay=10s \
--conf spark.kubernetes.appKillPodDeletionGracePeriod=20s \
--conf spark.kubernetes.node.selector.workloadType=spark \
--conf spark.kubernetes.driver.pod.name=sparkle-driver \
--conf spark.kubernetes.container.image=custom-registry/spark:latest \
--conf spark.kubernetes.namespace=spark \
--conf spark.eventLog.dir='s3a://my-bucket/spark-logs' \
--conf spark.history.fs.logDirectory='s3a://my-bucket/spark-logs' \
--conf spark.eventLog.enabled='true' \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.authenticate.executor.serviceAccountName=spark \
--conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
--conf spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.WebIdentityTokenCredentialsProvider \
--conf spark.kubernetes.driver.annotation.iam.amazonaws.com/role=K8sRoleSpark \
--conf spark.kubernetes.executor.annotation.iam.amazonaws.com/role=K8sRoleSpark \
--conf spark.kubernetes.driver.secretKeyRef.AWS_ACCESS_KEY_ID=aws-secrets:key \
--conf spark.kubernetes.driver.secretKeyRef.AWS_SECRET_ACCESS_KEY=aws-secrets:secret \
--conf spark.kubernetes.executor.secretKeyRef.AWS_ACCESS_KEY_ID=aws-secrets:key \
--conf spark.kubernetes.executor.secretKeyRef.AWS_SECRET_ACCESS_KEY=aws-secrets:secret \
--conf spark.hadoop.fs.s3a.endpoint=s3.ap-south-1.amazonaws.com \
--conf spark.hadoop.com.amazonaws.services.s3.enableV4=true \
--conf spark.yarn.maxAppAttempts=4 \
--conf spark.yarn.am.attemptFailuresValidityInterval=1h \
s3a://dp-spark-jobs/sparkle/jars/sparkle.jar \
--commonConfigPath https://my-bucket.s3.ap-south-1.amazonaws.com/sparkle/configs/prod_main_configs.yaml \
--jobConfigPath https://my-bucket.s3.ap-south-1.amazonaws.com/sparkle/configs/cc_configs.yaml \
--filePathDate 2021-03-29 20000
我已经托管了一个不同的pod,运行具有相同映像的历史服务器。历史记录服务器能够读取所有事件日志并显示详细信息。作业已成功执行。
我在历史记录服务器中没有看到驱动程序或执行器日志。此外,没有可用的阶段日志。我正在通过log4j.properties,内容如下,
# Define the root logger with appender file
log4j.rootLogger = INFO, console
# Redirect log messages to console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.Target=System.out
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark-project.jetty=WARN
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR
# Silence akka remoting
log4j.logger.Remoting=ERROR
log4j.logger.akka.event.slf4j=ERROR
log4j.logger.org.spark-project.jetty.server=ERROR
# log4j.logger.org.apache.spark=ERROR
log4j.logger.org.apache.spark.deploy=INFO
log4j.logger.com.anjuke.dm=${dm.logging.level}
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.hadoop=ERROR
log4j.logger.org.apache.hive=ERROR
log4j.logger.org.apache.spark.sql.hive=ERROR
log4j.logger.org.apache.hadoop.hive=ERROR
log4j.logger.org.datanucleus=ERROR
# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
# Silence Spark Streaming
log4j.logger.org.apache.spark.sql.execution.streaming.state=ERROR
log4j.logger.org.apache.spark.SparkContext=ERROR
log4j.logger.org.apache.spark.executor=ERROR
log4j.logger.org.apache.spark.storage=ERROR
log4j.logger.org.apache.spark.scheduler=ERROR
log4j.logger.org.apache.spark.ContextCleaner=ERROR
log4j.logger.org.apache.spark.sql=ERROR
log4j.logger.org.apache.parquet.hadoop=ERROR
log4j.logger.org.apache.spark.sql.kafka010.KafkaSource=ERROR
log4j.additivity.org.apache.spark=false
log4j.additivity.org.apache.hadoop=false
log4j.logger.org.spark_project.jetty=WARN
log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR
# Silence Kafka
log4j.logger.org.apache.kafka=ERROR, stdout
log4j.additivity.org.apache.kafka=false
log4j.logger.org.apache.kafka.clients.consumer=ERROR
log4j.logger.org.apache.kafka.clients.consumer.internals.Fetcher=ERROR
历史记录服务器在Executor选项卡中未显示日志列。对于stages部分,logs列为空。似乎只有事件被记录,没有stdout,stderr被从任何pod捕获
如何确保驱动程序和执行器stdout、stderr记录到S3并在历史服务器中可用?
1条答案
按热度按时间gcmastyq1#
为了从Spark驱动程序和执行器pod中捕获标准输出(stdout)和标准错误(stderr)日志,并将其提供给Spark历史服务器,您可以按照以下步骤操作:
1.日志聚合:确保为Kubernetes集群设置了日志聚合。有几种解决方案可用于此,例如Fluentd,Fluent Bit,Logstash等,可以配置为从Pod中聚合日志并将其存储在S3中。
1.Log4j配置:修改log4j配置以记录stdout和stderr。在当前的log4j配置中,日志被定向到控制台(
log4j.appender.console.Target=System.out
)。您需要向根日志记录器(log4j.rootLogger
)添加一个文件附加器,以便将日志写入文件。1.Spark配置:使用Spark配置参数
spark.executor.logs.rolling.strategy
指定一个滚动日志策略,当日志文件达到一定的大小或年龄时,该策略将定期滚动日志文件。然后,日志聚合系统可以接收滚动的日志文件。1.卷挂载:确保您的Spark驱动程序和执行器pod具有必要的卷装载,以便将日志写入日志聚合系统可以访问的位置。这可能涉及到为您的Pod配置持久卷(PV)和持久卷声明(PVC)。
1.历史服务器配置:最后,配置Spark历史服务器,从S3中的聚合日志中读取。这可能涉及到设置
spark.history.fs.logDirectory
配置参数以指向您的S3存储桶。下面是如何修改log4j配置的示例:
请注意,您需要将
/path/to/your/log/file.log
替换为您要写入日志文件的实际路径。日志聚合系统应该可以访问此路径。此外,需要注意的是,这种设置可能会引入额外的延迟和I/O开销,因为日志被写入磁盘,然后由日志聚合系统接收。这可能会影响Spark作业的性能,因此建议仔细监控系统并根据需要调整配置,以平衡日志记录要求和性能。