How to configure log4j for Spark 3 in spark-submit.sh

v09wglhw · asked 2023-01-17 · Spark

The setup below gave us just the amount of logging we needed on Spark 2.2. However, after moving to Spark 3.3, log4j.properties is no longer respected and a lot of Spark TRACE and DEBUG output is being printed.
I hear this is because Spark moved from log4j to log4j2, but despite a lot of Googling it is still unclear how to configure log4j across the driver and all executors for spark-submit on Spark 3.3.
The command that worked fine on Spark 2.2:

spark-submit --conf "spark.executor.extraJavaOptions=-Dlog4j.debug=true" --conf "spark.driver.extraJavaOptions=-Dlog4j.debug=true" --files /home/hadoop/log4j.properties --name app  --master yarn --deploy-mode cluster --class a.b.c.Entrypoint /home/hadoop/jars/app.jar
**So the questions are:**

1. Is there a sample log4j2.properties file?
2. How do I pass it from the master node at spark-submit time? (a sketch follows below)
3. How do I print log4j debug information?
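For reference, one commonly suggested approach (a sketch only, not verified on this cluster): ship the file with --files so it lands in every YARN container's working directory under its base name, and point log4j2 at it explicitly through the log4j2.configurationFile system property; -Dlog4j2.debug=true additionally makes log4j2 print its own status output while troubleshooting. Some setups need a file: prefix in front of the file name.

spark-submit \
  --files /home/hadoop/log4j2.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j2.configurationFile=log4j2.properties -Dlog4j2.debug=true" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j2.configurationFile=log4j2.properties -Dlog4j2.debug=true" \
  --name app --master yarn --deploy-mode cluster \
  --class a.b.c.Entrypoint /home/hadoop/jars/app.jar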
**[EDIT 1] The problem is not solved yet!**

Based on the comments I made the changes below, but I still see a lot of Spark internal output being logged, not just my own:

spark-submit --driver-memory 1g --executor-memory 2g --conf "spark.driver.extraJavaOptions=-Dlog4j2.debug=true" --files /home/hadoop/log4j2.properties --master yarn --deploy-mode cluster --class com.a.b.ABC /home/hadoop/jars/spark-1.0-SNAPSHOT.jar

log4j2.properties

status=warn
name=campV2

appender.console.type = Console
appender.console.name = console
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{yy-MM-dd HH:mm:ss} %p %c: %m%n%ex

rootLogger.level = warn
rootLogger.appenderRef.stdout.ref = console

logger.app.name=com.company1
logger.app.level = debug
logger.app.additivity = false
logger.app.appenderRef.console.ref = console

logger.app2.name=com.company2
logger.app2.level = debug
logger.app2.additivity = false
logger.app2.appenderRef.console.ref = console
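A possible gap in the command above (an assumption, not confirmed): --files only distributes log4j2.properties; nothing tells log4j2 to load it in preference to the default configuration bundled with Spark. Pointing both the driver and the executors at the shipped file would look roughly like this:

--files /home/hadoop/log4j2.properties \
--conf "spark.driver.extraJavaOptions=-Dlog4j2.configurationFile=log4j2.properties" \
--conf "spark.executor.extraJavaOptions=-Dlog4j2.configurationFile=log4j2.properties"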

The logs that get generated, full of unwanted output:

LogLastModifiedTime:Tue Dec 20 05:52:31 +0000 2022
LogLength:36546
LogContents:
ls -l:
total 20
lrwxrwxrwx 1 yarn yarn   62 Dec 20 05:52 __app__.jar -> /mnt/yarn/usercache/hadoop/filecache/23/spark-1.0-SNAPSHOT.jar
lrwxrwxrwx 1 yarn yarn   58 Dec 20 05:52 __spark_conf__ -> /mnt/yarn/usercache/hadoop/filecache/21/__spark_conf__.zip
lrwxrwxrwx 1 yarn yarn   78 Dec 20 05:52 __spark_libs__ -> /mnt1/yarn/usercache/hadoop/filecache/22/__spark_libs__7763583720024624816.zip
-rw-r--r-- 1 yarn yarn   93 Dec 20 05:52 container_tokens
-rwx------ 1 yarn yarn  646 Dec 20 05:52 default_container_executor.sh

...
...
echo "broken symlinks(find -L . -maxdepth 5 -type l -ls):" 1>>"/var/log/hadoop-yarn/containers/application_1671425963628_0204/container_1671425963628_0204_01_000003/directory.info"
find -L . -maxdepth 5 -type l -ls 1>>"/var/log/hadoop-yarn/containers/application_1671425963628_0204/container_1671425963628_0204_01_000003/directory.info"
echo "Launching container"
exec /bin/bash -c "LD_LIBRARY_PATH=\"/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native:$LD_LIBRARY_PATH\" $JAVA_HOME/bin/java -server -Xmx2048m '-verbose:gc' '-XX:+PrintGCDetails' '-XX:+PrintGCDateStamps' '-XX:OnOutOfMemoryError=kill -9 %p' '-XX:+IgnoreUnrecognizedVMOptions' '--add-opens=java.base/java.lang=ALL-UNNAMED' '--add-opens=java.base/java.lang.invoke=ALL-UNNAMED' '--add-opens=java.base/java.lang.reflect=ALL-UNNAMED' '--add-opens=java.base/java.io=ALL-UNNAMED' '--add-opens=java.base/java.net=ALL-UNNAMED' '--add-opens=java.base/java.nio=ALL-UN
...
...
DEBUG StatusLogger PluginManager 'Lookup' found 16 plugins
DEBUG StatusLogger PluginManager 'Lookup' found 16 plugins
DEBUG StatusLogger Using configurationFactory org.apache.logging.log4j.core.config.ConfigurationFactory$Factory@6bedbc4d
TRACE StatusLogger Trying to find [log4j2-test18b4aac2.properties] using context class loader sun.misc.Launcher$AppClassLoader@18b4aac2.
TRACE StatusLogger Trying to find [log4j2-test18b4aac2.properties] using sun.misc.Launcher$AppClassLoader@18b4aac2 class loader.

A huge amount of unwanted logging is generated now, and finding my own log lines is like looking for a needle in a haystack. Is there a way to show only my logs and not the Spark internal ones?

**So the questions remain:**

1. How do I configure log4j2 so that I only see my own loggers? (see the sketch just below)
2. Any pointers or examples would help.
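One common pattern for "only my loggers" (a sketch, assuming the configuration file is actually being picked up by the driver and executors): keep the root logger at error, or even off, and declare only your own packages at debug with additivity = false. Note that the ls -l / ln -sf / echo lines in the output come from YARN's container launch script, not from log4j, so no log4j2 setting will remove them.

# silence everything except explicitly named loggers
rootLogger.level = error
rootLogger.appenderRef.stdout.ref = console

# application loggers only
logger.app.name = com.company1
logger.app.level = debug
logger.app.additivity = false
logger.app.appenderRef.console.ref = console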

**[EDIT 2]** After setting log4j2.debug=false, the TRACE logs are gone. However, I still see the script output:
--conf "spark.driver.extraJavaOptions=-Dlog4j.debug=false -Dlog4j2.debug=false
echo "Setting up job resources"
ln -sf -- "/mnt/yarn/usercache/hadoop/filecache/3758/__spark_libs__3245215202131718232.zip" "__spark_libs__"
ln -sf -- "/mnt/yarn/usercache/hadoop/filecache/3760/log4j2.properties" "log4j2.properties"
ln -sf -- "/mnt/yarn/usercache/hadoop/filecache/3759/spark-1.0-SNAPSHOT.jar" "__app__.jar"
ln -sf -- "/mnt/yarn/usercache/hadoop/filecache/3757/__spark_conf__.zip" "__spark_conf__"
ln -sf -- "/mnt/yarn/usercache/hadoop/filecache/3756/hudi-defaults.conf" "hudi-defaults.conf"
echo "Copying debugging information"
# Creating copy of launch script

Not sure what to do about that.

rkue9o1l (answer #1)

Finally, after trying several options, the following is the only thing that worked:
1. Log in to the Spark master node (in my case the EMR master node) and open the log4j2.properties file at /usr/lib/spark/conf/log4j2.properties.
2. Back up that file, then edit it to match the contents further down (a shell sketch of these two steps follows).
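A shell sketch of those two steps (paths taken from the answer above; whether sudo is required depends on the user you log in as):

# on the EMR master node: back up the stock file, then edit it in place
sudo cp /usr/lib/spark/conf/log4j2.properties /usr/lib/spark/conf/log4j2.properties.bak
sudo vi /usr/lib/spark/conf/log4j2.properties   # paste the contents shown further down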
It is disappointing that something that worked fine on Spark 2.2 with the --files log4j.properties flag no longer works after the Spark upgrade (--files log4j2.properties), and that we had to resort to the ugly fix of editing the file directly on the server.
My log4j2.properties looks like this:

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Set everything to be logged to the console
rootLogger.level = warn
rootLogger.appenderRef.stdout.ref = STDOUT

appender.console.type = Console
appender.console.name = STDOUT
appender.console.target = SYSTEM_OUT
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{yy-MM-dd HH:mm:ss} %p %c{1}: %m%n%ex

logger.pp.name = com.company
logger.pp.level = debug
logger.pp.additivity = false
logger.pp.appenderRef.console.ref=STDOUT
logger.pp1.name = com.company2
logger.pp1.level = debug
logger.pp1.additivity = false
logger.pp1.appenderRef.console.ref=STDOUT

# Settings to quiet third party logs that are too verbose com.amazonaws.services.s3
logger.jetty.name = org.sparkproject.jetty
logger.jetty.level = warn
logger.jetty2.name = org.sparkproject.jetty.util.component.AbstractLifeCycle
logger.jetty2.level = error
logger.repl1.name = org.apache.spark.repl.SparkIMain$exprTyper
logger.repl1.level = info
logger.repl2.name = org.apache.spark.repl.SparkILoop$SparkILoopInterpreter
logger.repl2.level = info

# Set the default spark-shell log level to WARN. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
logger.repl.name = org.apache.spark.repl.Main
logger.repl.level = warn

# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs
# in SparkSQL with Hive support
logger.metastore.name = org.apache.hadoop.hive.metastore.RetryingHMSHandler
logger.metastore.level = fatal
logger.hive_functionregistry.name = org.apache.hadoop.hive.ql.exec.FunctionRegistry
logger.hive_functionregistry.level = error

# Parquet related logging
logger.parquet.name = org.apache.parquet.CorruptStatistics
logger.parquet.level = error
logger.parquet2.name = parquet.CorruptStatistics
logger.parquet2.level = error
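A rough way to verify the change (hypothetical commands; adjust the class, jar and application id): submit without any extra log4j flags and inspect the aggregated container logs. With the file above, everything outside com.company / com.company2 should be limited to WARN and above.

spark-submit --master yarn --deploy-mode cluster \
  --class com.a.b.ABC /home/hadoop/jars/spark-1.0-SNAPSHOT.jar

# after the application finishes, pull the container logs and look for the application loggers
yarn logs -applicationId <application_id> | grep 'com.company'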
