I ran into a problem when trying to run a Spark job that connects to Kafka from Jupyter: jaas.conf is not found. However, the same job works fine when I launch it with spark-submit.
My JSON configuration for the EMR cluster:
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.jars.packages": "org.apache.spark:spark-streaming-kafka-0-10_2.11:2.4.5,org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.5",
      "spark.executor.extraJavaOptions": "-Djava.security.auth.login.config=jaas.conf",
      "spark.driver.extraJavaOptions": "-Djava.security.auth.login.config=jaas.conf",
      "spark.files": "s3://aws-emr-resources-id-us-west-2/jaas.conf"
    }
  }
]
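For context, note that `-Djava.security.auth.login.config=jaas.conf` is a relative path, which the JVM resolves against the process's current working directory. The sketch below (illustrative only, not part of the cluster config; the container paths are made up) mimics that resolution to show why the same relative name can work in one environment and fail in another:

```python
import os

def resolve_jaas_path(conf_value: str, cwd: str) -> str:
    """Mimic how the JVM resolves java.security.auth.login.config:
    a bare file name is interpreted relative to the working directory."""
    if os.path.isabs(conf_value):
        return conf_value
    return os.path.join(cwd, conf_value)

# In a YARN container, where spark.files are localized, the relative
# name resolves to a file that exists (container path is hypothetical):
print(resolve_jaas_path("jaas.conf",
                        "/mnt/yarn/usercache/hadoop/appcache/app_1/container_1"))
# → /mnt/yarn/usercache/hadoop/appcache/app_1/container_1/jaas.conf

# A driver process started from a different working directory resolves
# the same relative name to a location where the file was never placed:
print(resolve_jaas_path("jaas.conf", "/var/lib/livy"))
# → /var/lib/livy/jaas.conf
```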
In the logs I can see that the file is successfully uploaded to the application's staging directory:
20/07/07 17:03:43 INFO LineBufferedStream: 20/07/07 17:03:43 INFO Client: Uploading resource s3://aws-emr-resources-id-us-west-2/jaas.conf -> hdfs://ip-172-31-5-254.us-west-2.compute.internal:8020/user/livy/.sparkStaging/application_1594140910935_0002/jaas.conf
20/07/07 17:03:43 INFO LineBufferedStream: 20/07/07 17:03:43 INFO S3NativeFileSystem: Opening 's3://aws-emr-resources-id-us-west-2/jaas.conf' for reading
However, further down I see that the job cannot find jaas.conf:
20/07/07 17:04:03 INFO LineBufferedStream: Caused by: java.io.IOException: jaas.conf (No such file or directory)
20/07/07 17:04:03 INFO LineBufferedStream: at sun.security.provider.ConfigFile$Spi.ioException(ConfigFile.java:666)
20/07/07 17:04:03 INFO LineBufferedStream: at sun.security.provider.ConfigFile$Spi.init(ConfigFile.java:262)
20/07/07 17:04:03 INFO LineBufferedStream: at sun.security.provider.ConfigFile$Spi.<init>(ConfigFile.java:135)
I suspect this is caused by a different application environment, because if I run the job like this:
spark-submit --master yarn --deploy-mode cluster test_emr.py
I see a different staging directory (user hadoop instead of livy), and the job continues successfully:
20/07/08 05:32:52 INFO Client: Uploading resource s3://aws-emr-resources-id-us-west-2/jaas.conf -> hdfs://ip-172-31-11-181.us-west-2.compute.internal:8020/user/hadoop/.sparkStaging/application_1594185461290_0001/jaas.conf
20/07/08 05:32:52 INFO S3NativeFileSystem: Opening 's3://aws-emr-resources-id-us-west-2/jaas.conf' for reading
However, the Spark configuration docs say that files listed in spark.files are placed in the working directory of each executor, so I don't understand why the job can't see jaas.conf.
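One thing that may be worth trying (a hedged sketch, assuming the notebook talks to Livy through sparkmagic; the bucket path is the one from my cluster config above): set the same options on the Livy session itself with `%%configure`, so the driver that Livy starts also receives the file and the Java options, rather than relying only on spark-defaults:

```
%%configure -f
{
  "conf": {
    "spark.jars.packages": "org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.5",
    "spark.files": "s3://aws-emr-resources-id-us-west-2/jaas.conf",
    "spark.driver.extraJavaOptions": "-Djava.security.auth.login.config=jaas.conf",
    "spark.executor.extraJavaOptions": "-Djava.security.auth.login.config=jaas.conf"
  }
}
```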
Can anyone tell me the right way to solve this problem?