Find the application ID of the current Spark job from the driver node?

kgsdhlau · posted 2021-07-13 · in Spark

Is there an easy way to get the application ID of the current job from the driver node when running under Amazon Elastic MapReduce (EMR)? This is Spark running in cluster mode.
Right now I run a map() on the workers and read the CONTAINER_ID environment variable there. That seems inefficient. Here is the code:

import os

def applicationIdFromEnvironment():
    # Derive the application ID from the container ID set by YARN, e.g.
    # container_1433865536131_34483_01_000001 -> application_1433865536131_34483
    return "_".join(['application'] + os.environ['CONTAINER_ID'].split("_")[1:3])

def applicationId():
    """Return the Yarn (or local) applicationID.
    The environment variables are only set if we are running in a Yarn container.
    """

    # First check to see if we are running on the worker...
    try:
        return applicationIdFromEnvironment()
    except KeyError:
        pass

    # Perhaps we are running on the driver? If so, run a Spark job that finds it.
    try:
        from pyspark import SparkConf, SparkContext
        sc = SparkContext.getOrCreate()
        if "local" in sc.getConf().get("spark.master"):
            return f"local{os.getpid()}"
        # Note: make sure that the following map does not require access to any existing module.
        appid = sc.parallelize([1]).map(lambda x: "_".join(['application'] + os.environ['CONTAINER_ID'].split("_")[1:3])).collect()
        return appid[0]
    except ImportError:
        pass

    # Application ID cannot be determined.
    return f"unknown{os.getpid()}"

gopyfrb3 · 1#

You can get the application ID directly from the SparkContext using its applicationId property:
A unique identifier for the Spark application. Its format depends on the scheduler implementation.
In case of a local Spark app, something like "local-1433865536131".
In case of YARN, something like "application_1433865536131_34483".

appid = sc.applicationId
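
Since applicationId is available on the driver for both local and YARN masters, the worker-side map() from the question is not needed. Below is a minimal sketch of how the original helper could collapse down; the ImportError fallback and the "unknown" naming are carried over from the question's own code, not part of the answer:

import os

def applicationId():
    """Return the YARN (or local) application ID, or a fallback string."""
    try:
        from pyspark import SparkContext
        sc = SparkContext.getOrCreate()
        # On the driver this returns e.g. "application_1433865536131_34483" on YARN
        # or "local-1433865536131" in local mode, without launching any job.
        return sc.applicationId
    except ImportError:
        # pyspark is not importable at all; fall back to a process-local ID.
        return f"unknown{os.getpid()}"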
