Access Spark Hive metastore within a UDF running in the workers (Databricks)

Posted by dgenwo3n on 2021-07-14 in Spark


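Context
I have an operation that should be executed on a number of tables using PySpark. The operation includes accessing the Spark metastore (in Databricks) to fetch some metadata. Because I have many tables, I am using an RDD to run the operation in parallel on the cluster workers, as shown in the code below: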

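from pyspark import SparkContext

base_spark_context = SparkContext.getOrCreate()
# SparkContext exposes parallelize() directly; it has no .sc attribute
rdd = base_spark_context.parallelize(tables_list)
rdd.map(lambda table_name: sync_table(table_name)).collect()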

The function sync_table() runs a query against the metastore, similar to the following line:

spark_client.session.sql("select 1")

The problem is that this SQL execution does not succeed. Instead, I get metastore-related errors. Traceback:

py4j.protocol.Py4JJavaError: An error occurred while calling o20.sql.
: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

(suppressed lines)

Caused by: java.lang.reflect.InvocationTargetException

(suppressed lines)

Caused by: javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the given database. JDBC url = jdbc:derby:;databaseName=metastore_db;create=true, username = APP. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
java.sql.SQLException: Failed to start database 'metastore_db' with class loader sun.misc.Launcher$AppClassLoader@16c0663d, see the next exception for details.

(suppressed lines)

Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /databricks/spark/work/app-20210413201900-0000/0/metastore_db.

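Is there any limitation on accessing the Databricks metastore from the workers once the operation is parallelized this way? Or is it possible to perform such an operation at all?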
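For context, the traceback shows the worker trying to boot a local embedded Derby database (metastore_db under /databricks/spark/work/...), and SparkContext and SparkSession are driver-side objects that are not meant to be used inside functions shipped to executors. A possible alternative, sketched here under assumptions rather than as a confirmed fix, is to keep the per-table calls on the driver and run them concurrently with threads. The helper name sync_tables_on_driver and the max_workers parameter are hypothetical; sync_table and tables_list are the names from the question.

```python
from concurrent.futures import ThreadPoolExecutor


def sync_tables_on_driver(tables_list, sync_table, max_workers=8):
    """Call sync_table(table_name) for every table from driver-side threads.

    Hypothetical helper, not part of any Spark API. It assumes sync_table
    uses the driver's existing SparkSession internally (for example via
    spark.sql), so no Spark objects are sent to the workers.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as tables_list
        return list(pool.map(sync_table, tables_list))
```

With this shape, the call in the question would become something like results = sync_tables_on_driver(tables_list, sync_table). Each SQL statement is still executed by the cluster as a normal Spark job; only the submission of the queries happens in parallel from the driver.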

Hive apache-spark pyspark databricks metastore

Source: https://stackoverflow.com/questions/67129879/access-spark-hive-metastore-within-an-udf-running-in-the-workers-databricks
