Apache Spark 2.3.1 and Hive Metastore 3.1.0

to94eoyn  posted 2021-06-27 in Hive
Follow (0) | Answers (2) | Views (557)

We have upgraded our HDP cluster to 3.1.1.3.0.1.0-187 and discovered that:
Hive has a new metastore location
Spark can't see Hive databases
In fact we see:

org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database ... not found

Could you help me understand what happened and how to solve it?
UPDATE:
Configuration:
(spark.sql.warehouse.dir,/warehouse/tablespace/external/hive/)
(spark.admin.acls,)
(spark.yarn.dist.files,file:///opt/folder/config.yml,file:///opt/jdk1.8.0_172/jre/lib/security/cacerts)
(spark.history.kerberos.keytab,/etc/security/keytab/spark.service.keytab)
(spark.io.compression.lz4.blockSize,128kb)
(spark.executor.extraJavaOptions,-Djavax.net.ssl.trustStore=cacerts)
(spark.history.fs.logDirectory,hdfs:///spark2-history/)
(spark.io.encryption.keygen.algorithm,HmacSHA1)
(spark.sql.autoBroadcastJoinThreshold,26214400)
(spark.eventLog.enabled,true)
(spark.shuffle.service.enabled,true)
(spark.driver.extraLibraryPath,/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/linux-amd64-64)
(spark.ssl.keyStore,/etc/security/serverKeys/server-keystore.jks)
(spark.yarn.queue,default)
(spark.jars,file:/opt/folder/component-assembly-0.1.0-SNAPSHOT.jar)
(spark.ssl.enabled,true)
(spark.sql.orc.filterPushdown,true)
(spark.shuffle.unsafe.file.output.buffer,5m)
(spark.yarn.historyServer.address,master2.env.project:18481)
(spark.ssl.trustStore,/etc/security/clientKeys/all.jks)
(spark.app.name,com.company.env.component.myclass)
(spark.sql.hive.metastore.jars,/usr/hdp/current/spark2-client/standalone-metastore/)
(spark.io.encryption.keySizeBits,128)
(spark.driver.memory,2g)
(spark.executor.instances,10)
(spark.history.kerberos.principal,spark/edge.env.project@env.project)
(spark.ssl.keyPassword,(redacted))
(spark.ssl.keyStorePassword,******(redacted))
(spark.history.fs.cleaner.enabled,true)
(spark.shuffle.io.serverThreads,128)
(spark.sql.hive.convertMetastoreOrc,true)
(spark.submit.deployMode,client)
(spark.sql.orc.char.enabled,true)
(spark.master,yarn)
(spark.authenticate.enableSaslEncryption,true)
(spark.history.fs.cleaner.interval,7d)
(spark.authenticate,true)
(spark.history.fs.cleaner.maxAge,90d)
(spark.history.ui.acls.enable,true)
(spark.acls.enable,true)
(spark.history.provider,org.apache.spark.deploy.history.FsHistoryProvider)
(spark.executor.extraLibraryPath,/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/linux-amd64)
(spark.executor.memory,2g)
(spark.io.encryption.enabled,true)
(spark.shuffle.file.buffer,1m)
(spark.eventLog.dir,hdfs:///spark2-history/)
(spark.ssl.protocol,TLS)
(spark.dynamicAllocation.enabled,true)
(spark.executor.cores,3)
(spark.history.ui.port,18081)
(spark.sql.statistics.fallBackToHdfs,true)
(spark.repl.local.jars,file:///opt/folder/postgresql-42.2.2.jar,file:///opt/folder/ojdbc6.jar)
(spark.ssl.trustStorePassword,*********(redacted))
(spark.history.ui.admin.acls,)
(spark.history.kerberos.enabled,true)
(spark.shuffle.io.backLog,8192)
(spark.sql.orc.impl,native)
(spark.ssl.enabledAlgorithms,TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA)
(spark.sql.orc.enabled,true)
(spark.yarn.dist.jars,file:///opt/folder/postgresql-42.2.2.jar,file:///opt/folder/ojdbc6.jar)
(spark.sql.hive.metastore.version,3.0)
From hive-site.xml:

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/warehouse/tablespace/managed/hive</value>
</property>
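
Note that spark.sql.warehouse.dir in the configuration above points at the external warehouse (/warehouse/tablespace/external/hive/), while hive-site.xml declares the managed one (/warehouse/tablespace/managed/hive). A minimal sketch (the app name is a placeholder) to print which locations Spark actually resolves and which databases it can see:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("warehouse-check") // placeholder name, for illustration only
  .enableHiveSupport()
  .getOrCreate()

// Warehouse path Spark SQL resolved vs. the one declared in hive-site.xml
// (the second line prints null if hive-site.xml is not on the classpath)
println(spark.conf.get("spark.sql.warehouse.dir"))
println(spark.sparkContext.hadoopConfiguration.get("hive.metastore.warehouse.dir"))

// Databases visible through Spark's catalog
spark.catalog.listDatabases().show(false)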

The code looks like this:

val spark = SparkSession
  .builder()
  .appName(getClass.getSimpleName)
  .enableHiveSupport()
  .getOrCreate()
...
dataFrame.write
  .format("orc")
  .options(Map("spark.sql.hive.convertMetastoreOrc" -> true.toString))
  .mode(SaveMode.Append)
  .saveAsTable("name")
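
One caveat on the write above: spark.sql.hive.convertMetastoreOrc is a session-level SQL conf, not a DataFrameWriter option, so passing it through .options() most likely has no effect. A minimal sketch of the same write with the conf set on the session instead (no new names assumed):

val spark = SparkSession
  .builder()
  .appName(getClass.getSimpleName)
  // session-level SQL conf, equivalent to passing it via --conf at submit time
  .config("spark.sql.hive.convertMetastoreOrc", "true")
  .enableHiveSupport()
  .getOrCreate()

dataFrame.write
  .format("orc")
  .mode(SaveMode.Append)
  .saveAsTable("name")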

spark-submit:

--master yarn \
    --deploy-mode client \
    --driver-memory 2g \
    --driver-cores 4 \
    --executor-memory 2g \
    --num-executors 10 \
    --executor-cores 3 \
    --conf "spark.dynamicAllocation.enabled=true" \
    --conf "spark.shuffle.service.enabled=true" \
    --conf "spark.executor.extraJavaOptions=-Djavax.net.ssl.trustStore=cacerts" \
    --conf "spark.sql.warehouse.dir=/warehouse/tablespace/external/hive/" \
    --jars postgresql-42.2.2.jar,ojdbc6.jar \
    --files config.yml,/opt/jdk1.8.0_172/jre/lib/security/cacerts \
    --verbose \
    component-assembly-0.1.0-SNAPSHOT.jar

tez616oj1#

I do have a hacky workaround for this one, though a disclaimer: it bypasses Ranger permissions (don't blame me if you incur the wrath of an admin).
To use it with spark-shell:

export HIVE_CONF_DIR=/usr/hdp/current/hive-client/conf
spark-shell --conf "spark.driver.extraClassPath=/usr/hdp/current/hive-client/conf"
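
Once the shell is up, a quick check (some_db is a placeholder database name) shows whether the Hive databases are visible again:

spark.sql("show databases").show(false)
spark.catalog.listTables("some_db").show(false) // some_db is a placeholder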

To use it with sparklyr:

Sys.setenv(HIVE_CONF_DIR="/usr/hdp/current/hive-client/conf")
conf = spark_config()
conf$'sparklyr.shell.driver-class-path' = '/usr/hdp/current/hive-client/conf'

It should also work with the Spark Thrift server, but I haven't tested that.


0x6upsns2#

It looks like this is a Spark feature that simply hasn't been implemented. As far as I found, the only way to use Spark with a Hive 3.0 metastore is Hortonworks' HiveWarehouseConnector. Here is the documentation, and a good guide from the Hortonworks community. I'm leaving the question open until the Spark developers have their own solution ready.
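
For reference, a minimal sketch of what using the connector looks like, based on the Hortonworks docs referenced above; the assembly jar path, the JDBC URL, and the table names are assumptions to adapt to your cluster:

// Submit with the connector on the classpath, for example (paths are assumptions):
//   --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-<version>.jar
//   --conf spark.sql.hive.hiveserver2.jdbc.url=jdbc:hive2://<hiveserver2-host>:10000/
import com.hortonworks.hwc.HiveWarehouseSession

val hive = HiveWarehouseSession.session(spark).build()
hive.showDatabases().show(false)
hive.executeQuery("select * from some_db.some_table limit 5").show() // placeholder names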
