Apache Spark 2.3.1 and Hive Metastore 3.1.0

to94eoyn · published 2021-06-27 in Hive

We have upgraded our HDP cluster to 3.1.1.3.0.1.0-187 and found that:
Hive has a new metastore location
Spark can't see the Hive databases
In fact, we see:

org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database ... not found

Could you help me understand what has happened and how to fix it?
Update:
Configuration:
(spark.sql.warehouse.dir,/warehouse/tablespace/external/hive/)
(spark.admin.acls,)
(spark.yarn.dist.files,file:///opt/folder/config.yml,file:///opt/jdk1.8.0_172/jre/lib/security/cacerts)
(spark.history.kerberos.keytab,/etc/security/keytab/spark.service.keytab)
(spark.io.compression.lz4.blockSize,128kb)
(spark.executor.extraJavaOptions,-Djavax.net.ssl.trustStore=cacerts)
(spark.history.fs.logDirectory,hdfs:///spark2-history/)
(spark.io.encryption.keygen.algorithm,HmacSHA1)
(spark.sql.autoBroadcastJoinThreshold,26214400)
(spark.eventLog.enabled,true)
(spark.shuffle.service.enabled,true)
(spark.driver.extraLibraryPath,/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64)
(spark.ssl.keyStore,/etc/security/serverKeys/server-keystore.jks)
(spark.yarn.queue,default)
(spark.jars,file:/opt/folder/component-assembly-0.1.0-SNAPSHOT.jar)
(spark.ssl.enabled,true)
(spark.sql.orc.filterPushdown,true)
(spark.shuffle.unsafe.file.output.buffer,5m)
(spark.yarn.historyServer.address,master2.env.project:18481)
(spark.ssl.trustStore,/etc/security/clientKeys/all.jks)
(spark.app.name,com.company.env.component.MyClass)
(spark.sql.hive.metastore.jars,/usr/hdp/current/spark2-client/standalone-metastore/)
(spark.io.encryption.keySizeBits,128)
(spark.driver.memory,2g)
(spark.executor.instances,10)
(spark.history.kerberos.principal,spark/edge.env.project@ENV.PROJECT)
(spark.ssl.keyPassword,******(redacted))
(spark.ssl.keyStorePassword,******(redacted))
(spark.history.fs.cleaner.enabled,true)
(spark.shuffle.io.serverThreads,128)
(spark.sql.hive.convertMetastoreOrc,true)
(spark.submit.deployMode,client)
(spark.sql.orc.char.enabled,true)
(spark.master,yarn)
(spark.authenticate.enableSaslEncryption,true)
(spark.history.fs.cleaner.interval,7d)
(spark.authenticate,true)
(spark.history.fs.cleaner.maxAge,90d)
(spark.history.ui.acls.enable,true)
(spark.acls.enable,true)
(spark.history.provider,org.apache.spark.deploy.history.FsHistoryProvider)
(spark.executor.extraLibraryPath,/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64)
(spark.executor.memory,2g)
(spark.io.encryption.enabled,true)
(spark.shuffle.file.buffer,1m)
(spark.eventLog.dir,hdfs:///spark2-history/)
(spark.ssl.protocol,TLS)
(spark.dynamicAllocation.enabled,true)
(spark.executor.cores,3)
(spark.history.ui.port,18081)
(spark.sql.statistics.fallBackToHdfs,true)
(spark.repl.local.jars,file:///opt/folder/postgresql-42.2.2.jar,file:///opt/folder/ojdbc6.jar)
(spark.ssl.trustStorePassword,*********(redacted))
(spark.history.ui.admin.acls,)
(spark.history.kerberos.enabled,true)
(spark.shuffle.io.backLog,8192)
(spark.sql.orc.impl,native)
(spark.ssl.enabledAlgorithms,TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA)
(spark.sql.orc.enabled,true)
(spark.yarn.dist.jars,file:///opt/folder/postgresql-42.2.2.jar,file:///opt/folder/ojdbc6.jar)
(spark.sql.hive.metastore.version,3.0)
From hive-site.xml:

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/warehouse/tablespace/managed/hive</value>
</property>

The code looks like this:

import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession
  .builder()
  .appName(getClass.getSimpleName)
  .enableHiveSupport()
  .getOrCreate()
...
dataFrame.write
  .format("orc")
  .options(Map("spark.sql.hive.convertMetastoreOrc" -> true.toString))
  .mode(SaveMode.Append)
  .saveAsTable("name")
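
One caveat on this snippet: spark.sql.hive.convertMetastoreOrc is a session-level SQL configuration, not a DataFrameWriter option, so passing it through .options() on the writer is silently ignored. A minimal sketch of setting it where it actually takes effect (the app name is illustrative):

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("example") // illustrative
  .enableHiveSupport()
  // Session-level SQL conf: set it on the session (or via --conf on
  // spark-submit), not per-write through DataFrameWriter.options().
  .config("spark.sql.hive.convertMetastoreOrc", "true")
  .getOrCreate()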

spark-submit:

spark-submit --master yarn \
    --deploy-mode client \
    --driver-memory 2g \
    --driver-cores 4 \
    --executor-memory 2g \
    --num-executors 10 \
    --executor-cores 3 \
    --conf "spark.dynamicAllocation.enabled=true" \
    --conf "spark.shuffle.service.enabled=true" \
    --conf "spark.executor.extraJavaOptions=-Djavax.net.ssl.trustStore=cacerts" \
    --conf "spark.sql.warehouse.dir=/warehouse/tablespace/external/hive/" \
    --jars postgresql-42.2.2.jar,ojdbc6.jar \
    --files config.yml,/opt/jdk1.8.0_172/jre/lib/security/cacerts \
    --verbose \
    component-assembly-0.1.0-SNAPSHOT.jar
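
For context on the metastore settings shown in the configuration above: as far as I know, upstream Spark 2.3 has no Hive 3.x metastore client at all; spark.sql.hive.metastore.version tops out in the 2.x line, and Hive 3.0/3.1 support only arrived with Spark 3.0. Pointing it at 3.0 with HDP's standalone-metastore jars is therefore vendor-specific territory, which is consistent with the answers below. A hedged sketch of how the isolated metastore client is normally wired on a stock Spark 2.3 build (version and path are illustrative, and this alone is not a fix for a Hive 3 metastore):

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .enableHiveSupport()
  // A metastore client version a stock Spark 2.3 build understands (illustrative).
  .config("spark.sql.hive.metastore.version", "2.3.2")
  // Jars used to instantiate that metastore client (hypothetical path).
  .config("spark.sql.hive.metastore.jars", "/path/to/hive-2.3-jars/*")
  .getOrCreate()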

tez616oj1#

I have a hacky workaround for this one, though, disclaimer: it bypasses Ranger permissions (don't blame me if you incur the wrath of an admin).
Works with spark-shell:

export HIVE_CONF_DIR=/usr/hdp/current/hive-client/conf
spark-shell --conf "spark.driver.extraClassPath=/usr/hdp/current/hive-client/conf"
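
If the conf directory is picked up, the Hive 3 databases should be visible from that shell; a quick check from the same spark-shell session:

// Both should now list the warehouse databases instead of only "default".
spark.sql("SHOW DATABASES").show(false)
spark.catalog.listDatabases().show(false)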

Works with sparklyr:

library(sparklyr)
Sys.setenv(HIVE_CONF_DIR = "/usr/hdp/current/hive-client/conf")
conf = spark_config()
conf$'sparklyr.shell.driver-class-path' = '/usr/hdp/current/hive-client/conf'
sc = spark_connect(master = "yarn-client", config = conf)  # assumed connect call to complete the snippet

It should work with the Thrift server as well, but I haven't tested that.


0x6upsns2#

It looks like this is a Spark feature that simply hasn't been implemented yet. The only way I have found to use Spark with Hive 3.0 after this upgrade is Hortonworks' HiveWarehouseConnector; the documentation and a good guide from the Hortonworks community cover it. Until the Spark developers ship their own solution, I'm leaving this question unanswered.
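
For reference, a minimal sketch of what HiveWarehouseConnector usage looks like per the Hortonworks documentation; it assumes the HWC jar is on the classpath, spark.sql.hive.hiveserver2.jdbc.url (plus the LLAP/ZooKeeper settings from the docs) is configured, and the database/table names here are hypothetical:

import com.hortonworks.hwc.HiveWarehouseSession

// Build an HWC session on top of the existing SparkSession.
val hive = HiveWarehouseSession.session(spark).build()

// Reads go through HiveServer2/LLAP, so managed (ACID) tables become visible.
hive.showDatabases().show(false)
val df = hive.executeQuery("SELECT * FROM somedb.sometable") // hypothetical table

// Writes go through the connector's data source rather than saveAsTable.
df.write
  .format(HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR)
  .option("table", "name")
  .save()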
