Unable to access existing Hive tables from Spark using HiveContext

kmb7vmvb, posted 2021-06-02 in Hadoop

I'm trying to get Hive database and table details from Spark using HiveContext, but I can't point it at my existing Hive databases, as shown below. Spark version: 2.2.0, Hive version: 2.3.0.

I use the script below in spark-shell to connect to the existing Hive server (127.0.0.1 used below is my machine's IP address):

scala> val hc = new org.apache.spark.sql.hive.HiveContext(sc)
warning: there was one deprecation warning; re-run with -deprecation for details
hc: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@6dde913e

scala> hc.setConf("hive.metastore.uris","thrift://127.0.0.1:9083")

scala> val df = hc.sql("show databases")
df: org.apache.spark.sql.DataFrame = [databaseName: string]

scala> df.show
+------------+
|databaseName|
+------------+
|     default|
+------------+

scala> val dfTables = hc.sql("show tables");
dfTables: org.apache.spark.sql.DataFrame = [database: string, tableName: string ... 1 more field]

scala> dfTables.show
+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
+--------+---------+-----------+

As shown above, I'm not getting my existing Hive databases and tables; the HiveContext points to a new (default) database with no tables available. My Hive databases are listed below:

hive> show databases;
OK
default
mydbbackup
Time taken: 7.593 seconds, Fetched: 2 row(s)
hive> use mydbbackup;
OK
Time taken: 0.021 seconds
hive> show tables;
OK
customers
customerspart
customerspart1
Time taken: 0.194 seconds, Fetched: 3 row(s)
hive>

Below is my hive-site.xml:

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/home/hduser/apache-hive-2.3.0-bin/metastore_db;create=true</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value/>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.EmbeddedDriver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.PersistenceManagerFactoryClass</name>
    <value>org.datanucleus.api.jdo.JDOPersistenceManagerFactory</value>
    <description>class implementing the jdo persistence</description>
  </property>
</configuration>

Below is my Spark conf directory:

total 40
drwxr-xr-x  2 root root 4096 Nov 12 20:22 ./
drwxr-xr-x 12 root root 4096 Nov  9 22:57 ../
-rw-r--r--  1 root root  996 Nov  9 22:57 docker.properties.template
-rw-r--r--  1 root root 1105 Nov  9 22:57 fairscheduler.xml.template
-rw-r--r--  1 root root 2025 Nov  9 22:57 log4j.properties.template
-rw-r--r--  1 root root 7313 Nov  9 22:57 metrics.properties.template
-rw-r--r--  1 root root  865 Nov  9 22:57 slaves.template
-rw-r--r--  1 root root 1292 Nov  9 22:57 spark-defaults.conf.template
-rwxr-xr-x  1 root root 3699 Nov  9 22:57 spark-env.sh.template*

Do I need to modify anything so that Spark points to my existing Hive server instead of creating a new one? Please help me with this.


iqjalb3h1#

This will give you the desired result:

import org.apache.spark.sql.hive.HiveContext
val hc = new HiveContext(sc)
import hc.implicits._
val df = hc.sql("show databases")
df.show
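
For reference, HiveContext is deprecated in Spark 2.x (note the deprecation warning in the transcript above). Below is a minimal sketch of the SparkSession equivalent for a standalone app, assuming the metastore is reachable at thrift://127.0.0.1:9083 as in the question; the point is to supply hive.metastore.uris before the session (and its Hive client) is created, rather than via setConf afterwards:

import org.apache.spark.sql.SparkSession

// Build a Hive-enabled session; the metastore URI is passed up front
// so the Hive client connects to the remote metastore instead of
// spinning up a fresh local Derby one.
val spark = SparkSession.builder()
  .appName("hive-metastore-check")
  .config("hive.metastore.uris", "thrift://127.0.0.1:9083")  // URI from the question
  .enableHiveSupport()
  .getOrCreate()

spark.sql("show databases").show()
spark.sql("show tables in mydbbackup").show()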


wqsoz72f2#

Use the following properties in hive-site.xml:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://<<hostname>>:<<port>>/hive?createDatabaseIfNotExist=true</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>username</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>password</value>
</property>

Then place hive-site.xml in the conf folder of your Spark installation and try again.
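
If it helps, here is a quick sketch of how to check from spark-shell that the new hive-site.xml was picked up, using the shell's built-in spark session (the database name is taken from the question):

// If Spark found hive-site.xml in its conf folder, the existing
// databases and tables should show up in the catalog.
spark.catalog.listDatabases().show(truncate = false)
spark.catalog.listTables("mydbbackup").show(truncate = false)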


jtoj6r0c3#

Start spark-shell as follows:

./spark-shell --driver-java-options \
  "-Dhive.metastore.uris=thrift://localhost:9083"
