Setup standalone Hive Metastore service for Presto and AWS S3

Posted by vulvrdjw on 2021-06-26 in Hive


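I'm working in an environment where an S3 service is used as a data lake, but without AWS Athena. I'm trying to set up Presto to query the data in S3, and I know I need to define the data structure as Hive tables through the Hive Metastore service. I'm deploying each component in Docker, so I'd like to keep the container size as small as possible. Which components of Hive do I need in order to run just the Metastore service? I don't actually care about running Hive, only the Metastore. Can I trim down what's required, or is there already a pre-configured package for this? I haven't found anything online that doesn't involve downloading all of Hadoop and Hive. Is what I'm trying to do possible?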

Hive presto hive-metastore

Source: https://stackoverflow.com/questions/48932907/setup-standalone-hive-metastore-service-for-presto-and-aws-s3

4 Answers


au9on6nz1#

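The Metastore is now available on its own, under /hive-standalone-metastore-3.0.0/ in the Apache Hive distribution.
Starting with Hive 3.0, the Metastore is released as a separate package and can run without the rest of Hive. This is called standalone mode.
By default the Metastore is configured for use with Hive, so a few configuration parameters have to be changed for standalone mode: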

metastore.task.threads.always -> org.apache.hadoop.hive.metastore.events.EventCleanerTask,org.apache.hadoop.hive.metastore.MaterializationsCacheCleanerTask
metastore.expression.proxy -> org.apache.hadoop.hive.metastore.DefaultPartitionExpressionProxy
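
As a rough illustration of how the two overrides above could be written down, the sketch below renders them as a Hadoop-style `<configuration>` document. The target file name `metastore-site.xml` and the helper name `render_site_xml` are assumptions made for this example, not something the answer states, so check the standalone Metastore documentation for where your build actually reads its configuration.

```python
import xml.etree.ElementTree as ET

# The two overrides listed in the answer above, copied verbatim.
STANDALONE_OVERRIDES = {
    "metastore.task.threads.always": (
        "org.apache.hadoop.hive.metastore.events.EventCleanerTask,"
        "org.apache.hadoop.hive.metastore.MaterializationsCacheCleanerTask"
    ),
    "metastore.expression.proxy": (
        "org.apache.hadoop.hive.metastore.DefaultPartitionExpressionProxy"
    ),
}


def render_site_xml(properties):
    """Render a dict as a Hadoop-style <configuration> XML string.

    Hypothetical helper for illustration only.
    """
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Assumption: the output is saved as metastore-site.xml in the
    # Metastore's conf directory.
    print(render_site_xml(STANDALONE_OVERRIDES))
```

Generating the file from a dict keeps the overrides in one place if the image is built by a script; writing the XML by hand works just as well.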

Link to documentation


ckocjqey2#

I was able to integrate Presto SQL and HMS 3.0 with AWS S3. I wrote an article about it, in case it helps: https://www.linkedin.com/pulse/presto-sql-s3-abhishek-gupta


uttx8gqw3#

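There is a workaround that lets you run Presto without Hive at all. I haven't tried it with a distributed file system such as S3, but the code suggests it should work (at least with HDFS). In my opinion it is worth a try, because you would not need any new Docker image for Hive.
The idea is to use the built-in FileHiveMetastore. It is neither documented nor recommended for production, but you can experiment with it. Schema information is stored next to the data in the file system. Obviously, this has its pros and cons. I don't know the details of your use case, so I can't say whether it fits your needs.
Configuration: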

connector.name=hive-hadoop2
hive.metastore=file
hive.metastore.catalog.dir=file:///tmp/hive_catalog
hive.metastore.user=cox

Demo:

presto:tiny> create schema hive.default;
CREATE SCHEMA
presto:tiny> use hive.default;
USE
presto:default> create table t (t bigint);
CREATE TABLE
presto:default> show tables;
 Table
-------
 t
(1 row)

Query 20180223_202609_00009_iuchi, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:00 [1 rows, 18B] [11 rows/s, 201B/s]

presto:default> insert into t (values 1);
INSERT: 1 row

Query 20180223_202616_00010_iuchi, FINISHED, 1 node
Splits: 51 total, 51 done (100.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]

presto:default> select * from t;
 t
---
 1
(1 row)

After doing the above, I found the following on my machine:

/tmp/hive_catalog/
/tmp/hive_catalog/default
/tmp/hive_catalog/default/t
/tmp/hive_catalog/default/t/.prestoPermissions
/tmp/hive_catalog/default/t/.prestoPermissions/user_cox
/tmp/hive_catalog/default/t/.prestoPermissions/.user_cox.crc
/tmp/hive_catalog/default/t/.20180223_202616_00010_iuchi_79dee041-58a3-45ce-b86c-9f14e6260278.crc
/tmp/hive_catalog/default/t/.prestoSchema
/tmp/hive_catalog/default/t/20180223_202616_00010_iuchi_79dee041-58a3-45ce-b86c-9f14e6260278
/tmp/hive_catalog/default/t/..prestoSchema.crc
/tmp/hive_catalog/default/.prestoSchema
/tmp/hive_catalog/default/..prestoSchema.crc
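
To make the layout above a bit more concrete, here is a minimal sketch that walks a catalog directory and infers schemas and tables from the `.prestoSchema` marker files seen in the listing. It relies only on the directory structure shown above; the function name `list_tables` is hypothetical, and the sketch does not read or interpret the contents of the marker files, whose format the answer does not describe.

```python
import os

# Marker file name taken from the directory listing above.
SCHEMA_MARKER = ".prestoSchema"


def list_tables(catalog_dir):
    """Return {schema: [table, ...]} based on .prestoSchema marker files.

    Hypothetical helper: a directory counts as a schema (or a table)
    only if it directly contains a .prestoSchema file.
    """
    result = {}
    for schema in sorted(os.listdir(catalog_dir)):
        schema_path = os.path.join(catalog_dir, schema)
        if not os.path.isfile(os.path.join(schema_path, SCHEMA_MARKER)):
            continue  # not a schema directory
        tables = []
        for table in sorted(os.listdir(schema_path)):
            table_path = os.path.join(schema_path, table)
            if os.path.isfile(os.path.join(table_path, SCHEMA_MARKER)):
                tables.append(table)
        result[schema] = tables
    return result


if __name__ == "__main__":
    # Path taken from hive.metastore.catalog.dir in the configuration above.
    catalog = "/tmp/hive_catalog"
    if os.path.isdir(catalog):
        print(list_tables(catalog))
```

This only inspects a local path; whether the same layout appears when the catalog directory points at S3 is untested here, as the answer itself notes.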


wwtsj6pe4#

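Setting up Hive just for the Metastore does look like a hassle. Have you considered using the AWS Glue Data Catalog? That way you don't have to manage anything yourself. You can find the details here: https://docs.aws.amazon.com/emr/latest/releaseguide/emr-presto-glue.html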
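
For readers who do run Presto on Amazon EMR, the sketch below builds the kind of configuration classification that the linked EMR guide describes for pointing Presto's Hive connector at Glue. The classification and property names are recalled from that guide rather than taken from this answer, and the function name `presto_glue_classification` is hypothetical, so treat the block as an assumption to verify against the current documentation.

```python
import json


def presto_glue_classification():
    """Build an EMR configuration list that enables Glue for Presto.

    Assumption: classification and property names follow the EMR
    release guide linked above; verify them for your EMR release.
    """
    return [
        {
            "Classification": "presto-connector-hive",
            "Properties": {
                "hive.metastore.glue.datacatalog.enabled": "true",
            },
        }
    ]


if __name__ == "__main__":
    # The JSON printed here would be supplied as the cluster's
    # configurations when the EMR cluster is created.
    print(json.dumps(presto_glue_classification(), indent=2))
```

This applies only to clusters managed by EMR; it does not help if the S3 service in question is not AWS.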
