Hadoop: querying/reading Avro files

yvgpqqbh posted on 2021-06-02 in Hadoop

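I am storing data imported from complex JSON objects in Avro format.
The JSON objects consist of objects with nested objects and arrays of objects. The Avro schema looks like this: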

{
    "type" : "record",
    "name" : "userInfo",
    "namespace" : "my.example",
    "fields" : [
        {"name" : "username", "type" : "string", "default" : "NONE"},
        {"name" : "age", "type" : "int", "default" : -1},
        {"name" : "phone", "type" : "string", "default" : "NONE"},
        {"name" : "housenum", "type" : "string", "default" : "NONE"},
        {"name" : "address",
         "type" : {
             "type" : "record",
             "name" : "mailing_address",
             "fields" : [
                 {"name" : "street", "type" : "string", "default" : "NONE"},
                 {"name" : "city", "type" : "string", "default" : "NONE"},
                 {"name" : "state_prov", "type" : "string", "default" : "NONE"},
                 {"name" : "country", "type" : "string", "default" : "NONE"},
                 {"name" : "zip", "type" : "string", "default" : "NONE"}
             ]
         },
         "default" : {}
        }
    ]
}
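The `default` values in this schema are only used during schema resolution, i.e. when a reader's schema declares a field that the written data lacks. The Python sketch below imitates that behaviour for this particular schema, so it is easy to see what a partially populated record would resolve to. It is a simplification for illustration only: `resolve` is a hypothetical helper, not part of any Avro library; it handles only records and primitives and does no type checking. It also assumes that a record default such as `"default" : {}` has its missing nested fields filled from their own defaults.

```python
"""Sketch: how the userInfo defaults would fill in missing fields (simplified)."""


def _string_field(name):
    # Every string field in the question's schema defaults to "NONE".
    return {"name": name, "type": "string", "default": "NONE"}


# The userInfo schema from the question, expressed as a Python dict.
USER_INFO = {
    "type": "record",
    "name": "userInfo",
    "namespace": "my.example",
    "fields": [
        _string_field("username"),
        {"name": "age", "type": "int", "default": -1},
        _string_field("phone"),
        _string_field("housenum"),
        {
            "name": "address",
            "type": {
                "type": "record",
                "name": "mailing_address",
                "fields": [_string_field(n) for n in
                           ("street", "city", "state_prov", "country", "zip")],
            },
            "default": {},
        },
    ],
}


def resolve(schema, value):
    """Hypothetical helper: fill missing record fields from their defaults."""
    if isinstance(schema, dict) and schema.get("type") == "record":
        resolved = {}
        for field in schema["fields"]:
            if field["name"] in value:
                raw = value[field["name"]]
            elif "default" in field:
                raw = field["default"]
            else:
                raise ValueError(f"missing field {field['name']!r} has no default")
            # Recurse so nested records (e.g. address) get their own defaults.
            resolved[field["name"]] = resolve(field["type"], raw)
        return resolved
    # Primitive values are returned unchanged; this sketch does no type checking.
    return value


if __name__ == "__main__":
    print(resolve(USER_INFO, {"username": "alice", "address": {"city": "Paris"}}))
```

In the printed record `age` is `-1`, every unset string field is `"NONE"`, and `address` keeps `city` as `"Paris"` while its other sub-fields fall back to `"NONE"`.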

I use NiFi to convert the JSON to Avro and store the serialized files in Hadoop (for now I am using plain Hadoop only).

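My question:
For testing purposes, I want to query the data stored in HDFS (in Avro format).
At this point I am a bit confused, because there are so many tools and technologies around Hadoop. What is the right way to do this? Which tools and workflow should I use?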

Answer #1 (wmvff8tz):

You should be able to create an external Hive table over the HDFS location where the Avro data is written.
These pages have a few examples:
https://community.hortonworks.com/questions/22135/is-there-a-way-to-create-hive-table-based-on-avro.html
https://cwiki.apache.org/confluence/display/hive/avroserde
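As a sketch of what such a table could look like for this schema: the Python below builds a `CREATE EXTERNAL TABLE ... STORED AS AVRO` statement that points Hive at the Avro files and at the schema through the `avro.schema.url` table property. It also derives the column types Hive should expose, using the Avro-to-Hive type mapping described on the AvroSerDe page (a `record` becomes a `STRUCT`, `int` becomes `INT`, `string` becomes `STRING`). The table name `user_info` and both HDFS paths are placeholders, not values from the question. `STORED AS AVRO` assumes Hive 0.14 or later; older versions need the explicit `ROW FORMAT SERDE` form shown in the AvroSerDe documentation. The `.avsc` file is assumed to be the schema above, uploaded to HDFS.

```python
"""Sketch: generate Hive DDL for an external table over Avro files in HDFS."""

# Avro primitive -> Hive type, per the AvroSerDe type mapping.
AVRO_TO_HIVE = {
    "boolean": "BOOLEAN", "int": "INT", "long": "BIGINT", "float": "FLOAT",
    "double": "DOUBLE", "bytes": "BINARY", "string": "STRING",
}

# Compact copy of the question's schema; defaults are omitted because
# only field names and types matter for the column mapping.
_ADDRESS = {"type": "record", "name": "mailing_address",
            "fields": [{"name": n, "type": "string"} for n in
                       ("street", "city", "state_prov", "country", "zip")]}
USER_INFO = {"type": "record", "name": "userInfo", "fields": [
    {"name": "username", "type": "string"},
    {"name": "age", "type": "int"},
    {"name": "phone", "type": "string"},
    {"name": "housenum", "type": "string"},
    {"name": "address", "type": _ADDRESS},
]}


def hive_type(avro_type):
    """Map an Avro type (a name or a dict) to a Hive type string."""
    if isinstance(avro_type, list):
        raise NotImplementedError("unions are not handled in this sketch")
    name = avro_type if isinstance(avro_type, str) else avro_type["type"]
    if name in AVRO_TO_HIVE:
        return AVRO_TO_HIVE[name]
    if name == "record":
        members = ",".join(f"{f['name']}:{hive_type(f['type'])}"
                           for f in avro_type["fields"])
        return f"STRUCT<{members}>"
    if name == "array":
        return f"ARRAY<{hive_type(avro_type['items'])}>"
    raise NotImplementedError(f"type {name!r} not handled in this sketch")


def hive_columns(schema):
    """(column, Hive type) pairs Hive should expose for a top-level record."""
    return [(f["name"], hive_type(f["type"])) for f in schema["fields"]]


def create_table_ddl(table, location, schema_url):
    """External table whose columns Hive reads from the .avsc at schema_url."""
    return (f"CREATE EXTERNAL TABLE {table}\n"
            "STORED AS AVRO\n"
            f"LOCATION '{location}'\n"
            f"TBLPROPERTIES ('avro.schema.url'='{schema_url}');")


if __name__ == "__main__":
    # Placeholder table name and paths; replace with your own.
    print(create_table_ddl("user_info", "/data/userinfo",
                           "hdfs:///schemas/userInfo.avsc"))
    for column, hive in hive_columns(USER_INFO):
        print(column, hive)
```

Declaring no column list and letting Hive read the columns from `avro.schema.url` keeps the table in sync with the schema NiFi writes with. Once the table exists, fields of the nested `address` struct can be selected with dot notation, for example `SELECT username, address.city FROM user_info WHERE age > 30;`.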
