Create a Hive table from an avsc that references a previously defined schema as a type

2uluyalo  posted on 2021-05-29  in Hadoop

I am looking for a way, through Hive, to take the following avsc file contents and externalize the nested schema "RENTALRECORDTYPE" so that it can be reused.

{
    "type": "record",
    "name": "EMPLOYEE",
    "namespace": "",
    "doc": "EMPLOYEE is a person that works here",
    "fields": [
        {
            "name": "RENTALRECORD",
            "type": {
                "type": "record",
                "name": "RENTALRECORDTYPE",
                "namespace": "",
                "doc": "Rental record is a record that is kept on every item rented",
                "fields": [
                    {
                        "name": "due_date",
                        "doc": "The date when item is due",
                        "type": "int"
                    } 
                ]
            }
        },
        {
            "name": "hire_date",
            "doc": "Employee date of hire",
            "type": "int"
        }
    ]
}

Defining the schema this way works fine. I can issue the following HiveQL statement and the table is created successfully.

CREATE EXTERNAL TABLE employee
STORED AS AVRO
LOCATION '/user/dtom/store/data/employee'
TBLPROPERTIES ('avro.schema.url'='/user/dtom/store/schema/employee.avsc');

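However, I would like to be able to reference an existing schema instead of duplicating the record definition in multiple schemas. For example, two avsc files would be produced instead of a single schema file, i.e. rentalrecord.avsc and employee.avsc.
rentalrecord.avsc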

{
    "type": "record",
    "name": "RENTALRECORD",
    "namespace": "",
    "doc": "A record that is kept for every rental",
    "fields": [
        {
            "name": "due_date",
            "doc": "The date on which the rental is due back to the store",
            "type": "int"
        }
    ]
}

employee.avsc

{
    "type": "record",
    "name": "EMPLOYEE",
    "namespace": "",
    "doc": "EMPLOYEE is a person that works for the VIDEO STORE",
    "fields": [
        {
            "name": "rentalrecord",
            "doc": "A rental record is a record on every rental",
            "type": "RENTALRECORD"
        },
        {
            "name": "hire_date",
            "doc": "Employee date of hire",
            "type": "int"
        }
    ]
}

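In the scenario above, we want to externalize the RENTALRECORD schema definition and reuse it in employee.avsc and elsewhere.
Importing the schemas with the following two HiveQL statements fails: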

CREATE EXTERNAL TABLE rentalrecord
STORED AS AVRO
LOCATION '/user/dtom/store/data/rentalrecord'
TBLPROPERTIES ('avro.schema.url'='/user/dtom/store/schema/rentalrecord.avsc');

CREATE EXTERNAL TABLE employee
STORED AS AVRO
LOCATION '/user/dtom/store/data/employee'
TBLPROPERTIES ('avro.schema.url'='/user/dtom/store/schema/employee.avsc');

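rentalrecord.avsc is imported successfully, but employee.avsc fails on the first field definition, the field of type "RENTALRECORD". Hive outputs the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException Encountered exception determining schema. Returning signal schema to indicate problem: "RENTALRECORD" is not a defined name. The type of the "rentalrecord" field must be a defined name or a {"type": ...} expression.)
My research tells me that Avro files do support this form of schema reuse. So either I am missing something, or this is something that is not supported through Hive.
Any help would be appreciated.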
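
The error message is consistent with each avsc file being parsed on its own, so a type name only resolves if it is defined earlier in the same document. The Python sketch below illustrates that rule with a simplified check; it is not Hive's or Avro's actual parser, it ignores namespaces, and the function name `undefined_names` is hypothetical.

```python
import json

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
NAMED_KINDS = {"record", "enum", "fixed"}


def undefined_names(schema, defined=None):
    """List type names referenced without a prior definition in this one document."""
    if defined is None:
        defined = set()
    if isinstance(schema, str):
        # A bare string is either a primitive or a reference to a named type.
        return [] if schema in PRIMITIVES or schema in defined else [schema]
    if isinstance(schema, list):
        # A union: check every branch.
        return [n for branch in schema for n in undefined_names(branch, defined)]
    kind = schema["type"]
    if not isinstance(kind, str) or kind not in NAMED_KINDS | {"array", "map"}:
        # {"type": <schema>} wrapper, e.g. {"type": "int"} or a name reference.
        return undefined_names(kind, defined)
    if kind in NAMED_KINDS:
        defined.add(schema["name"])
    missing = []
    for field in schema.get("fields", []):
        missing += undefined_names(field["type"], defined)
    for key in ("items", "values"):
        if key in schema:
            missing += undefined_names(schema[key], defined)
    return missing


# Trimmed copies of the two employee schemas from the question.
employee_split = json.loads("""
{"type": "record", "name": "EMPLOYEE", "fields": [
    {"name": "rentalrecord", "type": "RENTALRECORD"},
    {"name": "hire_date", "type": "int"}]}
""")
employee_nested = json.loads("""
{"type": "record", "name": "EMPLOYEE", "fields": [
    {"name": "RENTALRECORD", "type": {
        "type": "record", "name": "RENTALRECORDTYPE",
        "fields": [{"name": "due_date", "type": "int"}]}},
    {"name": "hire_date", "type": "int"}]}
""")

print(undefined_names(employee_split))   # ['RENTALRECORD']
print(undefined_names(employee_nested))  # []
```

The split employee.avsc reports RENTALRECORD as undefined because nothing in that file defines it, while the nested version is self-contained.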

Answer #1, by c7rzv4ha

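I defined an avdl containing all the references, then used the avro-tools jar with the idl2schemata option to generate the avsc files. The generated avsc works like a charm with Hive!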
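
The generated avsc presumably works because it is self-contained: the referenced record is defined inside the file that uses it. The Python sketch below produces the same kind of output by a different route, inlining the definition from rentalrecord.avsc at its first use in employee.avsc. It is a stand-in for illustration, not what idl2schemata does internally; `inline_named_types` is a hypothetical helper, names are matched without namespaces, and only the first reference is expanded because Avro does not allow a name to be defined twice.

```python
import json


def inline_named_types(schema, definitions, seen=None):
    """Copy `schema`, replacing the first reference to each name in
    `definitions` with its full definition; later references stay as names."""
    if seen is None:
        seen = set()
    if isinstance(schema, str):
        if schema in definitions and schema not in seen:
            # Expanding the definition also marks its name as seen.
            return inline_named_types(definitions[schema], definitions, seen)
        return schema
    if isinstance(schema, list):
        # A union: process every branch in order.
        return [inline_named_types(branch, definitions, seen) for branch in schema]
    out = dict(schema)
    if out.get("type") in ("record", "enum", "fixed"):
        seen.add(out["name"])
    if "type" in out:
        out["type"] = inline_named_types(out["type"], definitions, seen)
    if "fields" in out:
        out["fields"] = [inline_named_types(f, definitions, seen) for f in out["fields"]]
    for key in ("items", "values"):
        if key in out:
            out[key] = inline_named_types(out[key], definitions, seen)
    return out


# Trimmed copies of rentalrecord.avsc and employee.avsc from the question.
rentalrecord = json.loads("""
{"type": "record", "name": "RENTALRECORD", "namespace": "",
 "fields": [{"name": "due_date", "type": "int"}]}
""")
employee = json.loads("""
{"type": "record", "name": "EMPLOYEE", "namespace": "",
 "fields": [{"name": "rentalrecord", "type": "RENTALRECORD"},
            {"name": "hire_date", "type": "int"}]}
""")

merged = inline_named_types(employee, {rentalrecord["name"]: rentalrecord})
print(json.dumps(merged, indent=4))
```

The printed JSON could then be saved as the employee.avsc that 'avro.schema.url' points to, while rentalrecord.avsc remains the single place where the record is maintained.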
