Cannot query records from Hive when the data is stored in Avro format; an "error_error..." exception is returned

Posted by 0aydgbwb on 2021-06-03 in Sqoop

We followed the steps below:
1. Imported the table from MySQL into the HDFS location user/hive/warehouse/orders/; the table schema is:

mysql> describe orders;
+-------------------+-------------+------+-----+---------+-------+
| Field             | Type        | Null | Key | Default | Extra |
+-------------------+-------------+------+-----+---------+-------+
| order_id          | int(11)     | YES  |     | NULL    |       |
| order_date        | varchar(30) | YES  |     | NULL    |       |
| order_customer_id | int(11)     | YES  |     | NULL    |       |
| order_items       | varchar(30) | YES  |     | NULL    |       |
+-------------------+-------------+------+-----+---------+-------+

2. Created an external table in Hive over the same data from step (1):

CREATE EXTERNAL TABLE orders
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'hdfs:///user/hive/warehouse/retail_stage.db/orders'
TBLPROPERTIES ('avro.schema.url'='hdfs://host_name//tmp/sqoop-cloudera/compile/bb8e849c53ab9ceb0ddec7441115125d/orders.avsc');

Sqoop command:

sqoop import \
  --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
  --username=root \
  --password=cloudera \
  --table orders \
  --target-dir /user/hive/warehouse/retail_stage.db/orders \
  --as-avrodatafile \
  --split-by order_id
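
Sqoop also writes the table's generated Avro schema, orders.avsc, into its local compile directory (the path referenced in TBLPROPERTIES above). For reference, the schema it generates for this table should look roughly like the sketch below; the doc string and exact layout are illustrative and can vary between Sqoop versions:

{
  "type" : "record",
  "name" : "orders",
  "doc" : "Sqoop import of orders",
  "fields" : [
    {"name" : "order_id",          "type" : ["null", "int"],    "default" : null},
    {"name" : "order_date",        "type" : ["null", "string"], "default" : null},
    {"name" : "order_customer_id", "type" : ["null", "int"],    "default" : null},
    {"name" : "order_items",       "type" : ["null", "string"], "default" : null}
  ]
}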

Running describe formatted orders returns the error below; I have tried several combinations but they all fail.

hive> describe orders;
OK
error_error_error_error_error_error_error   string                  from deserializer   
cannot_determine_schema string                  from deserializer   
check                   string                  from deserializer   
schema                  string                  from deserializer   
url                     string                  from deserializer   
and                     string                  from deserializer   
literal                 string                  from deserializer   
Time taken: 1.15 seconds, Fetched: 7 row(s)

The same thing works for me with --as-textfile, but fails with --as-avrodatafile.
I went through a few Stack Overflow posts but could not resolve it. Any ideas?
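
For comparison, the text-file path mentioned above only changes the storage format. A minimal sketch of that variant, assuming Sqoop's default comma-delimited output (the table name orders_txt and target directory are made up for illustration):

sqoop import \
  --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
  --username=root \
  --password=cloudera \
  --table orders \
  --target-dir /user/hive/warehouse/retail_stage.db/orders_txt \
  --as-textfile \
  --split-by order_id

CREATE EXTERNAL TABLE orders_txt (
  order_id          INT,
  order_date        STRING,
  order_customer_id INT,
  order_items       STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/hive/warehouse/retail_stage.db/orders_txt';

Unlike this variant, the Avro-backed table carries no column list at all, so Hive depends entirely on avro.schema.url being readable; when the AvroSerDe cannot fetch the schema, it surfaces exactly the error_error_error... / cannot_determine_schema columns shown above.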

Answer 1 (ldxq2e6h):

I think you should check the reference to the Avro schema file in TBLPROPERTIES.
Does the following resolve?
hdfs dfs -ls hdfs://host_name//tmp/sqoop-cloudera/compile/bb8e849c53ab9ceb0ddec7441115125d/orders.avsc
I was able to create the exact same scenario and select from the Hive table.

hive> CREATE EXTERNAL TABLE sqoop_test
    > COMMENT "A table backed by Avro data with the Avro schema stored in HDFS"
    > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    > STORED AS
    >    INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    >    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    > LOCATION '/user/cloudera/categories/'
    > TBLPROPERTIES
    >   ('avro.schema.url'='hdfs:///user/cloudera/categories.avsc')
    > ;

OK
Time taken: 1.471 seconds

hive> select * from sqoop_test;
OK
1  2   Football
2  2   Soccer
3  2   Baseball & Softball
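
If the hdfs dfs -ls check above shows that the .avsc is not actually at the URL in TBLPROPERTIES, one way to repair it is to park the schema at a stable HDFS path and repoint the table. A minimal sketch, assuming the schema file still exists locally in the Sqoop compile directory (the /user/hive/schemas path below is illustrative):

# Verify whether the URL from TBLPROPERTIES actually resolves on HDFS
hdfs dfs -ls hdfs://host_name/tmp/sqoop-cloudera/compile/bb8e849c53ab9ceb0ddec7441115125d/orders.avsc

# Copy the locally generated schema to a permanent HDFS location
hdfs dfs -mkdir -p /user/hive/schemas
hdfs dfs -put /tmp/sqoop-cloudera/compile/bb8e849c53ab9ceb0ddec7441115125d/orders.avsc /user/hive/schemas/orders.avsc

Then point the table at the new location and re-check:

hive> ALTER TABLE orders SET TBLPROPERTIES ('avro.schema.url'='hdfs:///user/hive/schemas/orders.avsc');
hive> describe orders;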
