I'm using CDH 5.5 with Sqoop 1.4.6 and Hive 1.2.1 (I downloaded those two manually to get Parquet support for additional data types).
Here is the import command I used:
sqoop import --connect jdbc:oracle:thin:@database:port:name --username username --password password --table SHARE_PROPERTY -m 1 --null-string '\\N' --null-non-string '\\N' --hive-import --hive-table SHARE_PROPERTY --as-parquetfile --compression-codec snappy --map-column-hive "CREATE_DATE=String"
The command completes successfully, and when I describe the table in Hive I see:
hive> describe share_property;
OK
share_property_id string
customer_id string
website_user_id string
address_id string
listing_num string
source string
is_facebook string
is_twitter string
is_linkedin string
is_email string
create_date string
Time taken: 1.09 seconds, Fetched: 11 row(s)
Most of these fields are actually Oracle NUMBER types, and they import as double if I stop trying to import to a Parquet file and just use the default text file. The create_date field is a DATE in Oracle, and it imports sometimes as bigint and sometimes as string, depending on which command I use.
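(By "default text file" I mean the same import with the Parquet options dropped; a minimal sketch, assuming nothing else in the command needs to change:)

sqoop import --connect jdbc:oracle:thin:@database:port:name \
  --username username --password password \
  --table SHARE_PROPERTY -m 1 \
  --null-string '\\N' --null-non-string '\\N' \
  --hive-import --hive-table SHARE_PROPERTY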
In any case, when I try to query this data in Hive, I get the following error:
hive> select * from share_property limit 20;
OK
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/parquet/lib/parquet-format-2.1.0-cdh5.5.1.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/parquet/lib/parquet-pig-bundle-1.5.0-cdh5.5.1.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/parquet/lib/parquet-hadoop-bundle-1.5.0-cdh5.5.1.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [shaded.parquet.org.slf4j.helpers.NOPLoggerFactory]
Exception in thread "main" java.lang.NoSuchMethodError: parquet.schema.Types$MessageTypeBuilder.addFields([Lparquet/schema/Type;)Lparquet/schema/Types$GroupBuilder;
at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.getSchemaByName(DataWritableReadSupport.java:159)
at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:222)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:256)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:99)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:85)
at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:72)
at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:673)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:323)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:445)
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1670)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Clearly either the data isn't being imported correctly, or something is wrong on the Parquet side. If I import as a text file I can query the data successfully (as long as I set --map-column-hive CREATE_DATE=String, otherwise it imports as long), but if I then try to insert that data into a Parquet table to convert the format, that errors out too, so maybe the problem lies there (that conversion attempt is sketched at the end of this post). I've also tried manually setting all the column types with --map-column-hive as well as --map-column-java, but that didn't seem to help either; a sketch of what I mean follows below.
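For reference, the full-mapping variant looked roughly like this; this is only a sketch, and the per-column Hive types (String for everything, based on the describe output above) are my own guesses, not a known-working configuration:

sqoop import --connect jdbc:oracle:thin:@database:port:name \
  --username username --password password \
  --table SHARE_PROPERTY -m 1 \
  --null-string '\\N' --null-non-string '\\N' \
  --hive-import --hive-table SHARE_PROPERTY \
  --as-parquetfile --compression-codec snappy \
  --map-column-hive 'SHARE_PROPERTY_ID=String,CUSTOMER_ID=String,WEBSITE_USER_ID=String,ADDRESS_ID=String,LISTING_NUM=String,SOURCE=String,IS_FACEBOOK=String,IS_TWITTER=String,IS_LINKEDIN=String,IS_EMAIL=String,CREATE_DATE=String'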
I've also found documentation saying that Sqoop supports importing the Oracle DATE type, and that Parquet supports date/timestamp, but I haven't managed to import the column successfully as either of those (whether using --map-column-hive, --map-column-java, or -Doraoop.timestamp.string=false together with --direct).
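The direct-mode attempt looked roughly like this (a sketch; it assumes the OraOop direct connector is installed, and note that Sqoop expects -D properties immediately after the tool name):

sqoop import -Doraoop.timestamp.string=false \
  --connect jdbc:oracle:thin:@database:port:name \
  --username username --password password \
  --table SHARE_PROPERTY -m 1 --direct \
  --hive-import --hive-table SHARE_PROPERTY --as-parquetfile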
I have a feeling there's some documentation I'm missing, or just can't find. Has anyone else run into this?
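For completeness, the text-to-Parquet conversion attempt mentioned above was essentially a CTAS like the following (a sketch; share_property_text is a hypothetical name for the text-format copy of the table):

hive> CREATE TABLE share_property_parquet STORED AS PARQUET AS SELECT * FROM share_property_text;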