Sqoop import -- wrong data types when importing from Oracle as a parquetfile

j9per5c4 posted 2021-06-03 in Sqoop

I'm using CDH 5.5 with Sqoop 1.4.6 and Hive 1.2.1 (I downloaded those two manually to get Parquet support for the additional data types).
Here is the import command I'm using:

sqoop import --connect jdbc:oracle:thin:@database:port:name --username username --password password --table SHARE_PROPERTY -m 1 --null-string '\\N' --null-non-string '\\N'  --hive-import --hive-table SHARE_PROPERTY --as-parquetfile --compression-codec snappy --map-column-hive "CREATE_DATE=String"

The command completes successfully, and when I describe the table in Hive I see:

hive> describe share_property;
OK
share_property_id       string                                      
customer_id             string                                      
website_user_id         string                                      
address_id              string                                      
listing_num             string                                      
source                  string                                      
is_facebook             string                                      
is_twitter              string                                      
is_linkedin             string                                      
is_email                string                                      
create_date             string                                      
Time taken: 1.09 seconds, Fetched: 11 row(s)

Most of these fields are actually Oracle NUMBER types, and they do import as double if I stop trying to import a Parquet file and just use the default text file. The create_date field is a DATE in Oracle and comes in sometimes as a bigint and sometimes as a string, depending on which command I use.
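For comparison, the text-file variant that does give me queryable data is essentially the same command minus the Parquet flags (a sketch; as noted below, without the CREATE_DATE mapping the date lands as a long):

sqoop import --connect jdbc:oracle:thin:@database:port:name --username username --password password --table SHARE_PROPERTY -m 1 --null-string '\\N' --null-non-string '\\N' --hive-import --hive-table SHARE_PROPERTY --map-column-hive CREATE_DATE=String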
Either way, when I try to query the Parquet-backed table in Hive, I run into the following error:

hive> select * from share_property limit 20;
OK
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/parquet/lib/parquet-format-2.1.0-cdh5.5.1.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/parquet/lib/parquet-pig-bundle-1.5.0-cdh5.5.1.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/parquet/lib/parquet-hadoop-bundle-1.5.0-cdh5.5.1.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [shaded.parquet.org.slf4j.helpers.NOPLoggerFactory]
Exception in thread "main" java.lang.NoSuchMethodError: parquet.schema.Types$MessageTypeBuilder.addFields([Lparquet/schema/Type;)Lparquet/schema/Types$GroupBuilder;
at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.getSchemaByName(DataWritableReadSupport.java:159)
at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:222)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:256)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:99)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:85)
at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:72)
at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:673)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:323)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:445)
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1670)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Clearly either the data isn't being imported correctly or something is off with Parquet. If I import as a text file I can query the data successfully (as long as I set --map-column-hive CREATE_DATE=String, otherwise it imports as a long), but if I then try to insert that data into a Parquet table to convert the format, that errors out too (a sketch of the attempt is below). So maybe the problem lies there? I've also tried setting every column type by hand with --map-column-hive as well as --map-column-java, but that doesn't seem to help either.
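The conversion attempt looked roughly like this (the target table name share_property_parquet is just illustrative):

hive> -- illustrative target table; copy the text-backed table into Parquet
hive> CREATE TABLE share_property_parquet STORED AS PARQUET
    >   AS SELECT * FROM share_property;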
I've also found documentation saying that Sqoop supports importing the Oracle DATE type and that Parquet supports date/timestamp, but I haven't managed to import the column as either of those (whether with --map-column-hive, --map-column-java, or -Doraoop.timestamp.string=false together with --direct).
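For example, one variant I tried looked roughly like this (a sketch; as far as I understand, the oraoop property only takes effect together with --direct):

sqoop import -Doraoop.timestamp.string=false --connect jdbc:oracle:thin:@database:port:name --username username --password password --table SHARE_PROPERTY -m 1 --direct --hive-import --hive-table SHARE_PROPERTY --as-parquetfile --map-column-java CREATE_DATE=java.sql.Timestamp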
I have a feeling there's some file I'm missing, or that I just can't find. Has anyone else run into this?
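In case it's relevant, the Parquet jars Hive is picking up can be listed like this (the paths come from the SLF4J output above); I suspect the mix of the CDH parquet 1.5.0 bundles and my manually downloaded Hive 1.2.1 might explain the NoSuchMethodError, but I haven't been able to confirm that:

ls -1 /usr/lib/parquet/lib/ | grep parquet
# parquet-format-2.1.0-cdh5.5.1.jar
# parquet-hadoop-bundle-1.5.0-cdh5.5.1.jar
# parquet-pig-bundle-1.5.0-cdh5.5.1.jar
# (output abridged; names taken from the SLF4J bindings above)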

No answers yet!
