I created a managed table in Hive, say table A, at an ADLS Gen1 location "adl://[path to adls location]".
Table A is a partitioned table whose records are stored as Parquet files at that ADLS Gen1 location.
I am trying to insert data into it from another Hive table B. Table B holds a large amount of data, up to 32 GB.
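For reference, table A was defined roughly along these lines. This is only a sketch: the column types are placeholders, and the ADLS path is elided exactly as above; the column names match the select list in the query further down.
-- managed, partitioned table backed by Parquet files on ADLS Gen1 (types are placeholders)
CREATE TABLE A (
  a STRING,
  b STRING,
  c STRING
)
PARTITIONED BY (id INT)
STORED AS PARQUET
LOCATION 'adl://[path to adls location]';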
To allow inserts into table A, which is partitioned, I used the following settings:
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode=nonstrict;
The query I am running has the following sample syntax:
INSERT INTO TABLE A PARTITION (id)
SELECT
  a,
  b,
  c
FROM b
WHERE id >= 10
DISTRIBUTE BY id;
The above query fails with the following error while inserting the data:
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: com.microsoft.azure.datalake.store.ADLException: Error appending to file /***/_task_tmp.-ext-10000/***/_tmp.000010_3
Operation APPEND failed with exception java.io.IOException : Error writing to server
Last encountered exception thrown after 5 tries. [java.io.IOException,java.io.IOException,java.io.IOException,java.io.IOException,java.io.IOException]
[ServerRequestId:null]
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:751)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
... 7 more
Caused by: com.microsoft.azure.datalake.store.ADLException: Error appending to file /***/_task_tmp.-ext-10000/***/_tmp.000010_3
Operation APPEND failed with exception java.io.IOException : Error writing to server
Last encountered exception thrown after 5 tries. [java.io.IOException,java.io.IOException,java.io.IOException,java.io.IOException,java.io.IOException]
[ServerRequestId:null]
at com.microsoft.azure.datalake.store.ADLStoreClient.getExceptionFromResponse(ADLStoreClient.java:1176)
at com.microsoft.azure.datalake.store.ADLFileOutputStream.flush(ADLFileOutputStream.java:180)
at com.microsoft.azure.datalake.store.ADLFileOutputStream.write(ADLFileOutputStream.java:119)
at org.apache.hadoop.fs.adl.AdlFsOutputStream.write(AdlFsOutputStream.java:63)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at parquet.bytes.BytesInput$ByteArrayBytesInput.writeAllTo(BytesInput.java:355)
at parquet.hadoop.ParquetFileWriter.writeDictionaryPage(ParquetFileWriter.java:320)
at parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writeToFileWriter(ColumnChunkPageWriteStore.java:179)
at parquet.hadoop.ColumnChunkPageWriteStore.flushToFileWriter(ColumnChunkPageWriteStore.java:238)
at parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:160)
at parquet.hadoop.InternalParquetRecordWriter.checkBlockSizeReached(InternalParquetRecordWriter.java:136)
at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:118)
at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:123)
at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:42)
at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:136)
at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:149)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:717)
... 10 more
I need to resolve this because the data has to be stored at the ADLS Gen1 location. The query above works fine for up to about 1 million records, but fails once the data size grows.
I tried increasing the reducer memory; the process sped up, but it still failed in the end.
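For context, this is roughly how I bumped the reducer memory; the values below are illustrative, not the exact ones from my runs:
-- illustrative values only
SET mapreduce.reduce.memory.mb = 8192;
SET mapreduce.reduce.java.opts = -Xmx6g;
-- spread the work over more reducers so each task writes less per file
SET hive.exec.reducers.bytes.per.reducer = 268435456;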
I also noticed that if I change table A's location to HDFS instead of ADLS, the same query works fine for ~32 GB of data and more.
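The HDFS comparison was done with a copy of the table pointed at an HDFS path, roughly like this (a sketch; the copy's name and the HDFS path are placeholders):
-- same schema as table A, but backed by HDFS (placeholder name and path)
CREATE TABLE a_hdfs LIKE A;
ALTER TABLE a_hdfs SET LOCATION 'hdfs:///[path on hdfs]';
-- the same INSERT ... SELECT against this copy completes for the full ~32 GB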