Can we create a Parquet table from an existing table?

vlurs2pr · posted 2021-06-24 in Hive
Follow (0) | Answers (2) | Views (436)

I have a table with a large amount of data. I need to extract a few columns from it and create a Parquet table. Here is what I did:

CREATE TABLE parqfiletable
AS
SELECT col2, col4, col8 FROM txtfiletable
STORED AS PARQUET;

This succeeded, but when I tried to retrieve data from the Parquet table, it threw an error.
Also, the same CREATE TABLE statement does not work anymore.
I could do it the usual way, i.e. create a table definition stored as Parquet and then load the data from the original source, but the statement above did work at the time and the table was created.
Can you help me understand what is wrong with the CREATE TABLE statement? Can't we do a CREATE TABLE ... AS SELECT ... stored as Parquet?
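
For reference, a minimal sketch of the two-step approach mentioned above might look like this (the table name parqfiletable_2step and the STRING column types are only placeholders; adjust them to match txtfiletable):

-- Placeholder column types; match them to txtfiletable.
CREATE TABLE parqfiletable_2step (col2 STRING, col4 STRING, col8 STRING)
STORED AS PARQUET;

-- Load the extracted columns from the original text table.
INSERT OVERWRITE TABLE parqfiletable_2step
SELECT col2, col4, col8 FROM txtfiletable;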


btqmn9zl1#

I figured out the correct syntax for the problem above:

CREATE TABLE parqtable STORED AS PARQUET AS SELECT * FROM textfiletable;
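
Applied to the tables and columns from the question, the same pattern would presumably look like this (assuming parqfiletable does not already exist):

-- STORED AS PARQUET must come before the AS SELECT clause in a Hive CTAS.
CREATE TABLE parqfiletable
STORED AS PARQUET
AS
SELECT col2, col4, col8 FROM txtfiletable;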


ckx4rj1h2#

CREATE TABLE parqtab
AS
SELECT col2 FROM txtfiletable
AS PARQUET;

That way the statement runs and the table gets created, but according to the extended table properties it is not a Parquet table.
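
The extended properties below were presumably pulled with something like:

DESCRIBE EXTENDED parqtab;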

Detailed Table Information  Table(tableName:parqtab, dbName:dbtest, owner:cloudera, createTime:1606338549, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col2, type:string, comment:null)], location:hdfs://quickstart.cloudera:8020/user/hive/warehouse/dbtest.db/parqtab, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{numFiles=1, transient_lastDdlTime=1606338550, COLUMN_STATS_ACCURATE=true, totalSize=68, numRows=3, rawDataSize=65}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)

Observations:

inputFormat:org.apache.hadoop.mapred.TextInputFormat,
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false,

I then created a Parquet table directly:

create table parqtab2(id int) stored as parquet;

The details are below.

Detailed Table Information  Table(tableName:parqtab2, dbName:dbtest, owner:cloudera, createTime:1606338696, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:int, comment:null)], location:hdfs://quickstart.cloudera:8020/user/hive/warehouse/dbtest.db/parqtab2, inputFormat:org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{transient_lastDdlTime=1606338696}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)

Observation from the extended properties:

inputFormat:org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat,
outputFormat:org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
