Load Athena output into an Athena table

Posted by 68bkxrlz on 2021-06-26 in Hive


I queried a table with Athena and got the output as a CSV file that looks like this:

"col_a_string","col_b_string","col_c_timestamp","col_d_int"

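Now I want to load the CSV files into another Athena table so I can examine my data and work with it. But when I define the table with FIELDS TERMINATED BY ',', the values keep their surrounding double quotes and all fields are treated as strings (the timestamp and int columns come back empty).
It is a bit absurd that Athena cannot read Athena's own output. How can I define my table so that it ignores the quotes?
Thank you!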

Hive amazon-web-services amazon-athena

Source: https://stackoverflow.com/questions/49775133/load-athena-output-into-athena-table

4 Answers


mlmc2os51#

Try this:

CREATE EXTERNAL TABLE IF NOT EXISTS my_table (
  `col_a_string` String,
  `col_b_string` String,
  `col_c_timestamp` TIMESTAMP,
  `col_d_int` Int
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = ",",
   "quoteChar"     = "\""
) LOCATION 's3://your-s3-location'
TBLPROPERTIES ("skip.header.line.count"="1")

Does this work for you? Note the quoteChar property in SERDEPROPERTIES.
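
If the timestamp and int columns still come back empty, one possible workaround (an assumption on my part, not something stated in this thread) is to declare every column as STRING and convert at query time. As far as I know, OpenCSVSerde only recognizes TIMESTAMP values in UNIX numeric form, whereas Athena's CSV output writes timestamps as formatted date-time text. A minimal sketch, where my_table_raw is a hypothetical table with the same four columns, all typed STRING:

```sql
-- Hypothetical table: my_table_raw has the same columns, all declared STRING.
SELECT
  col_a_string,
  col_b_string,
  CAST(col_c_timestamp AS timestamp) AS col_c_timestamp,  -- text to timestamp
  CAST(col_d_int AS integer)         AS col_d_int         -- text to integer
FROM my_table_raw;
```

If some values are empty or malformed, TRY_CAST can be used in place of CAST so that those rows yield NULL instead of failing the query.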

Answered 2021-06-26

57hvy0tb2#

This is how I usually load data from CSV.

CREATE EXTERNAL TABLE IF NOT EXISTS my_table(
 `col_a_string` String,
 `col_b_string` String,
 `col_c_timestamp` TIMESTAMP,
 `col_d_int` Int)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ( 
 'separatorChar' = ',', 
 'quoteChar' = '\"', 
 'escapeChar' = '\\' )
STORED AS TEXTFILE 
LOCATION '<s3://filepath>'
TBLPROPERTIES ('has_encrypted_data'='false',
          "skip.header.line.count"="1");

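The advantage of this approach is that it parses and loads the data correctly even when only some of the columns are wrapped in double quotes,
for example: col_a_string,"col_b_string",col_c_timestamp,"col_d_int"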

Answered 2021-06-26

wpx232ag3#

Try this, it helped me:

CREATE EXTERNAL TABLE `my_table`(
 `col_a_string` String,
 `col_b_string` String,
 `col_c_timestamp` TIMESTAMP,
 `col_d_int` Int)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.OpenCSVSerde' 
WITH SERDEPROPERTIES ( 
  'escapeChar'='\\', 
  'separatorChar'=',') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://<path>'
TBLPROPERTIES (
  'skip.header.line.count'='1')

Answered 2021-06-26

dauxcl2d4#

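It looks like this is now a built-in feature: https://docs.aws.amazon.com/athena/latest/ug/ctas.html
A CREATE TABLE AS SELECT (CTAS) query creates a new table in Athena from the results of a SELECT statement from another query. Athena stores the data files created by the CTAS statement in a specified location in Amazon S3. For syntax, see CREATE TABLE AS.
Use CTAS queries to:
Create tables from query results in one step, without repeatedly querying raw data sets. This makes it easier to work with raw data sets.
Transform query results into other storage formats, such as Parquet and ORC. This improves query performance and reduces query costs in Athena. For information, see Columnar Storage Formats.
Create copies of existing tables that contain only the data you need.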
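
As a rough illustration of the CTAS approach described above, the statement below creates a Parquet-backed table directly from a query, so the CSV output never has to be re-parsed. This is only a sketch: my_table_copy, source_table, the WHERE filter, and the S3 path are placeholders I made up, and the target location is assumed to be empty before the query runs.

```sql
-- Sketch only: table names, filter, and S3 path are hypothetical placeholders.
CREATE TABLE my_table_copy
WITH (
  format = 'PARQUET',                                          -- columnar output format
  external_location = 's3://your-s3-location/my_table_copy/'   -- where Athena writes the data files
) AS
SELECT
  col_a_string,
  col_b_string,
  col_c_timestamp,
  col_d_int
FROM source_table
WHERE col_d_int > 0;
```

Because the new table takes its column types from the SELECT, the timestamp and int columns keep their types without any SerDe configuration.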

Answered 2021-06-26
