How to import Parquet data from S3 into HDFS using Sqoop?

Posted by pgky5nke on 2022-12-09 in HDFS


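I am trying to import data into a table in RDS. The data is in Parquet format and is stored in S3. I want to use Sqoop to import the data from S3 into HDFS and then use Sqoop to export it to the RDS table. I found the command for exporting data from HDFS to RDS, but I could not find a command for importing Parquet data from S3. Could you help me construct the sqoop import command for this case?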

hdfs

Source: https://stackoverflow.com/questions/69107946/how-to-import-parquet-data-from-s3-into-hdfs-using-sqoop

2 Answers


eoigrqb6 #1

You can use Spark to copy the data from S3 to HDFS.
See this blog post for more details.
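
As a minimal sketch of the Spark route suggested in this answer (not code from the original post), the PySpark helper below reads Parquet files from an s3a:// path and writes them to an HDFS directory. The function name copy_parquet and the paths are hypothetical placeholders, and the sketch assumes the cluster already has the hadoop-aws S3A connector and AWS credentials configured.

```python
def copy_parquet(spark, src, dst, mode="overwrite"):
    """Read Parquet data from `src` and write it as Parquet to `dst`.

    `spark` is an existing SparkSession. `src` and `dst` are placeholder
    URIs, e.g. 's3a://<bucket_name>/<prefix>/' and
    'hdfs:///user/<user>/parquetdata' (both hypothetical).
    """
    df = spark.read.parquet(src)        # schema is taken from the Parquet files
    df.write.mode(mode).parquet(dst)    # write Parquet files under dst
    return df


# Typical use inside a job started with spark-submit (assumed setup):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("s3-to-hdfs").getOrCreate()
#   copy_parquet(spark, "s3a://<bucket_name>/<prefix>/",
#                "hdfs:///user/<user>/parquetdata")
```

With mode="overwrite", any existing data under the destination path is replaced. Pass mode="append" or mode="errorifexists" if that is not what you want.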

Posted on 2022-12-09

stszievb #2

The approach that seemed simplest and best to me is as follows:

Create a Parquet table in Hive and load it with the Parquet data from S3

create external table if not exists parquet_table(<column name> <column's datatype>) stored as parquet;

LOAD DATA INPATH 's3a://<bucket_name>/<parquet_file>' INTO TABLE parquet_table;

Create a CSV table in Hive and load it with the data from Parquet table

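create external table if not exists csv_table(<column name> <column's datatype>)
row format delimited fields terminated by ','
stored as textfile
location 'hdfs:///user/hive/warehouse/csvdata';

-- copy the rows from the Parquet table into the CSV table
insert overwrite table csv_table select * from parquet_table;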

Now that we have a CSV/text-file table in Hive, Sqoop can easily export it from HDFS to the MySQL table in RDS.

sqoop export --table <mysql_table_name> \
  --export-dir hdfs:///user/hive/warehouse/csvdata \
  --connect jdbc:mysql://<host>:3306/<db_name> \
  --username <username> \
  --password-file hdfs:///user/test/mysql.password \
  --batch -m 1 \
  --input-null-string "\\N" --input-null-non-string "\\N" \
  --columns <comma-separated column names to be exported, without whitespace between them>
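
To make the --columns requirement concrete, here is a small sketch (an illustration, not part of the original answer) that assembles the same export arguments in Python. The helper build_sqoop_export_args and any sample table, host, or column names passed to it are hypothetical. It only builds the argument list, joining the column names with commas and no spaces, and leaves running the command to the caller.

```python
def build_sqoop_export_args(table, export_dir, jdbc_url, username,
                            password_file, columns):
    """Return the argv list for the sqoop export command shown above."""
    # Sqoop expects --columns as one comma-separated value without spaces.
    column_list = ",".join(name.strip() for name in columns)
    return [
        "sqoop", "export",
        "--table", table,
        "--export-dir", export_dir,
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--batch",
        "-m", "1",
        # Passed as an argv list (no shell involved), so each value is the
        # literal two characters \N, Hive's default NULL marker in text files.
        "--input-null-string", "\\N",
        "--input-null-non-string", "\\N",
        "--columns", column_list,
    ]
```

On a machine where the sqoop client is installed, the returned list can be handed to subprocess.run(args, check=True). Passing a list instead of a single shell string avoids a second round of shell quoting for the \N values.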

Posted on 2022-12-09
