The approach that seemed simplest and worked best for me is as follows:
Create a Parquet table in Hive and load it with the Parquet data from S3:
create external table if not exists parquet_table(<column name> <column's datatype>) stored as parquet;
LOAD DATA INPATH 's3a://<bucket_name>/<parquet_file>' INTO TABLE parquet_table;
Create a CSV table in Hive and load it with the data from the Parquet table:
create external table if not exists csv_table(<column name> <column's datatype>)
row format delimited fields terminated by ','
stored as textfile
location 'hdfs:///user/hive/warehouse/csvdata';
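The CREATE statement above only defines csv_table; it does not copy any rows into it. One common way to fill it from parquet_table is an INSERT ... SELECT. The following is a minimal sketch, not necessarily the exact statement the author ran. It assumes both tables declare the same columns in the same order, and the HQL variable name is only illustrative.

```shell
# Sketch: HiveQL that rewrites csv_table with every row of parquet_table.
# Assumption: both tables have the same columns in the same order.
HQL='INSERT OVERWRITE TABLE csv_table SELECT * FROM parquet_table;'

# Print the statement for review. To execute it: hive -e "$HQL"
printf '%s\n' "$HQL"
```

Once the statement has run, Hive writes comma-delimited text files under hdfs:///user/hive/warehouse/csvdata, which is the directory the Sqoop export reads.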
Now that we have a CSV/text-file table in Hive, Sqoop can easily export it from HDFS to a MySQL table on RDS.
sqoop export --table <mysql_table_name> --export-dir hdfs:///user/hive/warehouse/csvdata --connect jdbc:mysql://<host>:3306/<db_name> --username <username> --password-file hdfs:///user/test/mysql.password --batch -m 1 --input-null-string "\\N" --input-null-non-string "\\N" --columns <comma-separated column names to export, with no whitespace between them>
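The same export is easier to review with one option per line. The sketch below wraps it in a dry-run helper so the command can be printed before anything is written to MySQL. The run function, the DRY_RUN switch and the variable names are hypothetical additions for illustration; the options and paths are the ones from the command above. The two --input-null-* options are there because Hive writes NULL as \N in text tables by default.

```shell
# Placeholders from the command above; replace them before a real run.
MYSQL_HOST='<host>'
DB_NAME='<db_name>'
DB_USER='<username>'
MYSQL_TABLE='<mysql_table_name>'
EXPORT_COLUMNS='<col1,col2>'   # hypothetical: comma-separated, no spaces

# Hypothetical helper: prints the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

run sqoop export \
  --connect "jdbc:mysql://${MYSQL_HOST}:3306/${DB_NAME}" \
  --username "${DB_USER}" \
  --password-file hdfs:///user/test/mysql.password \
  --table "${MYSQL_TABLE}" \
  --columns "${EXPORT_COLUMNS}" \
  --export-dir hdfs:///user/hive/warehouse/csvdata \
  --input-null-string "\\N" \
  --input-null-non-string "\\N" \
  --batch \
  -m 1
```

By default this only prints the command. Setting DRY_RUN=0 runs it, which requires sqoop on the PATH and a readable password file at the HDFS path shown.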
2 answers
eoigrqb6 #1
You can use Spark to copy the data from S3 to HDFS.
See this blog post for more details.
stszievb #2
The approach that seemed simplest and worked best for me is as follows: