Incremental updates in Hive

Posted by wixjitnu on 2021-05-29 in Hadoop


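I have a source MySQL table, and I need to export its data to Hive for analytics. Initially, when the MySQL table was small, exporting it to Hive with Sqoop was not a problem. Now that the data volume has grown, how can I update Hive incrementally from MySQL instead of re-exporting the whole table?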

hadoop mysql Hive bigdata

Source: https://stackoverflow.com/questions/36990196/incremental-updates-in-hive

2 answers


niknxzdl1#

Here is an example of an incremental update using Hive/Spark. table1 holds the existing data and table2 holds the newer batch.

    scala> spark.sql("select * from table1").show
    +---+---+---------+
    | id|sal|timestamp|
    +---+---+---------+
    |  1|100|    30-08|
    |  2|200|    30-08|
    |  3|300|    30-08|
    |  4|400|    30-08|
    +---+---+---------+

    scala> spark.sql("select * from table2").show
    +---+----+---------+
    | id| sal|timestamp|
    +---+----+---------+
    |  2| 300|    31-08|
    |  4|1000|    31-08|
    |  5| 500|    31-08|
    |  6| 600|    31-08|
    +---+----+---------+

    scala> spark.sql("select b.id,b.sal from table1 a full outer join table2 b on a.id = b.id where b.id is not null union select a.id,a.sal from table1 a full outer join table2 b on a.id = b.id where b.id is null").show
    +---+----+
    | id| sal|
    +---+----+
    |  4|1000|
    |  6| 600|
    |  2| 300|
    |  5| 500|
    |  1| 100|
    |  3| 300|
    +---+----+

The first half of the union takes every row that exists in table2, so updated and newly added ids come from the newer batch. The second half keeps the table1 rows whose id has no match in table2. I hope this logic works for you.
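To make the merge rule easier to check by hand, here is a minimal sketch of the same logic in plain Python, with no Spark dependency. The function name `merge_incremental` and the dictionary-based rows are assumptions made for illustration; they are not part of the answer above, and the Spark query remains the actual method it proposes.

```python
def merge_incremental(base_rows, delta_rows, key="id"):
    """Apply delta_rows on top of base_rows; the delta wins on matching keys."""
    merged = {row[key]: row for row in base_rows}
    # Delta rows overwrite base rows with the same key and add any new keys.
    merged.update({row[key]: row for row in delta_rows})
    # Sort by key so the result is deterministic (Spark's output order is not).
    return [merged[k] for k in sorted(merged)]


# Same sample data as table1 and table2 in the answer (timestamp omitted).
table1 = [
    {"id": 1, "sal": 100},
    {"id": 2, "sal": 200},
    {"id": 3, "sal": 300},
    {"id": 4, "sal": 400},
]
table2 = [
    {"id": 2, "sal": 300},
    {"id": 4, "sal": 1000},
    {"id": 5, "sal": 500},
    {"id": 6, "sal": 600},
]

result = merge_incremental(table1, table2)
for row in result:
    print(row["id"], row["sal"])
```

The printed rows contain the same id and sal pairs as the Spark result, only sorted by id. Like the query, this assumes id uniquely identifies a row and that table2 is always the newer of the two.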

Posted 2021-05-30

pgky5nke2#

You can use Sqoop for incremental updates. The Sqoop documentation covers this well; here is the link: https://sqoop.apache.org/docs/1.4.2/sqoopuserguide.html#_incremental_imports
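As a rough illustration of what such an import might look like, the sketch below assembles a `sqoop import` command line in Python. The flags `--incremental`, `--check-column` and `--last-value` are the ones described in the incremental imports section of the Sqoop user guide. Everything else is an assumption: the helper name `build_incremental_import`, the JDBC URL, the table name, the target directory and the last imported id are all hypothetical placeholders.

```python
import shlex


def build_incremental_import(connect, table, target_dir, check_column, last_value):
    """Build the argument list for a Sqoop incremental import in append mode."""
    return [
        "sqoop", "import",
        "--connect", connect,
        "--table", table,
        "--target-dir", target_dir,
        # Append mode imports only rows whose check column exceeds last_value.
        "--incremental", "append",
        "--check-column", check_column,
        "--last-value", str(last_value),
    ]


# All values below are hypothetical placeholders, not taken from the question.
cmd = build_incremental_import(
    connect="jdbc:mysql://dbhost/sales",
    table="orders",
    target_dir="/user/hive/staging/orders",
    check_column="id",
    last_value=4,
)
print(shlex.join(cmd))
```

Append mode fits tables that only gain rows with an increasing key. For rows that are updated in place, the same guide describes a `lastmodified` mode that checks a timestamp column instead; the imported batch then still has to be merged with the existing Hive data, for example with the query from the other answer.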

Posted 2021-05-29