有人能帮助我在sqoop增量模式中append和lastmodified模式的确切区别吗?当使用append模式(使用--check列作为timestamp)可以完成相同的操作时,lastmodified需要做什么?它的工作原理相同,并导入更新的和插入的记录。
ie3xauqp1#
模式:append可用于知道最后一个值的列。mode:lastmodified模式可用于timestamp列。可能很难记住上次修改的时间戳。如果你知道上次修改的时间戳,你可以直接使用第一种方法。模式:追加
mysql> describe emp; +--------+--------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +--------+--------------+------+-----+---------+-------+ | id | int(11) | YES | | NULL | | | name | varchar(100) | YES | | NULL | | | deg | varchar(100) | YES | | NULL | | | salary | int(11) | YES | | NULL | | | dep | varchar(10) | YES | | NULL | | +--------+--------------+------+-----+---------+-------+ 5 rows in set (0.00 sec)
使用sqoop命令导入数据。
sqoop import \ --connect jdbc:mysql://quickstart:3306/retail_db \ --username retail_dba \ --password cloudera \ -m 3 \ --table emp \ --split-by id \ --columns id,name,deg \ --warehouse-dir /user/sqoop/ \ --delete-target-dir \ --as-textfile
hdfs输出
[cloudera@quickstart lib]$ hadoop fs -ls -R /user/sqoop/ drwxr-xr-x - cloudera supergroup 0 2017-12-02 13:14 /user/sqoop/emp -rw-r--r-- 1 cloudera supergroup 0 2017-12-02 13:14 /user/sqoop/emp/_SUCCESS -rw-r--r-- 1 cloudera supergroup 70 2017-12-02 13:14 /user/sqoop/emp/part-m-00000 -rw-r--r-- 1 cloudera supergroup 64 2017-12-02 13:14 /user/sqoop/emp/part-m-00001 -rw-r--r-- 1 cloudera supergroup 86 2017-12-02 13:14 /user/sqoop/emp/part-m-00002 mysql> select * from emp; +------+---------+--------------+--------+------+ | id | name | deg | salary | dep | +------+---------+--------------+--------+------+ | 1201 | gopal | manager | 50000 | tp | | 1202 | manisha | Proof reader | 50000 | TP | | 1203 | php dev | TECH WRITER | 50000 | AC | | 1204 | Nilesh | Domino dev | 70000 | AF | | 1205 | Vinayak | Java dev | 50000 | IT | | 1206 | Amish | Cog dev | 60000 | IT | | 1207 | Jatin | Oracel dev | 40001 | IT | | 1208 | Viren | Java dev | 70004 | IT | | 1209 | Ashish | Oracel dev | 40001 | IT | | 1210 | Satish | Java dev | 70004 | IT | +------+---------+--------------+--------+------+ 10 rows in set (0.00 sec)
在表中插入新记录。
mysql> insert into emp values(1211,'Jag', 'be', 20000, 'IT'); Query OK, 1 row affected (0.03 sec) mysql> select * from emp; +------+---------+--------------+--------+------+ | id | name | deg | salary | dep | +------+---------+--------------+--------+------+ | 1201 | gopal | manager | 50000 | tp | | 1202 | manisha | Proof reader | 50000 | TP | | 1203 | php dev | TECH WRITER | 50000 | AC | | 1204 | Nilesh | Domino dev | 70000 | AF | | 1205 | Vinayak | Java dev | 50000 | IT | | 1206 | Amish | Cog dev | 60000 | IT | | 1207 | Jatin | Oracel dev | 40001 | IT | | 1208 | Viren | Java dev | 70004 | IT | | 1209 | Ashish | Oracel dev | 40001 | IT | | 1210 | Satish | Java dev | 70004 | IT | | 1211 | Jag | be | 20000 | IT | +------+---------+--------------+--------+------+ 11 rows in set (0.00 sec)
增量导入命令
sqoop import \ --connect jdbc:mysql://quickstart:3306/retail_db \ --username retail_dba \ --password cloudera \ --table emp \ --split-by id \ --check-column id \ --incremental append \ --last-value 1210 \ --warehouse-dir /user/sqoop/ \ --as-textfile
导入后
[cloudera@quickstart lib]$ hadoop fs -ls -R /user/sqoop/emp -rw-r--r-- 1 cloudera supergroup 0 2017-12-02 13:14 /user/sqoop/emp/_SUCCESS -rw-r--r-- 1 cloudera supergroup 70 2017-12-02 13:14 /user/sqoop/emp/part-m-00000 -rw-r--r-- 1 cloudera supergroup 64 2017-12-02 13:14 /user/sqoop/emp/part-m-00001 -rw-r--r-- 1 cloudera supergroup 86 2017-12-02 13:14 /user/sqoop/emp/part-m-00002 -rw-r--r-- 1 cloudera cloudera 21 2017-12-02 13:48 /user/sqoop/emp/part-m-00003 [cloudera@quickstart lib]$ hadoop fs -cat /user/sqoop/emp/part-m-00003 1211,Jag,be,20000,IT
模式:上次修改
mysql> describe orders; +-------------------+-------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +-------------------+-------------+------+-----+---------+----------------+ | order_id | int(11) | NO | PRI | NULL | auto_increment | | order_date | datetime | NO | | NULL | | | order_customer_id | int(11) | NO | | NULL | | | order_status | varchar(45) | NO | | NULL | | +-------------------+-------------+------+-----+---------+----------------+ 4 rows in set (0.00 sec)
将订单导入hdfs
sqoop import \ --connect jdbc:mysql://quickstart:3306/retail_db \ --username retail_dba \ --password cloudera \ --table orders \ --split-by order_id \ --target-dir /user/sqoop/orders \ --as-textfile
[cloudera@quickstart lib]$ hadoop fs -ls -R /user/sqoop/orders -rw-r--r-- 1 cloudera supergroup 0 2017-12-02 16:01 /user/sqoop/orders/_SUCCESS -rw-r--r-- 1 cloudera supergroup 741597 2017-12-02 16:01 /user/sqoop/orders/part-m-00000 -rw-r--r-- 1 cloudera supergroup 753022 2017-12-02 16:01 /user/sqoop/orders/part-m-00001 -rw-r--r-- 1 cloudera supergroup 752368 2017-12-02 16:01 /user/sqoop/orders/part-m-00002 -rw-r--r-- 1 cloudera supergroup 752940 2017-12-02 16:01 /user/sqoop/orders/part-m-00003
更新订单数据
mysql> select * from orders where order_id=10; +----------+---------------------+-------------------+-----------------+ | order_id | order_date | order_customer_id | order_status | +----------+---------------------+-------------------+-----------------+ | 10 | 2013-07-25 00:00:00 | 5648 | PENDING_PAYMENT | +----------+---------------------+-------------------+-----------------+ 1 row in set (0.00 sec) mysql> update orders set order_status='CLOSED', order_date=now() where order_id=10; Query OK, 1 row affected (0.01 sec) Rows matched: 1 Changed: 1 Warnings: 0 mysql> select * from orders where order_id=10; +----------+---------------------+-------------------+--------------+ | order_id | order_date | order_customer_id | order_status | +----------+---------------------+-------------------+--------------+ | 10 | 2017-12-02 16:19:23 | 5648 | CLOSED | +----------+---------------------+-------------------+--------------+ 1 row in set (0.00 sec)
导入其他数据
sqoop import \ --connect jdbc:mysql://quickstart:3306/retail_db \ --username retail_dba \ --password cloudera \ --table orders \ --split-by order_id \ --check-column order_date \ --merge-key order_id \ --incremental lastmodified \ --target-dir /user/sqoop/orders1 \ --as-textfile
输出
[cloudera@quickstart lib]$ hadoop fs -ls -R /user/sqoop/orders1 -rw-r--r-- 1 cloudera cloudera 0 2017-12-02 16:07 /user/sqoop/orders1/_SUCCESS -rw-r--r-- 1 cloudera cloudera 2999918 2017-12-02 16:07 /user/sqoop/orders1/part-r-00000
注意:如果我们使用的目录(订单)与以前的相同,那么它将删除旧文件并创建新零件。
1条答案
按热度按时间ie3xauqp1#
模式:append可用于知道最后一个值的列。
mode:lastmodified模式可用于timestamp列。可能很难记住上次修改的时间戳。如果你知道上次修改的时间戳,你可以直接使用第一种方法。
模式:追加
使用sqoop命令导入数据。
hdfs输出
在表中插入新记录。
增量导入命令
导入后
模式:上次修改
将订单导入hdfs
导入后
更新订单数据
导入其他数据
输出
注意:如果我们使用的目录(订单)与以前的相同,那么它将删除旧文件并创建新零件。