1 开启Hadoop集群

sbin/start-dfs.sh
sbin/start-yarn.sh
bin/mapred --daemon start historyserver

2 创建数据源和目标数据——PostgreSQL关系型数据库

PostgreSQL 创建表并插入数据

CREATE TABLE "public"."top_n_url" (
  "url" varchar(255) NOT NULL,
  "star_number" int4,
  "comment_number" int4,
  PRIMARY KEY ("url")
)
;

INSERT INTO top_n_url(url, star_number, comment_number)
      VALUES ('https://github.com/lycheeman/big-data', 1299, 789);
INSERT INTO top_n_url(url, star_number, comment_number)
      VALUES ('https://busuanzi.org', 879, 789);
INSERT INTO top_n_url(url, star_number, comment_number)
      VALUES ('https://blog.csdn.net/dream_an/article/details/80854827', 888, 652);

HDFS 创建数据目标路径

hadoop fs -mkdir -p /sqoop2/jdbc2hdfs1

3 设置Sqoop2 server

sqoop2-server start
sqoop2-shell

set server --host 192.168.56.101 --port 12000 --webapp sqoop
show version --all

show connector

4 创建JDBC连接

sqoop:000> create link -connector generic-jdbc-connector

5 创建HDFS连接

sqoop:000> create link -connector hdfs-connector

show link

6 创建Job任务

sqoop:000> create job -f "busuanzi-jdbc-link1" -t "busuanzi-hdfs-link1"

7 执行job

start job -name "jdbc2hdfs1"

http://192.168.56.101:19888/jobhistory/

本节完成

长按关注从入门到精通

开源、源码获取 https://github.com/lycheeman/big-data

（下划线以下部分可忽略）

拓展部分

Sqoop官方命令

申请的MapReduce内存大于集群提供的内存，所以任务被kill掉，执行失败。

sqoop2 job failed with status 3 或者 is running 314272256B beyond the 'VIRTUAL' memory limit. Current usage: 140.4 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used. Killing container.

解决：在创建job时，设定map、reduce的数量小一些，比如：

Extractors:1
Loaders:0

表示，map数量为1，reduce数量为0

执行文件写入权限认证问题

Exception has occurred during processing command
Exception: org.apache.sqoop.common.SqoopException Message: GENERIC_HDFS_CONNECTOR_0007:Invalid input/output directory - Unexpected exception

解决

添加core-site.xml如下配置，并重启集群生效。

 <property>
   <name>hadoop.proxyuser.busuanzi.hosts</name>
   <value>*</value>
 </property>
 <property>
   <name>hadoop.proxyuser.busuanzi.groups</name>
   <value>*</value>
 </property>

创建JDBC连接明细

sqoop:000> create link -connector generic-jdbc-connector
Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: busuanzi-jdbc-link1

Database connection

Driver class: org.postgresql.Driver
Connection String: jdbc:postgresql://192.168.56.110:5432/busuanzidb
Username: busuanzi
Password: ******
Fetch Size:
Connection Properties:
There are currently 0 values in the map:
entry#

SQL Dialect

Identifier enclose:
New link was successfully created with validation status OK and name busuanzi-jdbc-link1

创建HDFS连接明细

sqoop:000> create link -connector hdfs-connector
Creating link for connector with name hdfs-connector
Please fill following values to create new link object
Name: busuanzi-hdfs-link1

HDFS cluster

URI: hdfs://localhost:9000
Conf directory: /opt/hadoop/hadoop-3.1.0/etc/hadoop/
Additional configs::
There are currently 0 values in the map:
entry#
New link was successfully created with validation status OK and name busuanzi-hdfs-link1

创建job明细

sqoop:000> create job -f "busuanzi-jdbc-link1" -t "busuanzi-hdfs-link1"
Creating job for links with from name busuanzi-jdbc-link1 and to name busuanzi-hdfs-link1
Please fill following values to create new job object
Name: jdbc2hdfs1

Database source

Schema name: public
Table name: top_n_url
SQL statement:
Column names:
There are currently 0 values in the list:
element#
Partition column:
Partition column nullable:
Boundary query:

Incremental read

Check column:
Last value:

Target configuration

Override null value:
Null value:
File format:
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
  2 : PARQUET_FILE
Choose: 0
Compression codec:
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose: 0
Custom codec:
Output directory: /sqoop2/jdbc2hdfs1
Append mode:

Throttling resources

Extractors: 1
Loaders: 0

Classpath configuration

Extra mapper jars:
There are currently 0 values in the list:
element#
New job was successfully created with validation status OK  and name jdbc2hdfs1

Sqoop2示例数据从postgreSQL导入到HDFS