我已经将json数据加载到我的hdfs中,我在mysql数据库中创建了包含必需列的表,如下所示。
如何使用行格式化程序创建表以接受json?
我的hdfs数据
{
"Employees" : [
{
"userId":"rirani",
"jobTitleName":"Developer",
"firstName":"Romin",
"lastName":"Irani",
"preferredFullName":"Romin Irani",
"employeeCode":"E1",
"region":"CA",
"phoneNumber":"408-1234567",
"emailAddress":"romin.k.irani@gmail.com"
},
{
"userId":"nirani",
"jobTitleName":"Developer",
"firstName":"Neil",
"lastName":"Irani",
"preferredFullName":"Neil Irani",
"employeeCode":"E2",
"region":"CA",
"phoneNumber":"408-1111111",
"emailAddress":"neilrirani@gmail.com"
},
{
"userId":"thanks",
"jobTitleName":"Program Directory",
"firstName":"Tom",
"lastName":"Hanks",
"preferredFullName":"Tom Hanks",
"employeeCode":"E3",
"region":"CA",
"phoneNumber":"408-2222222",
"emailAddress":"tomhanks@gmail.com"
}
]
}
我的sql表结构
mysql> create table employee(userid int,jobTitleName varchar(20),firstName varchar(20),lastName varchar(20),preferrredFullName varchar(20),employeeCode varchar(20),region varchar(20),phoneNumber varchar(20), emailAddress varchar(20),modifiedDate timestamp DEFAULT CURRENT_TIMESTAMP);
mysql> desc employee;
+--------------------+-------------+------+-----+-------------------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+-------------+------+-----+-------------------+-------+
| userid | int(11) | YES | | NULL | |
| jobTitleName | varchar(20) | YES | | NULL | |
| firstName | varchar(20) | YES | | NULL | |
| lastName | varchar(20) | YES | | NULL | |
| preferrredFullName | varchar(20) | YES | | NULL | |
| employeeCode | varchar(20) | YES | | NULL | |
| region | varchar(20) | YES | | NULL | |
| phoneNumber | varchar(20) | YES | | NULL | |
| emailAddress | varchar(20) | YES | | NULL | |
| modifiedDate | timestamp | NO | | CURRENT_TIMESTAMP | |
+--------------------+-------------+------+-----+-------------------+-------+
10 rows in set (0.00 sec)
我尝试使用sqoop export将上表的数据从hdfs加载到mysql,如下所示
sqoop export --connect jdbc:mysql://localhost/emp_scheme --username root --password adithyan --table employee --export-dir /user/adithyan/filesystem/employee.txt
它以以下例外结束
17/02/18 19:35:35 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/02/18 19:35:35 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/02/18 19:35:35 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/02/18 19:35:35 INFO tool.CodeGenTool: Beginning code generation
17/02/18 19:35:36 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
17/02/18 19:35:36 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
17/02/18 19:35:36 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/adithyan/hadoop_dir/hadoop-1.2.1
Note: /tmp/sqoop-adithyan/compile/35afadf151a1dd1626a3658577cbc2dd/employee.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/02/18 19:35:41 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-adithyan/compile/35afadf151a1dd1626a3658577cbc2dd/employee.jar
17/02/18 19:35:41 INFO mapreduce.ExportJobBase: Beginning export of employee
17/02/18 19:35:45 INFO input.FileInputFormat: Total input paths to process : 1
17/02/18 19:35:45 INFO input.FileInputFormat: Total input paths to process : 1
17/02/18 19:35:45 INFO util.NativeCodeLoader: Loaded the native-hadoop library
17/02/18 19:35:45 WARN snappy.LoadSnappy: Snappy native library not loaded
17/02/18 19:35:46 INFO mapred.JobClient: Running job: job_201702181051_0002
17/02/18 19:35:47 INFO mapred.JobClient: map 0% reduce 0%
17/02/18 19:36:17 INFO mapred.JobClient: Task Id : attempt_201702181051_0002_m_000000_0, Status : FAILED
java.io.IOException: Can't export data, please check failed map task logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.RuntimeException: Can't parse input data: '"firstName":"Tom"'
at employee.__loadFromFields(employee.java:596)
at employee.parse(employee.java:499)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
... 10 more
Caused by: java.lang.NumberFormatException: For input string: ""firstName":"Tom""
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:569)
at java.lang.Integer.valueOf(Integer.java:766)
at employee.__loadFromFields(employee.java:548)
... 12 more
17/02/18 19:36:18 INFO mapred.JobClient: Task Id : attempt_201702181051_0002_m_000001_0, Status : FAILED
java.io.IOException: Can't export data, please check failed map task logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.RuntimeException: Can't parse input data: '{'
at employee.__loadFromFields(employee.java:596)
at employee.parse(employee.java:499)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
... 10 more
Caused by: java.lang.NumberFormatException: For input string: "{"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.valueOf(Integer.java:766)
at employee.__loadFromFields(employee.java:548)
... 12 more
17/02/18 19:36:29 INFO mapred.JobClient: Task Id : attempt_201702181051_0002_m_000000_1, Status : FAILED
java.io.IOException: Can't export data, please check failed map task logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.RuntimeException: Can't parse input data: '"firstName":"Tom"'
at employee.__loadFromFields(employee.java:596)
at employee.parse(employee.java:499)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
... 10 more
Caused by: java.lang.NumberFormatException: For input string: ""firstName":"Tom""
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:569)
at java.lang.Integer.valueOf(Integer.java:766)
at employee.__loadFromFields(employee.java:548)
... 12 more
17/02/18 19:36:29 INFO mapred.JobClient: Task Id : attempt_201702181051_0002_m_000001_1, Status : FAILED
java.io.IOException: Can't export data, please check failed map task logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.RuntimeException: Can't parse input data: '{'
at employee.__loadFromFields(employee.java:596)
at employee.parse(employee.java:499)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
... 10 more
Caused by: java.lang.NumberFormatException: For input string: "{"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.valueOf(Integer.java:766)
at employee.__loadFromFields(employee.java:548)
... 12 more
17/02/18 19:36:42 INFO mapred.JobClient: Task Id : attempt_201702181051_0002_m_000000_2, Status : FAILED
java.io.IOException: Can't export data, please check failed map task logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.RuntimeException: Can't parse input data: '"firstName":"Tom"'
at employee.__loadFromFields(employee.java:596)
at employee.parse(employee.java:499)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
... 10 more
Caused by: java.lang.NumberFormatException: For input string: ""firstName":"Tom""
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:569)
at java.lang.Integer.valueOf(Integer.java:766)
at employee.__loadFromFields(employee.java:548)
... 12 more
17/02/18 19:36:42 INFO mapred.JobClient: Task Id : attempt_201702181051_0002_m_000001_2, Status : FAILED
java.io.IOException: Can't export data, please check failed map task logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.RuntimeException: Can't parse input data: '{'
at employee.__loadFromFields(employee.java:596)
at employee.parse(employee.java:499)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
... 10 more
Caused by: java.lang.NumberFormatException: For input string: "{"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.valueOf(Integer.java:766)
at employee.__loadFromFields(employee.java:548)
有人能帮我吗?
1条答案
按热度按时间wsewodh21#
你可能需要考虑多种选择。。json_set/replace/insert-这些选项可能还不受sqoop的直接支持。
另一种选择是使用pig预处理数据,然后在sqooping到rdbms之前将数据放在hdfs中。