I'm trying to export a file from HDFS to a MySQL database. I found various solutions, but none of them worked; I even tried removing the WINDOWS-1251 characters from the file.
As a short summary: I'm using VirtualBox with the Hortonworks image for this.
My Hive table, in the default database:
CREATE EXTERNAL TABLE `airqualitydata`(
`sensor_id` VARCHAR(100),
`sensor_type` VARCHAR(100),
`location` VARCHAR(100),
`lat` VARCHAR(100),
`lon` VARCHAR(100),
`timestamp` timestamp,
`p1` VARCHAR(100),
`durp1` VARCHAR(100),
`ratiop1` VARCHAR(100),
`p2` VARCHAR(100),
`durp2` VARCHAR(100),
`ratiop2` VARCHAR(100))
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\073'
LOCATION 'hdfs://sandbox-hdp.hortonworks.com:8020/hadoop/airqualitydata'
TBLPROPERTIES ("skip.header.line.count"="1");
The file stored in HDFS under /hadoop/airqualitydata (I removed the WINDOWS-1251 characters just to be sure). Note that this data can be viewed in Hive by running
SELECT * FROM airqualitydata
The file contents:
sensor_id;sensor_type;location;lat;lon;timestamp;P1;durP1;ratioP1;P2;durP2;ratioP2
9710;SDS011;4894;43.226;27.934;2021-09-09T00:00:12;70;;;20;;
9710;SDS011;4894;43.226;27.934;2021-09-09T00:02:41;83;;;0.93;;
9710;SDS011;4894;43.226;27.934;2021-09-09T00:05:14;0.80;;;0.73;;
9710;SDS011;4894;43.226;27.934;2021-09-09T00:07:42;0.50;;;0.50;;
9710;SDS011;4894;43.226;27.934;2021-09-09T00:10:10;57;;;0.80;;
9710;SDS011;4894;43.226;27.934;2021-09-09T00:12:39;0.40;;;0.40;;
9710;SDS011;4894;43.226;27.934;2021-09-09T00:15:07;0.70;;;0.70;;
9710;SDS011;4894;43.226;27.934;2021-09-09T00:17:35;2;;;0.47;;
9710;SDS011;4894;43.226;27.934;2021-09-09T00:20:04;90;;;0.63;;
9710;SDS011;4894;43.226;27.934;2021-09-09T00:22:34;0.57;;;0.57;;
9710;SDS011;4894;43.226;27.934;2021-09-09T00:25:01;0.73;;;0.60;;
MySQL database and table:
CREATE DATABASE airquality CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
CREATE TABLE `airqualitydata`(
`sensor_id` VARCHAR(100),
`sensor_type` VARCHAR(100),
`location` VARCHAR(100),
`lat` VARCHAR(100),
`lon` VARCHAR(100),
`timestamp` timestamp,
`p1` VARCHAR(100),
`durp1` VARCHAR(100),
`ratiop1` VARCHAR(100),
`p2` VARCHAR(100),
`durp2` VARCHAR(100),
`ratiop2` VARCHAR(100)
);
Sqoop CLI call:
sqoop export --connect "jdbc:mysql://localhost:3306/airquality?useUnicode=true&characterEncoding=WINDOWS-1251" --username root --password hortonworks1 --export-dir hdfs://sandbox-hdp.hortonworks.com:8020/hadoop/airqualitydata --table airqualitydata --input-fields-terminated-by "\073" --input-lines-terminated-by "\n" -m 1
I also tried removing ?useUnicode=true&characterEncoding=WINDOWS-1251 from the connection string, with no success. I cannot access the log at the URL given in the terminal either, so this is the only failure output I have:
Warning: /usr/hdp/2.6.5.0-292/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
21/09/12 04:04:40 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.6.5.0-292
21/09/12 04:04:40 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
21/09/12 04:04:40 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
21/09/12 04:04:40 INFO tool.CodeGenTool: Beginning code generation
Sun Sep 12 04:04:40 UTC 2021 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
21/09/12 04:04:40 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `airqualitydata` AS t LIMIT 1
21/09/12 04:04:40 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `airqualitydata` AS t LIMIT 1
21/09/12 04:04:40 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.6.5.0-292/hadoop-mapreduce
Note: /tmp/sqoop-raj_ops/compile/41fba9933b913b974b70403656a13287/airqualitydata.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
21/09/12 04:04:42 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-raj_ops/compile/41fba9933b913b974b70403656a13287/airqualitydata.jar
21/09/12 04:04:42 INFO mapreduce.ExportJobBase: Beginning export of airqualitydata
21/09/12 04:04:43 INFO client.RMProxy: Connecting to ResourceManager at sandbox-hdp.hortonworks.com/172.18.0.2:8032
21/09/12 04:04:43 INFO client.AHSProxy: Connecting to Application History server at sandbox-hdp.hortonworks.com/172.18.0.2:10200
21/09/12 04:04:50 INFO input.FileInputFormat: Total input paths to process : 1
21/09/12 04:04:50 INFO input.FileInputFormat: Total input paths to process : 1
21/09/12 04:04:50 INFO mapreduce.JobSubmitter: number of splits:1
21/09/12 04:04:51 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1631399426919_0028
21/09/12 04:04:51 INFO impl.YarnClientImpl: Submitted application application_1631399426919_0028
21/09/12 04:04:51 INFO mapreduce.Job: The url to track the job: http://sandbox-hdp.hortonworks.com:8088/proxy/application_1631399426919_0028/
21/09/12 04:04:51 INFO mapreduce.Job: Running job: job_1631399426919_0028
21/09/12 04:04:59 INFO mapreduce.Job: Job job_1631399426919_0028 running in uber mode : false
21/09/12 04:04:59 INFO mapreduce.Job: map 0% reduce 0%
21/09/12 04:05:03 INFO mapreduce.Job: map 100% reduce 0%
21/09/12 04:05:04 INFO mapreduce.Job: Job job_1631399426919_0028 failed with state FAILED due to: Task failed task_1631399426919_0028_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
21/09/12 04:05:04 INFO mapreduce.Job: Counters: 8
Job Counters
Failed map tasks=1
Launched map tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=2840
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=2840
Total vcore-milliseconds taken by all map tasks=2840
Total megabyte-milliseconds taken by all map tasks=710000
21/09/12 04:05:04 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
21/09/12 04:05:04 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 21.2627 seconds (0 bytes/sec)
21/09/12 04:05:04 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
21/09/12 04:05:04 INFO mapreduce.ExportJobBase: Exported 0 records.
21/09/12 04:05:04 ERROR mapreduce.ExportJobBase: Export job failed!
21/09/12 04:05:04 ERROR tool.ExportTool: Error during export: Export job failed!
Any pointers would be helpful. Thanks!
EDIT #1: As per the comments above, using:
sqoop export --connect jdbc:mysql://localhost:3306/airquality --table airqualitydata --username root --password hortonworks1 --hcatalog-database default --hcatalog-table airqualitydata --verbose
or, in general form (for anyone reproducing this):
sqoop export --connect jdbc:mysql://<host:port>/<mysql db> --table <mysql table> --username <mysql_user> --password <mysqlpass> --hcatalog-database <hive_db> --hcatalog-table <hive_table> --verbose
I got it to put the data into MySQL. However, it inserts the header row as well. Also, when the export is run twice (I believed it would overwrite the data), the data ends up in the table twice.
+-----------+-------------+----------+--------+--------+---------------------+------+-------+---------+------+-------+---------+
| sensor_id | sensor_type | location | lat | lon | timestamp | p1 | durp1 | ratiop1 | p2 | durp2 | ratiop2 |
+-----------+-------------+----------+--------+--------+---------------------+------+-------+---------+------+-------+---------+
| sensor_id | sensor_type | location | lat | lon | 2021-09-12 05:55:49 | P1 | durP1 | ratioP1 | P2 | durP2 | ratioP2 |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:55:49 | 70 | | | 20 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:55:49 | 83 | | | 0.93 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:55:49 | 0.80 | | | 0.73 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:55:49 | 0.50 | | | 0.50 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:55:49 | 57 | | | 0.80 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:55:49 | 0.40 | | | 0.40 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:55:49 | 0.70 | | | 0.70 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:55:49 | 2 | | | 0.47 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:55:49 | 90 | | | 0.63 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:55:49 | 0.57 | | | 0.57 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:55:49 | 0.73 | | | 0.60 | | |
| sensor_id | sensor_type | location | lat | lon | 2021-09-12 05:58:02 | P1 | durP1 | ratioP1 | P2 | durP2 | ratioP2 |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:58:02 | 70 | | | 20 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:58:02 | 83 | | | 0.93 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:58:02 | 0.80 | | | 0.73 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:58:02 | 0.50 | | | 0.50 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:58:02 | 57 | | | 0.80 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:58:02 | 0.40 | | | 0.40 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:58:02 | 0.70 | | | 0.70 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:58:02 | 2 | | | 0.47 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:58:02 | 90 | | | 0.63 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:58:02 | 0.57 | | | 0.57 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:58:02 | 0.73 | | | 0.60 | | |
+-----------+-------------+----------+--------+--------+---------------------+------+-------+---------+------+-------+---------+
The data in Hive is fine (there is no header row there). What might cause this?
I also get an exception, although the job completed overall. Is this important?
21/09/12 05:57:41 INFO mapreduce.Job: Running job: job_1631399426919_0035
21/09/12 05:57:55 INFO mapreduce.Job: Job job_1631399426919_0035 running in uber mode : false
21/09/12 05:57:55 INFO mapreduce.Job: map 0% reduce 0%
21/09/12 05:58:03 INFO mapreduce.Job: map 100% reduce 0%
21/09/12 05:58:05 INFO mapreduce.Job: Job job_1631399426919_0035 completed successfully
21/09/12 05:58:06 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=345759
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2597
HDFS: Number of bytes written=0
HDFS: Number of read operations=2
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Launched map tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=4966
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=4966
Total vcore-milliseconds taken by all map tasks=4966
Total megabyte-milliseconds taken by all map tasks=1241500
Map-Reduce Framework
Map input records=12
Map output records=12
Input split bytes=1800
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=211
CPU time spent (ms)=3490
Physical memory (bytes) snapshot=217477120
Virtual memory (bytes) snapshot=1972985856
Total committed heap usage (bytes)=51380224
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
21/09/12 05:58:06 INFO mapreduce.ExportJobBase: Transferred 2.5361 KB in 62.3328 seconds (41.6635 bytes/sec)
21/09/12 05:58:06 INFO mapreduce.ExportJobBase: Exported 12 records.
21/09/12 05:58:06 INFO mapreduce.ExportJobBase: Publishing HCatalog export job data to Listeners
21/09/12 05:58:06 WARN mapreduce.PublishJobData: Unable to publish export data to publisher org.apache.atlas.sqoop.hook.SqoopHook
java.lang.ClassNotFoundException: org.apache.atlas.sqoop.hook.SqoopHook
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.sqoop.mapreduce.PublishJobData.publishJobData(PublishJobData.java:46)
at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:457)
at org.apache.sqoop.manager.SqlManager.exportTable(SqlManager.java:931)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:81)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:225)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.main(Sqoop.java:243)
21/09/12 05:58:06 DEBUG util.ClassLoaderStack: Restoring classloader: sun.misc.Launcher$AppClassLoader@4232c52b
1 Answer
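To solve your first problem, use
--hcatalog-database mydb --hcatalog-table airquality
and remove the --export-dir
parameter. Sqoop export cannot replace data. Issue a sqoop eval statement to truncate the main table before loading it.
You can also use an update statement to update the table. https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html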
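The truncate-then-export sequence could be scripted roughly as follows. This is a minimal sketch, not a tested recipe: the connection string, user, and table names are taken from the question, `-P` (prompt for the password) follows the hint printed in the Sqoop log, and the `run` helper and `DRY_RUN` switch are hypothetical additions that only print the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: empty the MySQL target table, then re-run the HCatalog-based export.
# With DRY_RUN=1 (the default here) the commands are only printed.
set -eu

JDBC_URL="jdbc:mysql://localhost:3306/airquality"   # from the question
MYSQL_USER="root"                                   # from the question
MYSQL_TABLE="airqualitydata"
HIVE_DB="default"
HIVE_TABLE="airqualitydata"
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Step 1: remove the rows left over from a previous export.
truncate_target() {
  run sqoop eval --connect "$JDBC_URL" --username "$MYSQL_USER" -P \
    --query "TRUNCATE TABLE $MYSQL_TABLE"
}

# Step 2: export the Hive table through HCatalog (no --export-dir).
export_table() {
  run sqoop export --connect "$JDBC_URL" --username "$MYSQL_USER" -P \
    --table "$MYSQL_TABLE" \
    --hcatalog-database "$HIVE_DB" --hcatalog-table "$HIVE_TABLE"
}

truncate_target
export_table
```

Running it as-is prints the two commands for review; with `DRY_RUN=0` on the sandbox, each step would prompt for the MySQL password because of `-P`.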
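Now, for your header issue, I think the original table may have a header row. I am not sure about the data in the original table. Check that the source table is defined correctly in Hive.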
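One possible explanation, offered here as an assumption rather than something confirmed in this thread: the `skip.header.line.count` table property is applied when Hive itself reads the table, but may not be applied when Sqoop reads the same files through HCatalog. If that is the cause, one workaround would be to remove the header line from the file itself. The sketch below shows the filter on two lines of the sample data; the `strip_header` function name and the HDFS file names in the comment are hypothetical.

```shell
#!/bin/sh
# Sketch: drop the header line from a delimited file before exporting it.
set -eu

# Pass through everything except the first line of stdin.
strip_header() {
  tail -n +2
}

# Local demonstration using the header and first data row from the question.
printf '%s\n' \
  'sensor_id;sensor_type;location;lat;lon;timestamp;P1;durP1;ratioP1;P2;durP2;ratioP2' \
  '9710;SDS011;4894;43.226;27.934;2021-09-09T00:00:12;70;;;20;;' \
  | strip_header

# On the sandbox the same filter could be applied to the file in HDFS,
# for example (file names are hypothetical):
#   hdfs dfs -cat /hadoop/airqualitydata/data.csv | strip_header \
#     | hdfs dfs -put - /hadoop/airqualitydata_noheader/data.csv
```

If the header is removed from the file this way, the Hive table would need to point at the new location and no longer set `skip.header.line.count`, otherwise Hive would skip the first data row.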