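I am trying to import data from MySQL using Sqoop via a shell script.
Workflow steps:
1. Delete any existing directory.
2. A Java action reads the metadata Hive table and creates the table_metadata directory and the *.cf files.
3. A shell script iterates over the table_metadata directory and scans for the config files (*.cf). Each file contains the name of a table to import. The table name is read into the TABLENAME variable, which is used in the Sqoop import query.
When I run the same script containing the Sqoop command from the command line (sh script.sh), it works fine.
However, when I run it as a workflow through an Oozie (Cloudera Hue GUI) shell action, it fails with the error below.
Any idea why the Oozie job fails?
Shell script: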
hdfs_path='hdfs://quickstart.cloudera:8020/user/cloudera/workflow/table_metadata'
table_temp_path='hdfs://quickstart.cloudera:8020/user/cloudera/workflow/hive_temp'
# Proceed only if the metadata directory exists
if hadoop fs -test -e "$hdfs_path"
then
    # Each listed file holds the name of one table to import
    for file in $(hadoop fs -ls "$hdfs_path" | grep -o -e "$hdfs_path/*.*");
    do
        echo "${file}"
        TABLENAME=$(hadoop fs -cat "${file}");
        echo "$TABLENAME"
        HDFSPATH=$table_temp_path
        sqoop import --connect jdbc:mysql://quickstart.cloudera:3306/retail_db --table departments --username=retail_dba --password=cloudera --direct -m 1 --delete-target-dir --target-dir "$table_temp_path"
    done
fi
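The grep pattern in the loop is a regular expression, not a glob, so `/*.*` matches more loosely than it looks. A minimal sketch of a stricter filter follows, assuming the usual `hadoop fs -ls` layout in which the path is the last whitespace-separated field. The function name `cf_paths` is hypothetical and not part of the original script.

```shell
# Hypothetical helper: print the path column of every *.cf entry
# found in `hadoop fs -ls` output read from stdin.
# Assumes the path is the last field and contains no spaces.
cf_paths() {
    awk '$NF ~ /\.cf$/ { print $NF }'
}

# Intended use inside the script (not run here):
#   for file in $(hadoop fs -ls "$hdfs_path" | cf_paths); do ... done
```

The "Found N items" header and directory entries are skipped because their last field does not end in `.cf`.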
workflow.xml:
<workflow-app name="RDB2Hive" xmlns="uri:oozie:workflow:0.5">
<start to="fs-1051"/>
<kill name="Kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<action name="fs-1051">
<fs>
<delete path='${nameNode}/user/cloudera/workflow/table_metadata'/>
<mkdir path='${nameNode}/user/cloudera/workflow/table_metadata'/>
</fs>
<ok to="java-9025"/>
<error to="Kill"/>
</action>
<action name="java-9025">
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<main-class>org.rd2h.app.LoadMetaData</main-class>
<arg>load_metadata</arg>
<arg>/user/cloudera/workflow/table_metadata</arg>
</java>
<ok to="shell-d3bf"/>
<error to="Kill"/>
</action>
<action name="shell-d3bf">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>import_script.sh</exec>
<file>/user/cloudera/workflow/scripts/import_script.sh#import_script.sh</file>
<capture-output/>
</shell>
<ok to="End"/>
<error to="Kill"/>
</action>
<end name="End"/>
</workflow-app>
MR error log:
Job init failed : org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.FileNotFoundException: File does not exist: hdfs://quickstart.cloudera:8020/tmp/hadoop-yarn/staging/cloudera/.staging/job_1486009475788_0032/job.splitmetainfo
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1580)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1444)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1402)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:996)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1333)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1101)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$4.run(MRAppMaster.java:1540)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1536)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1469)
***Caused by: java.io.FileNotFoundException: File does not exist: hdfs://quickstart.cloudera:8020/tmp/hadoop-yarn/staging/cloudera/.staging/job_1486009475788_0032/job.splitmetainfo***
at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1219)
at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1211)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1211)
at org.apache.hadoop.mapreduce.split.SplitMetaInfoReader.readSplitMetaInfo(SplitMetaInfoReader.java:51)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1575)
Oozie error log:
Stdoutput 2017-02-01 20:57:31,101 INFO [main] sqoop.Sqoop (Sqoop.java:<init>(92)) - Running Sqoop version: 1.4.6-cdh5.8.0
Stdoutput 2017-02-01 20:57:31,113 WARN [main] tool.BaseSqoopTool (BaseSqoopTool.java:applyCredentialsOptions(1042)) - Setting your password on the command-line is insecure. Consider using -P instead.
Stdoutput 2017-02-01 20:57:31,304 INFO [main] manager.MySQLManager (MySQLManager.java:initOptionDefaults(71)) - Preparing to use a MySQL streaming resultset.
Stdoutput 2017-02-01 20:57:31,309 INFO [main] tool.CodeGenTool (CodeGenTool.java:generateORM(92)) - Beginning code generation
Stdoutput 2017-02-01 20:57:31,560 INFO [main] manager.SqlManager (SqlManager.java:execute(776)) - Executing SQL statement: SELECT t.* FROM `departments` AS t LIMIT 1
Stdoutput 2017-02-01 20:57:31,579 INFO [main] manager.SqlManager (SqlManager.java:execute(776)) - Executing SQL statement: SELECT t.* FROM `departments` AS t LIMIT 1
Stdoutput 2017-02-01 20:57:31,582 INFO [main] orm.CompilationManager (CompilationManager.java:findHadoopJars(94)) - HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Stdoutput 2017-02-01 20:57:32,587 INFO [main] orm.CompilationManager (CompilationManager.java:jar(330)) - Writing jar file: /tmp/sqoop-yarn/compile/94cbe03d9d51f6ccc47ddd3ca98032be/departments.jar
Stdoutput 2017-02-01 20:57:33,182 INFO [main] tool.ImportTool (ImportTool.java:deleteTargetDir(544)) - Destination directory hdfs://quickstart.cloudera:8020/user/cloudera/workflow/hive_temp is not present, hence not deleting.
Stdoutput 2017-02-01 20:57:33,187 INFO [main] manager.DirectMySQLManager (DirectMySQLManager.java:importTable(83)) - Beginning mysqldump fast path import
Stdoutput 2017-02-01 20:57:33,187 INFO [main] mapreduce.ImportJobBase (ImportJobBase.java:runImport(242)) - Beginning import of departments
Stdoutput 2017-02-01 20:57:33,188 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1174)) - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
Stdoutput 2017-02-01 20:57:33,203 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1174)) - mapred.jar is deprecated. Instead, use mapreduce.job.jar
Stdoutput 2017-02-01 20:57:33,210 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1174)) - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
Stdoutput 2017-02-01 20:57:33,253 INFO [main] client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at localhost/127.0.0.1:8032
Stdoutput 2017-02-01 20:57:35,040 INFO [main] db.DBInputFormat (DBInputFormat.java:setTxIsolation(192)) - Using read commited transaction isolation
Stdoutput 2017-02-01 20:57:35,072 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(202)) - number of splits:1
Stdoutput 2017-02-01 20:57:35,190 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(291)) - Submitting tokens for job: job_1486009475788_0032
Stdoutput 2017-02-01 20:57:35,190 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(293)) - Kind: mapreduce.job, Service: job_1486009475788_0029, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@76f3da25)
Stdoutput 2017-02-01 20:57:35,198 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(293)) - Kind: RM_DELEGATION_TOKEN, Service: 127.0.0.1:8032, Ident: (owner=cloudera, renewer=oozie mr token, realUser=oozie, issueDate=1486011413559, maxDate=1486616213559, sequenceNumber=67, masterKeyId=2)
Stdoutput 2017-02-01 20:57:35,439 INFO [main] impl.YarnClientImpl (YarnClientImpl.java:submitApplication(260)) - Submitted application application_1486009475788_0032
Stdoutput 2017-02-01 20:57:35,463 INFO [main] mapreduce.Job (Job.java:submit(1311)) - The url to track the job: http://quickstart.cloudera:8088/proxy/application_1486009475788_0032/
Stdoutput 2017-02-01 20:57:35,463 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1356)) - Running job: job_1486009475788_0032
Stdoutput 2017-02-01 20:57:41,569 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1377)) - Job job_1486009475788_0032 running in uber mode : false
Stdoutput 2017-02-01 20:57:41,569 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1384)) - map 0% reduce 0%
Stdoutput 2017-02-01 20:57:41,682 INFO [main] mapred.ClientServiceDelegate (ClientServiceDelegate.java:getProxy(277)) - Application state is completed. FinalApplicationStatus=FAILED. Redirecting to job history server
Stdoutput 2017-02-01 20:57:41,717 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1397)) - Job job_1486009475788_0032 failed with state FAILED due to:
Stdoutput 2017-02-01 20:57:41,725 INFO [main] mapreduce.ImportJobBase (JobBase.java:displayRetiredJobNotice(393)) - The MapReduce job has already been retired. Performance
Stdoutput 2017-02-01 20:57:41,725 INFO [main] mapreduce.ImportJobBase (JobBase.java:displayRetiredJobNotice(394)) - counters are unavailable. To get this information,
Stdoutput 2017-02-01 20:57:41,726 INFO [main] mapreduce.ImportJobBase (JobBase.java:displayRetiredJobNotice(395)) - you will need to enable the completed job store on
Stdoutput 2017-02-01 20:57:41,726 INFO [main] mapreduce.ImportJobBase (JobBase.java:displayRetiredJobNotice(396)) - the jobtracker with:
Stdoutput 2017-02-01 20:57:41,726 INFO [main] mapreduce.ImportJobBase (JobBase.java:displayRetiredJobNotice(397)) - mapreduce.jobtracker.persist.jobstatus.active = true
Stdoutput 2017-02-01 20:57:41,726 INFO [main] mapreduce.ImportJobBase (JobBase.java:displayRetiredJobNotice(398)) - mapreduce.jobtracker.persist.jobstatus.hours = 1
Stdoutput 2017-02-01 20:57:41,726 INFO [main] mapreduce.ImportJobBase (JobBase.java:displayRetiredJobNotice(399)) - A jobtracker restart is required for these settings
Stdoutput 2017-02-01 20:57:41,726 INFO [main] mapreduce.ImportJobBase (JobBase.java:displayRetiredJobNotice(400)) - to take effect.
Stdoutput 2017-02-01 20:57:41,726 ERROR [main] tool.ImportTool (ImportTool.java:run(631)) - Error during import: Import job failed!
Exit code of the Shell command 1
<<< Invocation of Shell command completed <<<
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
Oozie Launcher failed, finishing Hadoop job gracefully
Oozie Launcher, uploading action data to HDFS sequence file: hdfs://quickstart.cloudera:8020/user/cloudera/oozie-oozi/0000013-170201202514643-oozie-oozi-W/shell-d3bf--shell/action-data.seq
Oozie Launcher ends
1 Answer
You may need to set an environment variable for the shell action.
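Which variable is needed is not stated. One candidate, offered as an assumption rather than a confirmed fix, is `HADOOP_USER_NAME`: the Oozie log shows Sqoop writing its jar under /tmp/sqoop-yarn, which suggests the script runs as the yarn user, while the failing job's staging directory belongs to cloudera. A minimal sketch that sets the variable at the top of import_script.sh:

```shell
# Assumption: the cluster uses simple (non-Kerberos) authentication, where
# Hadoop clients honor HADOOP_USER_NAME. "cloudera" is the user seen in the
# staging path of the failing job; any value already set is kept.
export HADOOP_USER_NAME="${HADOOP_USER_NAME:-cloudera}"
```

The Oozie shell action also accepts an env-var element, so the same setting could be placed in workflow.xml instead of the script.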
Also, it looks like you are importing multiple tables, so you may want to create a subdirectory under the target directory for each table.
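A rough sketch of the per-table subdirectory idea, assuming the table name read from each *.cf file is used as the subdirectory name. The helper name `table_target_dir` is hypothetical.

```shell
# Hypothetical helper: build a per-table target directory under a base path.
# Strips one trailing slash from the base and removes whitespace from the
# table name (the content of a *.cf file may carry stray spaces).
table_target_dir() {
    printf '%s/%s\n' "${1%/}" "$(printf '%s' "$2" | tr -d '[:space:]')"
}

# Intended use inside the loop (not run here):
#   sqoop import ... --table "$TABLENAME" --delete-target-dir \
#       --target-dir "$(table_target_dir "$table_temp_path" "$TABLENAME")"
```

With a separate directory per table, `--delete-target-dir` only clears that table's earlier output instead of the shared hive_temp directory.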