How do I import/load a .csv file in Pig?

Posted by eagi6jfj on 2021-05-30 in Hadoop

Suppose I have a tab-delimited text file (datetemp.txt) that I want to load into Pig for processing. When I enter the lines below, I get an error:
grunt> inputfile = load '/training/pig/datetemp.txt' using pigstorage() as (eventid:chararray, eventdate:chararray, count:int);
grunt> dump inputfile;
2014-09-06 08:41:23,527 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2014-09-06 08:41:23,544 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-09-06 08:41:23,548 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2014-09-06 08:41:23,548 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2014-09-06 08:41:23,551 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2014-09-06 08:41:23,551 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2014-09-06 08:41:23,552 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job27391717857739333.jar
2014-09-06 08:42:39,608 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job27391717857739333.jar created
2014-09-06 08:42:39,612 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2014-09-06 08:42:39,619 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2014-09-06 08:42:39,630 [Thread-12] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2014-09-06 08:42:39,891 [Thread-12] INFO org.apache.hadoop.mapred.JobClient - Cleaning up the staging area hdfs://192.168.195.130:8020/var/lib/hadoop-hdfs/cache/mapred/mapred/staging/training/.staging/job_201408292336_0009
2014-09-06 08:42:39,891 [Thread-12] ERROR org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:training (auth:SIMPLE) cause:org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
2014-09-06 08:42:40,119 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2014-09-06 08:42:40,125 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job null has failed! Stop running all dependent jobs
2014-09-06 08:42:40,125 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2014-09-06 08:42:40,131 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backend error: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:285)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1014)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1031)
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:943)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:896)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:531)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:318)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.startReadyJobs(JobControl.java:238)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:269)
    at java.lang.Thread.run(Thread.java:662)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:260)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:273)
    ... 15 more
2014-09-06 08:42:40,131 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2014-09-06 08:42:40,135 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion    PigVersion    UserId    StartedAt    FinishedAt    Features
2.0.0-cdh4.1.1    0.10.0-cdh4.1.1    training    2014-09-06 08:41:23    2014-09-06 08:42:40    UNKNOWN
Failed!
Failed Jobs:
JobId    Alias    Feature    Message    Outputs
N/A    inputfile    MAP_ONLY    Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:285)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1014)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1031)
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:943)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:896)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:531)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:318)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.startReadyJobs(JobControl.java:238)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:269)
    at java.lang.Thread.run(Thread.java:662)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:260)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:273)
    ... 15 more
hdfs://192.168.195.130:8020/tmp/temp-1004538676/tmp1582688785,
Input(s):
Failed to read data from "/training/pig/datetemp.txt"
Output(s):
Failed to produce result in "hdfs://192.168.195.130:8020/tmp/temp-1004538676/tmp1582688785"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
null
2014-09-06 08:42:40,135 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2014-09-06 08:42:40,142 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias inputfile
Details at logfile: /home/training/pig_.log
Please help!

e5nqia27 #1

hdfs://192.168.195.130:8020/training/pig/datetemp.txt

That file was not found in your HDFS. Make sure the input file has been placed at the location above.
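
A minimal sketch of checking for the file and uploading it from inside the Grunt shell, which passes `fs` commands through to the Hadoop file system shell. The local source path `/home/training/datetemp.txt` is an assumption for illustration, and the commands presume your user is allowed to write under `/training` on HDFS.

```pig
-- List the directory that the LOAD statement points at.
fs -ls /training/pig/

-- If datetemp.txt is not listed, create the directory and upload the file.
-- /home/training/datetemp.txt is a hypothetical local path; use your own.
fs -mkdir -p /training/pig
fs -put /home/training/datetemp.txt /training/pig/

-- Confirm the file is now visible before rerunning the LOAD and DUMP.
fs -ls /training/pig/datetemp.txt
```

Once the last listing shows the file, the original load statement should get past ERROR 2118.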

goqiplq2 #2

The storage function name is case-sensitive. Use PigStorage rather than pigstorage.

rqqzpn5f #3

Have you checked whether the input path exists?
Try fs -ls /training/pig/ in the Grunt shell. If the listing shows datetemp.txt, the load will work; otherwise, supply the correct input path.

brgchamk #4

You can pass ',' to PigStorage to read a CSV file.
The query looks like this:

grunt> inputfile= load '/training/pig/datetemp.txt' using PigStorage(',') As (EventID: chararray,eventdate: chararray,count:int);

grunt> dump inputfile;

Also make sure the file '/training/pig/datetemp.txt' exists on HDFS. To check, run: hadoop fs -ls /training/pig/datetemp.txt

qrjkbowd #5

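The log states the error clearly:
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
Can you check whether the file exists in HDFS? You can also check whether Pig is running in mapreduce mode or local mode.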

sh7euo9m #6

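Since you mentioned that your file is tab-delimited, why not write PigStorage('\t') instead of PigStorage()?
The code you posted:
grunt> inputfile = load '/training/pig/datetemp.txt' using pigstorage() as (eventid:chararray, eventdate:chararray, count:int);
Maybe this will solve your problem.
Let me know if it turns out to be something else.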
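
For reference, a sketch of the load statement with the tab delimiter spelled out. Note that PigStorage already uses tab as its default delimiter, so PigStorage() and PigStorage('\t') should parse the file the same way; making it explicit mainly documents intent and is unlikely to resolve ERROR 2118 by itself, because that error is about the input path.

```pig
-- Load the tab-delimited file with the delimiter stated explicitly.
inputfile = LOAD '/training/pig/datetemp.txt'
            USING PigStorage('\t')
            AS (eventid:chararray, eventdate:chararray, count:int);

-- DESCRIBE prints the declared schema without launching a MapReduce job.
DESCRIBE inputfile;

-- DUMP launches the job and fails if the input path is missing.
DUMP inputfile;
```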

gijlo24d #7

Your question title says you are trying to load a CSV file. For that, I have had good luck using org.apache.pig.piggybank.storage.CSVExcelStorage() in my LOAD statement, as described here: https://martin.atlassian.net/wiki/x/wybmaq.
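
A rough sketch of what such a LOAD statement might look like. The piggybank jar location and the file name `datetemp.csv` are assumptions for illustration only, and the schema is borrowed from the question; check that the piggybank build shipped with your Pig version includes CSVExcelStorage.

```pig
-- Hypothetical jar path: point REGISTER at the piggybank.jar that ships
-- with your Pig installation.
REGISTER /usr/lib/pig/piggybank.jar;

-- datetemp.csv is a hypothetical comma-separated version of the input file.
csvdata = LOAD '/training/pig/datetemp.csv'
          USING org.apache.pig.piggybank.storage.CSVExcelStorage()
          AS (eventid:chararray, eventdate:chararray, count:int);

DUMP csvdata;
```

Unlike PigStorage(','), CSVExcelStorage is designed to handle quoted fields that contain commas, which is usually why it is preferred for real CSV exports.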
