Pig Latin: record counts are not displayed correctly

uplii1fm  posted on 2021-06-21  in  Pig

I wrote a Pig script for word count and it works fine. I can see the script's results in the output directory on HDFS. But at the end of the console output, I see the following:

Success!

Job Stats (time in seconds):
JobId   Maps    Reduces MaxMapTime  MinMapTIme  AvgMapTime  MedianMapTime   MaxReduceTime   MinReduceTime   AvgReduceTime   MedianReducetime    Alias   Feature Outputs
job_local1695568121_0002    1   1   0   0   0   0   0   0   0   0   words_sorted    SAMPLER 
job_local2103470491_0003    1   1   0   0   0   0   0   0   0   0   words_sorted    ORDER_BY    /output/result_pig,
job_local696057848_0001 1   1   0   0   0   0   0   0   0   0   book,words,words_agg,words_grouped  GROUP_BY,COMBINER   

Input(s):
Successfully read 0 records from: "/data/pg5000.txt"

Output(s):
Successfully stored 0 records in: "/output/result_pig"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_local696057848_0001 ->  job_local1695568121_0002,
job_local1695568121_0002    ->  job_local2103470491_0003,
job_local2103470491_0003

2014-07-01 14:10:35,241 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!

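As you can see, the job succeeded. But the Input(s) and Output(s) sections look wrong: both say "Successfully read/stored 0 records", and all the counter values are 0. Why are these values zero? They should not be.
I am using Hadoop 2.2 and Pig 0.12. Here is the script: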

book = load '/data/pg5000.txt' using PigStorage() as (lines:chararray);
words = foreach book generate FLATTEN(TOKENIZE(lines)) as word;
words_grouped = group words by word;
words_agg = foreach words_grouped generate group as word, COUNT(words);
words_sorted = ORDER words_agg BY $1 DESC;
STORE words_sorted into '/output/result_pig' using PigStorage(':','-schema');

Note: my data is located at /data/pg5000.txt, not in the default directory /usr/name/data/pg5000.txt.

Edit: here is the output of printing my file to the console:

hadoop fs -cat /data/pg5000.txt | head -10
The Project Gutenberg EBook of The Notebooks of Leonardo Da Vinci, Complete
by Leonardo Da Vinci
(#3 in our series by Leonardo Da Vinci)

Copyright laws are changing all over the world. Be sure to check the
copyright laws for your country before downloading or redistributing
this or any other Project Gutenberg eBook.

This header should be the first thing seen when viewing this Project
Gutenberg file.  Please do not remove it.  Do not change or edit the
cat: Unable to write to output stream.

r8xiu3jd1#

Please change the following line

book = load '/data/pg5000.txt' using PigStorage() as (lines:chararray);

to

book = load '/data/pg5000.txt' using PigStorage(',') as (lines:chararray);

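I am assuming the delimiter here is a comma; use whichever delimiter actually separates the fields in your file. This should solve the problem.
Also note:
If no argument is provided, PigStorage assumes a tab-delimited format. If a delimiter argument is provided, it must be a single-byte character; any literal (e.g. 'a', '|') or known escape character (e.g. '\t', '\r') is a valid delimiter.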
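
The delimiter rules above can be sketched with a few variants of the LOAD statement from the question. This is only an illustration, not a confirmed fix: the aliases book_tab, book_csv, book_raw and first_lines are made up for this example, and the TextLoader variant is an assumption of mine rather than something the answer proposes.

```pig
-- No argument: PigStorage splits each line on tab characters.
book_tab = LOAD '/data/pg5000.txt' USING PigStorage() AS (lines:chararray);

-- Explicit single-character delimiter, as the answer suggests.
-- With a one-field schema, only the text before the first comma
-- of each line should end up in 'lines'.
book_csv = LOAD '/data/pg5000.txt' USING PigStorage(',') AS (lines:chararray);

-- Assumption (not from the answer): TextLoader reads each whole line
-- as a single chararray, which may suit free text such as a book.
book_raw = LOAD '/data/pg5000.txt' USING TextLoader() AS (lines:chararray);

-- Print a handful of records to check what was actually loaded.
first_lines = LIMIT book_raw 5;
DUMP first_lines;
```

Swapping book_raw for book_tab or book_csv in the LIMIT statement shows how each loader splits the same input, which is a quick way to check whether the delimiter matters for this file.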
