Loading a file delimited by double colons (::) in Pig

Posted by mzsu5hc0 on 2021-06-21 in Pig

Below is a sample record from a dataset whose fields are separated by double colons (::).

1::Toy Story (1995)::Animation|Children's|Comedy

I want to extract three fields from this dataset: movieid, title, and genre. I wrote the following code for that:

movies = LOAD 'location/of/dataset/on/hdfs ' 
using PigStorage('::')
as 
(MovieID:int,title:chararray,genre:chararray);

But I get the following error:

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to  parse:  
 <file script.pig, line 1, column 9> pig script failed to validate:
 java.lang.RuntimeException: could not instantiate 'PigStorage' with arguments '[::]'
44u64gxh #1

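PigStorage expects a single-character delimiter, which is why it cannot be instantiated with '::'. Use MyRegExLoader instead; it requires piggybank.jar.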

REGISTER '/path/to/piggybank.jar';
A = LOAD '/path/to/dataset' USING org.apache.pig.piggybank.storage.MyRegExLoader('([^\\:]+)::([^\\:]+)::([^\\:]+)') 
      as (movieid:int, title:chararray, genre:chararray);

Output:
(1,Toy Story (1995),Animation|Children's|Comedy)
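
To see how the regular expression splits a record into three fields, the same pattern can be tried outside Pig. The following is only a sketch: it uses Python's `re` module as a stand-in for the Java regex engine that Pig runs on (this pattern uses only syntax common to both), `parse_line` is a hypothetical helper that is not part of Pig or piggybank, and requiring the whole line to match is an assumption of this sketch rather than documented loader behavior.

```python
import re

# The pattern from the LOAD statement, written as the regex engine sees it
# (Pig's string literal '\\:' unescapes to '\:').
PATTERN = re.compile(r"([^\:]+)::([^\:]+)::([^\:]+)")


def parse_line(line):
    """Hypothetical helper: return (movieid, title, genre), or None if the line does not match."""
    match = PATTERN.fullmatch(line)  # assumption: the whole line must match
    if match is None:
        return None
    movieid, title, genre = match.groups()
    return int(movieid), title, genre


if __name__ == "__main__":
    print(parse_line("1::Toy Story (1995)::Animation|Children's|Comedy"))
```

Because each group is `[^\:]+`, no captured field can contain a colon. In this sketch a title such as "Star Wars: Episode IV" makes the whole line fail to match, so the pattern may need loosening if the data contains such titles.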
