ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. Could not instantiate

Posted by 1hdlvixo on 2021-06-21 in Pig

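Below is the script I am trying to run. I have registered my UDF jar, hotornot_09_01_14_second.jar, and then I call the UDF directly. Note that in this example I do not use a DEFINE statement. Unfortunately, if I do the same thing with DEFINE, I get the same error, except that the reported argument is 'regex.txt' instead of 'null'.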

REGISTER '/somepath/piggybank.jar';
REGISTER '/somepath/mysql-connector-java-5.1.18-bin.jar';
REGISTER '/somepath/hotornot_09_01_14_second.jar';

--DEFINE GenerateVenueUDF com.anton.hadoop.pig.production.GenerateVenueUDF('venues_regex.txt');

venues = LOAD 'venues_extended_2.csv' USING org.apache.pig.piggybank.storage.CSVLoader() AS (Name:chararray, Type:chararray, Latitude:double, Longitude:double, City:chararray, Country:chararray);
tweets = LOAD 'tweets_extended.csv' USING org.apache.pig.piggybank.storage.CSVLoader() AS (Text:chararray, WeekDay:chararray, Day:int, Time:chararray, SMT:chararray, Year:int, Location:chararray, Language:chararray, Followers_count:int, Friends_count:int);

tweetsReduced = foreach tweets generate Text;

venuesTweets = foreach tweetsReduced generate *, com.anton.hadoop.pig.production.GenerateVenueUDF(Text);

venueCounts = FOREACH (GROUP venuesTweets BY $1) GENERATE group, COUNT($1) as counter;
venueCountsOrdered = order venueCounts by counter;

--DUMP venueCountsOrdered;

STORE venueCountsOrdered INTO 'VenueData' USING org.apache.pig.piggybank.storage.DBStorage(some connection details);

I get this error:

ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. could not instantiate 'com.anton.hadoop.pig.production.GenerateVenueUDF' with arguments 'null'

Here is my UDF:

package com.anton.hadoop.pig.production;

import java.io.IOException;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.util.UDFContext;

public class GenerateVenueUDF extends EvalFunc<String> {
    private String regex;
    private static Pattern p;

    public GenerateVenueUDF() throws IOException {
        String fileName = "venues_regex.txt";
        FileSystem fs = FileSystem.get(UDFContext.getUDFContext().getJobConf());
        Scanner sc = new Scanner(fs.open(new Path(fileName)));
        regex = sc.nextLine(); // should be one line only !!!
        p = Pattern.compile(regex);
        sc.close();
    }

    @Override
    public String exec(Tuple tuple) throws IOException {
        // expect one string
        if (tuple == null) {
            throw new IllegalArgumentException(
                    "BagTupleExampleUDF: requires at least one input parameter.");
        }
        try {
            String tweet = (String) tuple.get(0);
//          TupleFactory tf = TupleFactory.getInstance();
//          BagFactory mBagFactory = BagFactory.getInstance();
//          Tuple t = tf.newTuple();
//          t.append(tweet);
//          t.append(checkVenue(tweet));
//          DataBag output = mBagFactory.newDefaultBag();
//          output.add(t);
            return checkVenue(tweet);
        } catch (Exception e) {
            throw new IOException(
                    "BagTupleExampleUDF: caught exception processing input.", e);
        }
    }

    public static String checkVenue(String tweet) {
        Matcher m = p.matcher(tweet);
        if (m.find()) {
            return m.group(1);
        } else {
            return "";
        }
    }

}

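In this example the constructor takes no arguments, but as I said above, if I DEFINE the UDF and pass fileName as an argument, I still get a similar error. Can anyone help me resolve this? Any suggestions are welcome, thanks!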

ogq8wdun #1

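This happens when an exception is thrown while the UDF is being instantiated. Something is probably going wrong in the constructor.
I would either add some logging to your constructor or build a unit test with PigUnit to find out what is going wrong.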
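
If the constructor does turn out to be the culprit, one way to sidestep it is to keep the constructor trivial and load the regex on first use instead. The sketch below is only an illustration of that idea, not a drop-in fix: it is deliberately written without Pig or Hadoop on the classpath so it can be run on its own. The class name LazyVenueMatcher and the Callable<Reader> hook are hypothetical; in a real UDF the class would extend EvalFunc<String>, checkVenue would be called from exec, and the reader would come from the FileSystem call in the question. It also rests on an assumption that the thread does not confirm: that the failure comes from doing I/O in the constructor, which Pig may run on the front end while parsing the script, before the job configuration is available.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.concurrent.Callable;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Stand-alone sketch of lazy initialization (no Pig or Hadoop dependency).
 * All names here are hypothetical and exist only for illustration.
 */
public class LazyVenueMatcher {
    // Hypothetical hook: opens the one-line regex file when it is first needed.
    private final Callable<Reader> source;
    // Compiled on first use, not in the constructor.
    private Pattern pattern;

    public LazyVenueMatcher(Callable<Reader> source) {
        // The constructor only stores its argument, so instantiation cannot fail on I/O.
        this.source = source;
    }

    private Pattern pattern() throws IOException {
        if (pattern == null) {
            try (BufferedReader in = new BufferedReader(source.call())) {
                String regex = in.readLine(); // the file is expected to hold a single line
                if (regex == null) {
                    throw new IOException("regex source is empty");
                }
                pattern = Pattern.compile(regex);
            } catch (IOException e) {
                throw e;
            } catch (Exception e) {
                // Wrap anything else (e.g. a bad regex) so the cause shows up in the logs.
                throw new IOException("could not load venue regex", e);
            }
        }
        return pattern;
    }

    /** Returns the first capture group of the first match, or "" if nothing matches. */
    public String checkVenue(String tweet) throws IOException {
        if (tweet == null) {
            return "";
        }
        Matcher m = pattern().matcher(tweet);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for the regex file so the sketch runs anywhere.
        LazyVenueMatcher matcher =
                new LazyVenueMatcher(() -> new StringReader("at (\\w+)"));
        System.out.println(matcher.checkVenue("Dinner at Nobu tonight")); // prints "Nobu"
    }
}
```

With this shape, a missing or unreadable regex file surfaces as an IOException from the first call to checkVenue, with the underlying cause attached, rather than as a bare "could not instantiate" message at parse time.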
