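Below is the script I am trying to run. I have registered my UDF jar, hotornot_09_01_14_second.jar, and then tried to call the UDF directly. Note that in this example I am not using a DEFINE statement. Unfortunately, if I use DEFINE in the same way, I get the same error, except that the arguments are reported as 'regex.txt' instead of 'null'.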
REGISTER '/somepath/piggybank.jar';
REGISTER '/somepath/mysql-connector-java-5.1.18-bin.jar';
REGISTER '/somepath/hotornot_09_01_14_second.jar';
--DEFINE GenerateVenueUDF com.anton.hadoop.pig.production.GenerateVenueUDF('venues_regex.txt');
venues = LOAD 'venues_extended_2.csv' USING org.apache.pig.piggybank.storage.CSVLoader() AS (Name:chararray, Type:chararray, Latitude:double, Longitude:double, City:chararray, Country:chararray);
tweets = LOAD 'tweets_extended.csv' USING org.apache.pig.piggybank.storage.CSVLoader() AS (Text:chararray, WeekDay:chararray, Day:int, Time:chararray, SMT:chararray, Year:int, Location:chararray, Language:chararray, Followers_count:int, Friends_count:int);
tweetsReduced = foreach tweets generate Text;
venuesTweets = foreach tweetsReduced generate *, com.anton.hadoop.pig.production.GenerateVenueUDF(Text);
venueCounts = FOREACH (GROUP venuesTweets BY $1) GENERATE group, COUNT($1) as counter;
venueCountsOrdered = order venueCounts by counter;
--DUMP venueCountsOrdered;
STORE venueCountsOrdered INTO 'VenueData' USING org.apache.pig.piggybank.storage.DBStorage(some connection details);
I get this error: ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. could not instantiate 'com.anton.hadoop.pig.production.GenerateVenueUDF' with arguments 'null'
Here is my UDF:
package com.anton.hadoop.pig.production;

import java.io.IOException;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.util.UDFContext;

public class GenerateVenueUDF extends EvalFunc<String> {
    private String regex;
    private static Pattern p;

    public GenerateVenueUDF() throws IOException {
        String fileName = "venues_regex.txt";
        FileSystem fs = FileSystem.get(UDFContext.getUDFContext().getJobConf());
        Scanner sc = new Scanner(fs.open(new Path(fileName)));
        regex = sc.nextLine(); // should be one line only !!!
        p = Pattern.compile(regex);
        sc.close();
    }

    @Override
    public String exec(Tuple tuple) throws IOException {
        // expect one string
        if (tuple == null) {
            throw new IllegalArgumentException(
                    "BagTupleExampleUDF: requires at least one input parameter.");
        }
        try {
            String tweet = (String) tuple.get(0);
            // TupleFactory tf = TupleFactory.getInstance();
            // BagFactory mBagFactory = BagFactory.getInstance();
            // Tuple t = tf.newTuple();
            // t.append(tweet);
            // t.append(checkVenue(tweet));
            // DataBag output = mBagFactory.newDefaultBag();
            // output.add(t);
            return checkVenue(tweet);
        } catch (Exception e) {
            throw new IOException(
                    "BagTupleExampleUDF: caught exception processing input.", e);
        }
    }

    public static String checkVenue(String tweet) {
        Matcher m = p.matcher(tweet);
        if (m.find()) {
            return m.group(1);
        } else {
            return "";
        }
    }
}
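In this example the constructor takes no arguments, but as I said above, if I DEFINE the UDF and pass fileName as an argument, I still get a similar error. Can anyone help me resolve this error? Any suggestions are welcome, thanks!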
1 Answer
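This happens when an exception is thrown while the UDF is being instantiated. Something is probably going wrong in the constructor.
I would either add some logging to your constructor or build a unit test with PigUnit to find out what goes wrong.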
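To illustrate why the message only says "could not instantiate", here is a minimal, self-contained sketch in plain Java (no Pig dependency). It assumes, which this thread does not confirm, that the UDF is created reflectively, so an exception thrown by the constructor ends up wrapped and its real cause is easy to miss. `UdfInstantiationDemo`, `FailingUdf`, `loadRegex` and `instantiationFailure` are hypothetical names made up for this example; `FailingUdf` is only a stand-in for a UDF whose constructor fails while reading a file.

```java
import java.io.IOException;
import java.lang.reflect.InvocationTargetException;

public class UdfInstantiationDemo {

    // Hypothetical stand-in for a UDF whose constructor fails,
    // for example because a resource file cannot be opened.
    public static class FailingUdf {
        public FailingUdf() throws IOException {
            try {
                loadRegex();
            } catch (IOException | RuntimeException e) {
                // Log before rethrowing so the real cause is visible
                // even if the caller only reports a generic failure.
                System.err.println("FailingUdf constructor failed: " + e);
                throw e;
            }
        }

        // Simulates the file access done in the constructor.
        private static void loadRegex() throws IOException {
            throw new IOException("cannot open venues_regex.txt");
        }
    }

    // Instantiates clazz through its no-argument constructor.
    // Returns null on success, otherwise the underlying failure.
    public static Throwable instantiationFailure(Class<?> clazz) {
        try {
            clazz.getDeclaredConstructor().newInstance();
            return null;
        } catch (InvocationTargetException e) {
            // The constructor itself threw; the real exception is the cause.
            return e.getCause();
        } catch (ReflectiveOperationException e) {
            // No matching constructor, abstract class, access problem, etc.
            return e;
        }
    }

    public static void main(String[] args) {
        Throwable cause = instantiationFailure(FailingUdf.class);
        System.out.println("Constructor failed with: " + cause);
    }
}
```

In the real UDF, the same try/catch-and-log pattern around the constructor body should surface the underlying exception (for instance from `FileSystem.get(...)` or `fs.open(...)`) in the logs, which is usually enough to see what actually went wrong.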