My Hadoop mapper is sending the CSV, line by line, as Text objects, like this:
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
    // Split the incoming chunk into individual CSV rows
    String[] dataRows = value.toString().trim().split("\\r?\\n");
    Random r = new Random(); // hoisted out of the loop; one generator is enough
    for (int i = 0; i < dataRows.length; i++) {
        // Assign the row to one of three random partitions; HC and Data are Text fields on the mapper class
        int partition = r.nextInt(3);
        HC.set(Integer.toString(partition));
        Data.set(dataRows[i]);
        context.write(HC, Data);
    }
}
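For reference, here is a standalone sketch (a hypothetical illustration, not part of the job) of what that loop emits for a sample two-row chunk; the keys vary run to run because the partition is random:

import java.util.Random;

public class MapperEmitSketch {
    public static void main(String[] args) {
        String value = "1,2\n3,4";                       // sample chunk as the mapper receives it
        String[] dataRows = value.trim().split("\\r?\\n");
        Random r = new Random();
        for (String row : dataRows) {
            int partition = r.nextInt(3);                // random key in 0..2
            // Stands in for context.write(HC, Data): one (key, raw CSV row) pair per row
            System.out.println(partition + "\t" + row);  // e.g. "2\t1,2" then "0\t3,4"
        }
    }
}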
In my reducer, I need to split the CSV and do further work with the strings. Here is the reducer code:
public static class IntSumReducer extends Reducer<Text, Text, Text, Text> {
    private Text data = new Text();

    public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        List<Coord> coords = new ArrayList<Coord>();
        Iterator<Text> iter = values.iterator();
        while (iter.hasNext()) {
            // Split one CSV row into its comma-separated elements
            String[] elems = iter.next().toString().split(",");
            double[] x = new double[elems.length];
            try {
                for (int i = 0; i < x.length; i++) {
                    x[i] = Integer.parseInt(elems[i]); // values are parsed as ints, stored as doubles
                }
            } catch (Exception e) {
                // Skip rows that fail to parse
                continue;
            }
            coords.add(new Coord(x));
        }
        try {
            Cluster cluster = runClusterer(coords);
            data.set(cluster.toNewick() + ";");
            context.write(key, data);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
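For what it's worth, running the same split in isolation behaves as expected (a minimal standalone sketch, assuming sample rows "1,2" and "3,4"):

import java.util.Arrays;
import java.util.List;

public class SplitSketch {
    public static void main(String[] args) {
        // Sample values as the reducer should receive them for one key
        List<String> values = Arrays.asList("1,2", "3,4");
        for (String v : values) {
            String[] elems = v.split(",");
            // Prints: len=2 elems=[1, 2] and then len=2 elems=[3, 4]
            System.out.println("len=" + elems.length + " elems=" + Arrays.toString(elems));
        }
    }
}

So the split itself seems fine on plain strings; the problem only shows up with the values the reducer actually receives.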
Strangely, the elems string array has length 1 and contains only the left-hand element of each CSV row.
For example, suppose my CSV contains two rows: the first row is {1,2} and the second row is {3,4}.
The elems array is being filled as {1,3}.
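To narrow this down, a debug helper along these lines could log each raw value before splitting, to check whether the commas are still present when the values reach the reducer (a hypothetical sketch; dumpValues is an assumed name, and iterating consumes the values, so it would temporarily replace the normal loop):

import org.apache.hadoop.io.Text;

public class ReducerDebug {
    // Debug-only helper: prints each raw value and its split length so the log
    // shows exactly what arrives at the reducer before any parsing happens.
    static void dumpValues(Iterable<Text> values) {
        for (Text value : values) {
            String raw = value.toString();
            System.err.println("raw reducer value: [" + raw + "]");
            System.err.println("split length: " + raw.split(",").length);
        }
    }
}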
Thanks for your help.