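I'm new to MapReduce, and I'm trying to join two different kinds of rows from two different CSV files.
The map phase works fine: I load the two files, a and b, and emit the rows I want to match under the same key.
In the reducer, I'm seeing very strange behavior that I can't understand. Rows from a start with accident# and rows from b start with meteo#. I want to determine whether a row comes from a or from b and then get the rest of the row, but when I test this code: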
for (Text val : values) {
    StringTokenizer line = new StringTokenizer(val.toString(), "#");
    String comparable = line.nextToken();
    context.write(key, new Text(comparable));
}
I get the following output, which is what I expect:
2015-12-31;X meteo
2015-12-31;X accident
2015-12-31;X accident
2015-12-31;X accident
2015-12-31;X accident
Then I do this:
for (Text val : values) {
    StringTokenizer line = new StringTokenizer(val.toString(), "#");
    String comparable = line.nextToken();
    if (comparable.equals("meteo"))
        comparable = line.nextToken();
    context.write(key, new Text(comparable));
}
and I get:
2015-12-31;X ;17.8;14:00;9.1;04:40;25;12:20;19;19:00;0;0;0
2015-12-31;X accident
2015-12-31;X accident
2015-12-31;X accident
2015-12-31;X accident
This also works. Then I do the following to store the meteo row:
String meteo;
for (Text val : values) {
    meteo = "hi";
    StringTokenizer line = new StringTokenizer(val.toString(), "#");
    String comparable = line.nextToken();
    if (comparable.equals("meteo"))
        meteo = line.nextToken();
    context.write(key, new Text(meteo));
}
and I get:
2015-12-31;X hi
2015-12-31;X hi
2015-12-31;X hi
2015-12-31;X hi
2015-12-31;X hi
whereas the result I expected is:
2015-12-31;X ;17.8;14:00;9.1;04:40;25;12:20;19;19:00;0;0;0
2015-12-31;X hi
2015-12-31;X hi
2015-12-31;X hi
2015-12-31;X hi
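To make the loop reproducible outside Hadoop, here is a minimal sketch that applies the same logic to plain strings. The class name `TwoPassSketch`, the method `pass`, and the shortened sample values are made up for illustration. Running the loop once gives the expected result. Running it a second time over its own output gives only `hi`, which is what would happen if the reducer class were also registered as a combiner with `job.setCombinerClass(...)`. That is an assumption, since the job setup is not shown in this post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class TwoPassSketch {
    // Same logic as the loop above; emitted values are collected in a list
    // instead of being written to the Hadoop Context.
    static List<String> pass(List<String> values) {
        List<String> out = new ArrayList<>();
        for (String val : values) {
            String meteo = "hi";
            StringTokenizer line = new StringTokenizer(val, "#");
            String comparable = line.nextToken();
            if (comparable.equals("meteo")) {
                meteo = line.nextToken();
            }
            out.add(meteo);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical, shortened values for one key.
        List<String> values = Arrays.asList(
                "meteo#;17.8;14:00", "accident#row1", "accident#row2");
        List<String> once = pass(values);  // loop applied once
        List<String> twice = pass(once);   // loop applied again to its own output
        System.out.println(once);
        System.out.println(twice);
    }
}
```

After the first pass the `meteo#` tag is gone, so the second pass no longer recognizes the meteo row and falls back to `hi` for every value.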
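This is a simplification of my problem, but it shows very strange behavior. What I actually want is to append the meteo row to every accident row with the same key. That is my final goal, but if this doesn't work... I don't know how to do it. (My idea is to get the meteo row, store it, and then append it to each accident row.)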
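The idea in parentheses can be sketched in plain Java as follows. This is only a sketch under assumptions: the class `JoinSketch` and the method `join` are hypothetical, the values are plain strings tagged `meteo#` or `accident#`, and a key without a meteo row produces no output. Accident rows are buffered because nothing guarantees that the meteo row arrives first.

```java
import java.util.ArrayList;
import java.util.List;

class JoinSketch {
    // Joins all values of one key: every accident row gets the meteo row appended.
    static List<String> join(Iterable<String> values) {
        String meteo = null;
        List<String> accidents = new ArrayList<>();
        for (String val : values) {
            int sep = val.indexOf('#');
            if (sep < 0) {
                continue; // untagged value: ignored in this sketch
            }
            String tag = val.substring(0, sep);
            String rest = val.substring(sep + 1);
            if (tag.equals("meteo")) {
                meteo = rest;        // remember the meteo row
            } else if (tag.equals("accident")) {
                accidents.add(rest); // buffer until the meteo row is known
            }
        }
        List<String> joined = new ArrayList<>();
        if (meteo == null) {
            return joined; // assumption: no meteo row means nothing to join
        }
        for (String accident : accidents) {
            // The meteo remainder already starts with ';' in the outputs above.
            joined.add(accident + meteo);
        }
        return joined;
    }
}
```

In an actual reducer the values are `Text` objects, and Hadoop reuses the same instance while iterating, so each value should be copied (for example with `val.toString()`) before it is buffered.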
Edit
Below I'm adding the mapper code and the exact input to clarify the problem.
public void map(Object key, Text value, Context context
                ) throws IOException, InterruptedException {
    StringTokenizer lines = new StringTokenizer(value.toString(), "\n");
    while (lines.hasMoreTokens()) {
        StringTokenizer line = new StringTokenizer(lines.nextToken(), ";");
        String id = "";                 // this will be the output key
        String csvLine = new String();  // this will be the output value
        String atr = line.nextToken();  // the first attribute differentiates meteo rows from accident rows
        boolean isMeteo = false;
        if (atr.equals("0201X")) isMeteo = true;
        if (!isMeteo) { // for an accident row, find the attributes that form the date in the key (i == 6, 7, 8)
            int i = 1;
            csvLine = atr;
            while (line.hasMoreTokens()) {
                String aux = line.nextToken();
                csvLine += ";" + aux;
                if (i == 6) id = aux;
                else if (i == 7 || i == 8) {
                    int x = Integer.parseInt(aux);
                    if (x < 10) aux = "0" + aux;
                    id += "-" + aux;
                }
                else if (i == 13) { // this is the X in the key; it identifies the meteo station (not important for my problem)
                    aux = aux.substring(0, aux.length() - 1);
                    id += ";" + aux;
                    csvLine = csvLine.substring(0, csvLine.length() - 1);
                }
                ++i;
            }
        }
        else if (isMeteo) {
            id = line.nextToken(); // the second column holds the complete date string
            id += ";X";            // this file has the data of meteo station X
            csvLine += ";" + toCsvLine(line);
        }
        Text outKey = new Text(id);
        Text outValue = new Text(csvLine);
        context.write(outKey, outValue);
    }
}

// Joins the remaining tokens back into a single ';'-separated string.
public String toCsvLine(StringTokenizer st) {
    String x = new String();
    x = st.nextToken();
    while (st.hasMoreTokens()) {
        x += ";" + st.nextToken();
    }
    return x;
}
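In the accident file, I take the columns needed to build the day id (year-month-day), while in the meteo file I just take the column that already contains the full date as the id. csvLine holds the CSV row I want. Then I write the key (id) and the value (csvLine).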
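As a compact illustration of how the accident key is put together, here is a sketch that builds it with `String.split` instead of `StringTokenizer`. The class `AccidentKeySketch` and the method `buildKey` are hypothetical names, and the column positions (6, 7, 8 for year, month, day, last column for the station) are taken from the sample rows in this post. Note that `split(";", -1)` keeps empty fields, whereas `StringTokenizer` silently skips them.

```java
import java.util.Locale;

class AccidentKeySketch {
    // Builds a key such as "2015-12-30;X" from one accident row.
    static String buildKey(String csvLine) {
        String[] fields = csvLine.split(";", -1); // -1 keeps trailing empty fields
        int year = Integer.parseInt(fields[6].trim());
        int month = Integer.parseInt(fields[7].trim());
        int day = Integer.parseInt(fields[8].trim());
        // trim() drops a trailing carriage return, if the file has one
        String station = fields[fields.length - 1].trim();
        // %02d zero-pads month and day so the key matches the meteo date format
        return String.format(Locale.ROOT, "%04d-%02d-%02d;%s", year, month, day, station);
    }

    public static void main(String[] args) {
        String row = "2015S009983;Ciutat Vella;la Barceloneta;Mar;Dc;Laboral;"
                + "2015;12;30;22;Altres;4581220,92;432258,31;X";
        System.out.println(buildKey(row));
    }
}
```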
Here is the input data (only 2 days, a representative sample):
meteox.csv:
0201X;2015-12-30;18.6;14:50;12.2;07:00;;26;13:20;17;13:10;;0;;;
0201X;2015-12-31;17.8;14:00;9.1;04:40;;25;12:20;19;19:00;;;0;0;0
accident.csv:
2015S009983;Ciutat Vella;la Barceloneta;Mar;Dc;Laboral;2015;12;30;22;Altres;4581220,92;432258,31;X
2015S009984;Sant Martí;Sant Martí de Provençals;Cantàbria;Dc;Laboral;2015;12;30;20;Col.lisió fronto-lateral;4585862,62;433330,95;X
2015S009985;Eixample;la Nova Esquerra de l'Eixample;Calàbria;Dj;Laboral;2015;12;31;00;Caiguda (dues rodes);4582094,15;428800,57;X
2015S009987;Eixample;la Dreta de l'Eixample;Gràcia;Dj;Laboral;2015;12;31;02;Col.lisió lateral;4582944,96;430133,41;X
2015S009988;Eixample;la Nova Esquerra de l'Eixample;Aragó;Dj;Laboral;2015;12;31;07;Abast;4581873,45;429312,63;X
2015S009989;Ciutat Vella;la Barceloneta;Marítim de la Barceloneta;Dj;Laboral;2015;12;31;08;Abast;4581518,06;432606,87;X