Pig join fails with java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer

jjjwad0x  posted on 2021-06-24 in Pig

I have two files. `data1` contains:

```
1 3
1 2
5 1
```

and `data2` contains:

```
2 3
2 4
```

Then I tried to read them into Pig and join them:

```
d1 = LOAD 'data1';
d2 = foreach d1 generate flatten(STRSPLIT($0, ' +')) as (f1:int,f2:int);
d3 = LOAD 'data2' ;
d4 = foreach d3 generate flatten(STRSPLIT($0, ' +')) as (f1:int,f2:int);
data = join d2 by f1, d4 by f2;
```

and got this error:

```
2013-08-04 00:48:26,032 [Thread-21] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0005
java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
    at org.apache.pig.backend.hadoop.HDataType.getWritableComparableTypes(HDataType.java:85)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:112)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:285)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
```

Can anyone help me? Thank you.

1tuwyuhd  #1

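If you use a UDF in Pig and get this cast exception, then besides checking the Pig script, also check the UDF script and make sure the type of the value it actually returns matches the type declared in `@outputSchema`.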
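As a rough illustration of that mismatch, here is a minimal sketch of a Python UDF file. The function names `first_field_wrong` and `first_field` are hypothetical and do not come from the question. The stand-in `outputSchema` decorator is an assumption that only exists so the file can run outside Pig; inside Pig the decorator is expected to be supplied by the runtime.

```python
# Stand-in decorator (assumption): used only when the script runs outside Pig,
# where no outputSchema decorator has been provided by the runtime.
if "outputSchema" not in globals():
    def outputSchema(schema):
        def wrap(func):
            # Record the declared schema on the function object.
            func.outputSchema = schema
            return func
        return wrap


@outputSchema("f1:int")
def first_field_wrong(line):
    # Declares int but returns a str: the declared and actual types disagree,
    # which is the kind of mismatch this answer warns about.
    return line.split()[0]


@outputSchema("f1:int")
def first_field(line):
    # Converts explicitly, so the returned value matches the declared int.
    return int(line.split()[0])
```

The decorator only declares a type; it does not convert the return value. The conversion therefore has to happen inside the function body, as in `first_field`.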

s6fujrry  #2

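First, I would define a simple schema for the input. Based on your example, I assume your inputs are text files.
You get the ClassCastException because, unfortunately, merely applying the schema `(f1:int,f2:int)` does not perform any conversion: the values produced by `STRSPLIT` are still strings. You need to explicitly cast the result of `STRSPLIT` to `tuple(int,int)` so that `flatten` can produce `f1:int` and `f2:int` from it. That is: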

```
d1 = LOAD 'data1' as (line:chararray);
d2 = foreach d1 generate flatten((tuple(int,int))(STRSPLIT($0, ' +')))
       as (f1:int,f2:int);

d3 = LOAD 'data2' as (line:chararray);
d4 = foreach d3 generate flatten((tuple(int,int))(STRSPLIT($0, ' +')))
       as (f1:int,f2:int);

data = join d2 by f1, d4 by f2;
```
