Cassandra Pig example fails when wide row input is enabled

xtupzzrd posted on 2021-06-24 in Pig

Using Cassandra 1.1.6, Pig 0.10.0 and Hadoop 1.1.0, I can successfully run the example script that ships with Cassandra in examples/pig using pig_cassandra.
But when I change

rows = LOAD 'cassandra://PigTest/SomeApp' USING CassandraStorage();

to:

rows = LOAD 'cassandra://PigTest/SomeApp?widerows=true' USING CassandraStorage();

I get the following error:

java.lang.IndexOutOfBoundsException: Index: 8, Size: 2
    at java.util.ArrayList.rangeCheck(ArrayList.java:604)
    at java.util.ArrayList.get(ArrayList.java:382)
    at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:156)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:579)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:248)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:316)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:233)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPreCombinerLocalRearrange.getNext(POPreCombinerLocalRearrange.java:126)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:233)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:271)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:266)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)

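This happens in both local and mapreduce mode, and also if I set PIG_WIDEROW_INPUT to true instead of using the URL parameter.
The following script fails whenever the "widerows=true" parameter is present.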

rows = LOAD 'cassandra://PigTest/SomeApp?widerows=true' USING CassandraStorage();
cols = FOREACH rows GENERATE flatten(columns.name);
DUMP cols;

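I can't get past this point: with wide row input enabled, I can't read the static columns in the SomeApp column family. The same problem occurs with other column families.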

jecbmhm3 #1

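I had a similar problem. It is probably caused by bugs in get_paged_slice that were fixed in later 1.1.x releases. The solution is to upgrade Cassandra to 1.1.8 or 1.1.9.
See:
CASSANDRA-4919: StorageProxy.getRangeSlice sometimes returns incorrect number of columns
CASSANDRA-4816: Broken get_paged_slice
CASSANDRA-5098: CassandraStorage doesn't decode name in widerow mode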
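
The stack trace in the question shows a projection asking for field 8 of a tuple that only has 2 fields, so it can help to compare the schema Pig reports with the tuples the loader actually returns, both before and after upgrading. A minimal sketch of that check, assuming the same keyspace and column family as in the question (the alias first_rows is only an illustrative name, and I have not tried to guess what the output looks like on any particular version):

```
-- Sketch only: reuses the LOAD statement from the question.
rows = LOAD 'cassandra://PigTest/SomeApp?widerows=true' USING CassandraStorage();

-- Print the schema Pig believes this relation has.
DESCRIBE rows;

-- Look at a few raw tuples before projecting any field out of them.
first_rows = LIMIT rows 5;
DUMP first_rows;
```

If the dumped tuples have fewer fields than the schema from DESCRIBE suggests, a projection such as columns.name can point past the end of the tuple, which would be consistent with the IndexOutOfBoundsException above.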
