cassandra+pig+cql+counter column error

wgmfuz8q posted on 2021-06-24 in Pig

I used Pig to access a Cassandra column family with a counter column. When I try to dump the data, I get the following error:

cqlsh:pollkan> CREATE TABLE votes_count_period_1 (
           ...   period int,
           ...   poll text,
           ...   votes counter,
           ...   PRIMARY KEY (period, poll)
           ... );

cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130831 AND poll = '405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130831 AND poll = '405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130831 AND poll = '505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130831 AND poll = '505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130831 AND poll = '505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';

cqlsh:pollkan> select * from votes_count_period_1;

 period   | poll                                 | votes
----------+--------------------------------------+-------
 20130830 | 605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a |     5
 20130831 | 405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a |     2
 20130831 | 505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a |     3

root@batch:/usr/share/cassandra# pig -x local
2013-08-31 23:02:06,135 [main] INFO  org.apache.pig.Main - Apache Pig version 0.11.1 (r1459164) compiled Mar 21 2013, 06:14:38
2013-08-31 23:02:06,136 [main] INFO  org.apache.pig.Main - Logging error messages to: /usr/share/cassandra/pig_1377982926133.log
2013-08-31 23:02:06,154 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2013-08-31 23:02:06,252 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
grunt> register /usr/share/cassandra/apache-cassandra-1.2.9.jar
grunt> register /usr/share/cassandra/apache-cassandra-thrift-1.2.9.jar
grunt> register /usr/share/cassandra/lib/libthrift-0.7.0.jar
grunt> A = LOAD 'cql://pollkan/votes_count_period_1' USING org.apache.cassandra.hadoop.pig.CqlStorage();
grunt> DUMP A;

This fails with:

2013-08-31 23:01:35,397 [pool-4-thread-1] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed ColumnFamilySplit((-69569900416187863, '-54603788994328078] @[cassandra001, cassandra002, cassandra003])
2013-08-31 23:01:35,417 [pool-4-thread-1] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2013-08-31 23:01:35,418 [pool-4-thread-1] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: A[2,4] C:  R:
2013-08-31 23:01:35,424 [Thread-10] INFO  org.apache.hadoop.mapred.LocalJobRunner - Map task executor complete.
2013-08-31 23:01:35,428 [Thread-10] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local712790083_0002
java.lang.Exception: java.lang.IndexOutOfBoundsException
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.IndexOutOfBoundsException
        at java.nio.Buffer.checkIndex(Buffer.java:538)
        at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:410)
        at org.apache.cassandra.db.context.CounterContext.total(CounterContext.java:477)
        at org.apache.cassandra.db.marshal.AbstractCommutativeType.compose(AbstractCommutativeType.java:34)
        at org.apache.cassandra.db.marshal.AbstractCommutativeType.compose(AbstractCommutativeType.java:25)
        at org.apache.cassandra.hadoop.pig.AbstractCassandraStorage.columnToTuple(AbstractCassandraStorage.java:137)
        at org.apache.cassandra.hadoop.pig.CqlStorage.getNext(CqlStorage.java:110)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:531)
        at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
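The innermost frames show `CounterContext.total` calling `HeapByteBuffer.getLong` at an index past the end of the buffer. Below is a minimal Java sketch of that failure mode. It assumes (this is not confirmed anywhere in the thread) that the CQL3 reader delivers the counter as a plain 8-byte long, while the counter decoder expects Cassandra's larger internal counter-context layout and therefore reads beyond byte 8. `CounterBufferDemo`, `readPlainLong` and `readsPastEnd` are hypothetical names, and the index 8 is illustrative, not the offset Cassandra actually computes.

```java
import java.nio.ByteBuffer;

class CounterBufferDemo {
    // Hypothetical helper: decode a counter value that arrives as a plain
    // 8-byte big-endian long (an assumption, not Cassandra's API).
    static long readPlainLong(ByteBuffer value) {
        return value.getLong(value.position());
    }

    // Illustrates the failure mode: an absolute getLong at an index with
    // fewer than 8 bytes after it throws IndexOutOfBoundsException.
    static boolean readsPastEnd(ByteBuffer value, int index) {
        try {
            value.getLong(index);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // The counter value 5 from the first table, encoded as 8 bytes.
        ByteBuffer votes = ByteBuffer.allocate(8).putLong(0, 5L);
        System.out.println(readPlainLong(votes));   // 5
        System.out.println(readsPastEnd(votes, 0)); // false
        System.out.println(readsPastEnd(votes, 8)); // true: buffer holds only 8 bytes
    }
}
```

The absolute `ByteBuffer.getLong(int)` throws `IndexOutOfBoundsException` whenever fewer than 8 bytes remain after the index, which matches the `Buffer.checkIndex` frame at the top of the trace.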

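I read that https://issues.apache.org/jira/browse/cassandra-5234 fixed the problem with CQL3 tables and counter columns, but I still have the problem.
By the way, I tried recreating the table with old-style compact storage. That got me a bit further, but I ran into a new problem with the following error: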

cqlsh:pollkan> CREATE TABLE votes_count_period_2 (
           ...   period int,
           ...   poll text,
           ...   votes counter,
           ...   PRIMARY KEY (period, poll)
           ... ) WITH COMPACT STORAGE;
cqlsh:pollkan>
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130831 AND poll = '405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130831 AND poll = '405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130831 AND poll = '505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130831 AND poll = '505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130831 AND poll = '505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan>
cqlsh:pollkan> select * from votes_count_period_2;

 period   | poll                                 | votes
----------+--------------------------------------+-------
 20130830 | 605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a |     5
 20130831 | 405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a |     2
 20130831 | 505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a |     3

root@batch:/usr/share/cassandra# pig -x local
2013-08-31 23:02:06,135 [main] INFO  org.apache.pig.Main - Apache Pig version 0.11.1 (r1459164) compiled Mar 21 2013, 06:14:38
2013-08-31 23:02:06,136 [main] INFO  org.apache.pig.Main - Logging error messages to: /usr/share/cassandra/pig_1377982926133.log
2013-08-31 23:02:06,154 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2013-08-31 23:02:06,252 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
grunt> register /usr/share/cassandra/apache-cassandra-1.2.9.jar
grunt> register /usr/share/cassandra/apache-cassandra-thrift-1.2.9.jar
grunt> register /usr/share/cassandra/lib/libthrift-0.7.0.jar
grunt> A = LOAD 'cql://pollkan/votes_count_period_2' USING org.apache.cassandra.hadoop.pig.CqlStorage();
grunt> DUMP A;
2013-08-31 23:05:59,454 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2013-08-31 23:05:59,458 [main] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2013-08-31 23:05:59,465 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2013-08-31 23:05:59,466 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
((period,20130830),(poll,605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a),(votes,5))
((period,20130831),(poll,405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a),(votes,2))
((period,20130831),(poll,505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a),(votes,3))

grunt> A = LOAD 'cql://pollkan/votes_count_period_2' USING org.apache.cassandra.hadoop.pig.CqlStorage();
grunt> B = FOREACH A GENERATE poll, votes;
grunt> describe B;
B: {poll: chararray,votes: long}
grunt> C = GROUP B BY poll;
grunt> describe C;
C: {group: chararray,B: {(poll: chararray,votes: long)}}
grunt> D = FOREACH C GENERATE group AS pollgroup, SUM(B.votes);
grunt> describe D;
D: {pollgroup: chararray,long}
grunt> dump D;

2013-08-31 23:53:32,577 [pool-33-thread-1] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map - Aliases being processed per job phase (AliasName[line,offset]): M: A[13,4],B[14,4],D[18,4],C[17,4] C: D[18,4],C[17,4] R: D[18,4]
2013-08-31 23:53:32,586 [pool-33-thread-1] INFO  org.apache.hadoop.mapred.MapTask - Starting flush of map output
2013-08-31 23:53:32,589 [Thread-65] INFO  org.apache.hadoop.mapred.LocalJobRunner - Map task executor complete.
2013-08-31 23:53:32,591 [Thread-65] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local814297309_0018
java.lang.Exception: java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to java.lang.String
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to java.lang.String
        at org.apache.pig.backend.hadoop.HDataType.getWritableComparableTypes(HDataType.java:76)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:112)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:285)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
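In the compact-storage DUMP, every field is printed as a `(name, value)` pair such as `(votes,5)`, even though `describe B` reports `poll` as a plain `chararray`. One plausible reading of the `ClassCastException` (an assumption, not verified in this thread) is that `GROUP B BY poll` receives the whole pair tuple where Pig expects a `String`. The Java sketch below models each loaded column as a hypothetical `{name, value}` array (`VoteSum`, `pairCastFails` and `sumVotesByPoll` are illustrative names), reproduces the cast failure, and then computes what the script is after by taking the value half of each pair before grouping.

```java
import java.util.Map;
import java.util.TreeMap;

class VoteSum {
    // Hypothetical model of the loaded rows: every column is a
    // {name, value} pair, mirroring ((poll,...),(votes,5)) in the DUMP.
    // Column positions: 0 = period, 1 = poll, 2 = votes.
    static final Object[][][] ROWS = {
        {{"period", 20130830}, {"poll", "605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a"}, {"votes", 5L}},
        {{"period", 20130831}, {"poll", "405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a"}, {"votes", 2L}},
        {{"period", 20130831}, {"poll", "505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a"}, {"votes", 3L}},
    };

    // Treating the whole pair as the grouping key reproduces a
    // ClassCastException like the one in the trace.
    static boolean pairCastFails() {
        Object pollColumn = ROWS[0][1]; // the {"poll", value} pair
        try {
            String unused = (String) pollColumn;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // Equivalent of GROUP B BY poll + SUM(B.votes), taking the value
    // (index 1) out of each pair before using it.
    static Map<String, Long> sumVotesByPoll() {
        Map<String, Long> totals = new TreeMap<>();
        for (Object[][] row : ROWS) {
            String poll = (String) row[1][1];
            long votes = (Long) row[2][1];
            totals.merge(poll, votes, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        System.out.println(pairCastFails());
        System.out.println(sumVotesByPoll());
    }
}
```

With the sample data from the question this yields totals of 2, 3 and 5 votes for the three polls. The Pig equivalent would presumably be to project the second field of each pair before the `GROUP`, but that has not been tested against `CqlStorage` here.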

My versions are Pig 0.11.1 and Cassandra 1.2.9.
Any help?
Thanks

ymzxtsji answered:

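I hit the same problem earlier today while testing the latest Pig CQL3 integration with a similar data structure.
The JIRA issue you mention, https://issues.apache.org/jira/browse/cassandra-5234, contains a patch that one of the commenters has verified works. However, a quick look at the Cassandra git repository shows that it has been applied neither to the 1.2 branch nor to trunk. I have added a comment on the JIRA issue.
Until the patch is committed and a new stable release is published, the workaround is to apply the patch to a fresh checkout of 1.2.9, recompile, and deploy to your Hadoop nodes (if that is an option for you).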
