I have the following Cassandra table:
CREATE TABLE segments (
b text,
s int,
c int,
PRIMARY KEY (b)
)
and the following Pig relation:
data: {b: chararray,s: long,c: long}
I am loading it from a file stored with PigStorage:
data = LOAD 'some_file' as (b:chararray,s:long,c:long);
I am trying to store the Pig relation into the Cassandra table, without success. I tried:
to_cassandra = FOREACH (GROUP data ALL)
GENERATE
TOTUPLE(TOTUPLE('b',data.b)),
TOTUPLE('s',data.s),
TOTUPLE('c',data.c);
STORE to_cassandra INTO
'cql://pv/segments?
output_query=UPDATE%20pv.segments%20SET%20s%3D%3F%2Cc%3D%3F'
USING CqlStorage();
where the decoded output query is:
UPDATE pv.segments SET s=?,c=?
But I get the following error:
[main] ERROR org.apache.pig.tools.pigstats.SimplePigStats -
ERROR: java.lang.ClassCastException:
org.apache.pig.data.DefaultDataBag cannot be cast to org.apache.pig.data.DataByteArray
This is rather cryptic. Which field is the offending one, and how can I fix it?
Edit
I ran illustrate to_cassandra;
and got:
-----------------------------------------------------------------------------------------------------
| data | b:chararray | s:long | c:long |
-----------------------------------------------------------------------------------------------------
| | 03Wat7NfMi8QiE4IlHeTmbOEfLNkvlzfG5znff62KvSzpm09eTBWCxcdotuB | 1 | 1 |
| | 0qadR3YpgVEwYsORBHFMfAh4OFk7IrROyCq7RDibchBpAKfSWAjOHDAyfzPG | 1 | 1 |
-----------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1-3 | group:chararray | data:bag{:tuple(b:chararray,s:long,c:long)} |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| | all | {(03Wat7NfMi8QiE4IlHeTmbOEfLNkvlzfG5znff62KvSzpm09eTBWCxcdotuB, 1, 1), (0qadR3YpgVEwYsORBHFMfAh4OFk7IrROyCq7RDibchBpAKfSWAjOHDAyfzPG, 1, 1)} |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| to_cassandra | org.apache.pig.builtin.totuple_org.apache.pig.builtin.totuple_29_30:tuple(org.apache.pig.builtin.totuple_29:tuple(:chararray,:bag{:tuple(b:chararray)})) | org.apache.pig.builtin.totuple_31:tuple(:chararray,:bag{:tuple(s:long)}) | org.apache.pig.builtin.totuple_32:tuple(:chararray,:bag{:tuple(c:long)}) |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| | ((b, {(03Wat7NfMi8QiE4IlHeTmbOEfLNkvlzfG5znff62KvSzpm09eTBWCxcdotuB), (0qadR3YpgVEwYsORBHFMfAh4OFk7IrROyCq7RDibchBpAKfSWAjOHDAyfzPG)})) | (s, {(1), (1)}) | (c, {(1), (1)}) |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 Answer
gmxoilav #1
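Your grouping is the problem: it produces a bag of values for each field rather than a single value, and single values are what Cassandra expects. That is why the error reports a DefaultDataBag that cannot be cast, and the illustrate output confirms that b, s and c are all bags after GROUP data ALL. Your output should end up looking like:
... to match your schema. Since the output schema maps directly onto the input, the purpose of the grouping is unclear.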
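As a rough illustration of dropping the grouping, the script below is a minimal sketch rather than a verified fix. It reuses the relation, keyspace, table and output_query from the question. The tuple layout (a tuple of key name/value pairs, followed by a tuple of the values bound to the ? placeholders) and the (int) casts are assumptions that should be checked against the CqlStorage version in use.

```pig
-- Sketch only: names come from the question; the layout CqlStorage expects
-- is an assumption to verify against your Cassandra/Pig versions.
data = LOAD 'some_file' AS (b:chararray, s:long, c:long);

-- No GROUP: one output row per input row, so every field stays a scalar
-- instead of becoming a bag.
-- First element: tuple of (key column name, key value) pairs.
-- Second element: values bound, in order, to the ? placeholders (s, then c).
-- The (int) casts are an assumption: the table declares s and c as int,
-- while the relation loads them as long.
to_cassandra = FOREACH data GENERATE
    TOTUPLE(TOTUPLE('b', b)),
    TOTUPLE((int)s, (int)c);

STORE to_cassandra INTO
    'cql://pv/segments?output_query=UPDATE%20pv.segments%20SET%20s%3D%3F%2Cc%3D%3F'
    USING CqlStorage();
```

Running illustrate to_cassandra; on this version should show plain chararray and int fields inside the tuples instead of bags, which is a quick way to confirm the shape before storing.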