I have the following data:
select * from test;
department | employee | batch_number | hash
------------+----------+--------------+-------
dep1 | Bart | 1 | hash1
dep1 | Bart | 1 | hash2
dep1 | Lisa | 3 | hash3
dep1 | Lisa | 4 | hash4
dep1 | John | 5 | hash5
dep1 | Lucy | 6 | hash6
dep1 | Bart | 7 | hash7
dep1 | Bart | 7 | hash8
I want to query it with a where clause on batch_number, an ordering on batch_number, and an in predicate on employee.
In a relational database:
select * from test
where department='dep1'
and employee in ('Bart','Lucy','John')
and batch_number >= 2
order by batch_number desc
limit 3;
department | employee | batch_number | hash
------------+----------+--------------+-------
dep1 | Bart | 7 | hash7
dep1 | Bart | 7 | hash8
dep1 | Lucy | 6 | hash6
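To pin down the intended semantics, the same query can be reproduced in plain Python over the sample rows. This is only an in-memory reference sketch, not Cassandra code; `Row`, `ROWS` and `desired_query` are illustrative names, and rows that tie on batch_number simply keep their input order (SQL itself leaves that order unspecified).

```python
from typing import List, NamedTuple, Set


class Row(NamedTuple):
    department: str
    employee: str
    batch_number: int
    hash: str


# The sample data from the table above.
ROWS = [
    Row("dep1", "Bart", 1, "hash1"),
    Row("dep1", "Bart", 1, "hash2"),
    Row("dep1", "Lisa", 3, "hash3"),
    Row("dep1", "Lisa", 4, "hash4"),
    Row("dep1", "John", 5, "hash5"),
    Row("dep1", "Lucy", 6, "hash6"),
    Row("dep1", "Bart", 7, "hash7"),
    Row("dep1", "Bart", 7, "hash8"),
]


def desired_query(rows: List[Row], department: str, employees: Set[str],
                  min_batch: int, limit: int) -> List[Row]:
    """Equivalent of: WHERE department = ? AND employee IN (...)
    AND batch_number >= ? ORDER BY batch_number DESC LIMIT ?"""
    matching = [
        r for r in rows
        if r.department == department
        and r.employee in employees
        and r.batch_number >= min_batch
    ]
    # list.sort is stable (also with reverse=True), so ties keep input order.
    matching.sort(key=lambda r: r.batch_number, reverse=True)
    return matching[:limit]


result = desired_query(ROWS, "dep1", {"Bart", "Lucy", "John"}, 2, 3)
for r in result:
    print(r.department, r.employee, r.batch_number, r.hash)
# dep1 Bart 7 hash7
# dep1 Bart 7 hash8
# dep1 Lucy 6 hash6
```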
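I'm having trouble modeling a Cassandra table for this query. department will be my partition key, and hash has to be part of the primary key. But I'm struggling with the clustering keys and/or (SSTable-attached) secondary indexes.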
Since I want to order by batch_number, I tried making it a clustering key:
CREATE TABLE keyspace.test(
department TEXT,
batch_number INT,
hash TEXT,
employee TEXT,
PRIMARY KEY ((department), batch_number, hash)
) WITH CLUSTERING ORDER BY (batch_number DESC);
CREATE INDEX tst_emp ON keyspace.test (employee);
But this does not allow an IN predicate on my indexed column:
select * from keyspace.test where department='dep1' and employee in ('Bart','Lucy','John');
InvalidRequest: Error from server: code=2200 [Invalid query] message="IN predicates on non-primary-key columns (employee) is not yet supported"
So I tried adding the employee column as a clustering key:
CREATE TABLE keyspace.test(
department TEXT,
batch_number INT,
hash TEXT,
employee TEXT,
PRIMARY KEY ((department), batch_number, hash, employee)
) WITH CLUSTERING ORDER BY (batch_number DESC);
But this fails, because I can't put a non-EQ relation on batch_number:
select * from keyspace.test where department='dep1' and batch_number > 1 and employee in ('Bart','Lucy','John');
InvalidRequest: Error from server: code=2200 [Invalid query] message="Clustering column "employee" cannot be restricted (preceding column "batch_number" is restricted by a non-EQ relation)"
But whenever I put employee before batch_number, I lose the ability to order by batch_number:
CREATE TABLE keyspace.test(
department TEXT,
employee TEXT,
batch_number INT,
hash TEXT,
PRIMARY KEY ((department), employee, batch_number, hash)
);
select * from keyspace.test where department='dep1' and employee in ('Bart','Lucy','John') ORDER BY batch_number DESC;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Order by currently only support the ordering of columns following their declared order in the PRIMARY KEY"
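Why this third design cannot produce the wanted order: within a partition, Cassandra keeps rows sorted by the full clustering key, here (employee, batch_number, hash), and as far as I can tell an IN on employee returns the matching slices in that clustering order. The plain-Python sketch below only simulates that layout on the sample data (assuming the default ascending clustering order); it is not driver code, and `partition` and `in_query` are illustrative names.

```python
from typing import NamedTuple


class Row(NamedTuple):
    department: str
    employee: str
    batch_number: int
    hash: str


ROWS = [
    Row("dep1", "Bart", 1, "hash1"),
    Row("dep1", "Bart", 1, "hash2"),
    Row("dep1", "Lisa", 3, "hash3"),
    Row("dep1", "Lisa", 4, "hash4"),
    Row("dep1", "John", 5, "hash5"),
    Row("dep1", "Lucy", 6, "hash6"),
    Row("dep1", "Bart", 7, "hash7"),
    Row("dep1", "Bart", 7, "hash8"),
]

# Simulated storage order of partition 'dep1' for
# PRIMARY KEY ((department), employee, batch_number, hash), all ascending.
partition = sorted(
    (r for r in ROWS if r.department == "dep1"),
    key=lambda r: (r.employee, r.batch_number, r.hash),
)

# employee IN ('Bart','Lucy','John') selects whole per-employee slices,
# which stay in clustering order, i.e. grouped by employee.
in_query = [r for r in partition if r.employee in {"Bart", "Lucy", "John"}]

print([(r.employee, r.batch_number) for r in in_query])
# → [('Bart', 1), ('Bart', 1), ('Bart', 7), ('Bart', 7), ('John', 5), ('Lucy', 6)]
```

batch_number is only sorted inside each employee group; sorting across the groups is exactly what the rejected ORDER BY asks for.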
So what table design would allow a query like this? Can it be done in Cassandra at all?
EDIT:
Other queries I want to be able to run on this table:
select * from keyspace.test where department='X' and batch_number=Y
and
delete from keyspace.test where department='X'
2 answers
Answer 1
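Use a materialized view, which lets you rearrange the data. With such a view I can run the query, and the results come back in clustering order.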
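While the > clause is an inequality, IN, although it has multiple values, is still deterministic, which is why I believe you can filter on the keys without a problem. Since batch_number is the last thing you want to filter on, any kind of where clause is allowed on it. I assume you always restrict department. Note that materialized views affect performance, more specifically write performance. However, read performance is better than with ALLOW FILTERING.
UPDATE: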
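The ordering specified at the end of the materialized view is on batch_number. However, the rows are sorted first by department, then by employee, and only then by batch_number, so ordering by batch_number specifically is not guaranteed. As far as I know, there is no way around this; a different database solution might be preferable.
UPDATE 2: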
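As noted in the Apache mailing-list thread (see the comments below), materialized views are not considered production-ready. DataStax, however, considers them usable as long as the best practices mentioned above are followed. Personally, I have had no problems with materialized views. Granted, mine is a simple single-datacenter cluster, and since the best practices mention more complex setups, views may break down in those cases.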
Answer 2
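You can create an index on employee, and even remove it from the primary key if you like. You will need to stop using IN, but you can split the query into one query per employee and combine the results on the client side. The downside of this approach is that the index may become too large. I don't know the size of your data set, though, so I'll leave that for you to weigh.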
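The split-and-merge idea from this answer can be sketched in plain Python. It assumes one query per employee with an equality on employee instead of IN, each returning its rows sorted by batch_number descending; `fetch_employee` is a hypothetical stand-in for that query, simulated here in memory rather than calling a real driver. `heapq.merge` then combines the sorted per-employee lists, and only the first `limit` rows are kept.

```python
import heapq
from itertools import islice
from typing import List, NamedTuple


class Row(NamedTuple):
    department: str
    employee: str
    batch_number: int
    hash: str


ROWS = [
    Row("dep1", "Bart", 1, "hash1"),
    Row("dep1", "Bart", 1, "hash2"),
    Row("dep1", "Lisa", 3, "hash3"),
    Row("dep1", "Lisa", 4, "hash4"),
    Row("dep1", "John", 5, "hash5"),
    Row("dep1", "Lucy", 6, "hash6"),
    Row("dep1", "Bart", 7, "hash7"),
    Row("dep1", "Bart", 7, "hash8"),
]


def fetch_employee(department: str, employee: str,
                   min_batch: int, limit: int) -> List[Row]:
    """Hypothetical stand-in for one per-employee query (employee = ?).

    Simulated in memory: returns at most `limit` rows, sorted by
    batch_number descending, which the merge below relies on.
    """
    rows = [
        r for r in ROWS
        if r.department == department
        and r.employee == employee
        and r.batch_number >= min_batch
    ]
    rows.sort(key=lambda r: r.batch_number, reverse=True)
    return rows[:limit]


def merged_query(department: str, employees: List[str],
                 min_batch: int, limit: int) -> List[Row]:
    # One query per employee replaces the unsupported IN predicate.
    per_employee = [fetch_employee(department, e, min_batch, limit)
                    for e in employees]
    # k-way merge of already-sorted lists, descending by batch_number.
    merged = heapq.merge(*per_employee,
                         key=lambda r: r.batch_number, reverse=True)
    return list(islice(merged, limit))


result = merged_query("dep1", ["Bart", "Lucy", "John"], 2, 3)
for r in result:
    print(r.employee, r.batch_number, r.hash)
# Bart 7 hash7
# Bart 7 hash8
# Lucy 6 hash6
```

Fetching at most `limit` rows per employee is enough, because the overall top `limit` rows can contain at most `limit` rows from any single employee.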