Zookeeper Clickhouse实体化视图聚合幻像行

643ylb08  于 2022-12-09  发布在  Apache
关注(0)|答案(1)|浏览(132)

我使用的是clickhouse,这是我当前的表架构。
我有一个包含我的数据的主表:

CREATE TABLE default.Liquidity
(
    `Date` Date,
    `LiquidityId` UInt64,
    `TreeId_LQ` UInt64,
    `AggregateId` UInt64,
    `ClientId` UInt64,
    `InstrumentId` UInt64,
    `IsIn` String,
    `Currency` String,
    `Scenario` String,
    `Price` String,
    `Leg` Int8,
    `commit` Int64,
    `factor` Int8,
    `nb_aggregated` UInt64,
    `stream_id` Int64
)
ENGINE = Distributed('{cluster}', '', 'shard_Liquidity', TreeId_LQ)

而且我还有一个物化视图,它将聚合数据存储在其他表中

CREATE MATERIALIZED VIEW default.mv_Liquidity_facet TO default.shard_state_Liquidity_facet
(
    `Date` Date,
    `TreeId_LQ` UInt64,
    `AggregateId` UInt64,
    `ClientId` UInt64,
    `InstrumentId` UInt64,
    `Currency` String,
    `Scenario` String,
    `commit` Int64,
    `factor` Int8,
    `nb_aggregated` AggregateFunction(sum, UInt64)
) AS
SELECT
    Date,
    TreeId_LQ,
    AggregateId,
    ClientId,
    InstrumentId,
    Currency,
    Scenario,
    commit,
    factor,
    sumState(nb_aggregated) AS nb_aggregated
FROM default.shard_Liquidity
GROUP BY
    Date,
    TreeId_LQ,
    AggregateId,
    ClientId,
    InstrumentId,
    Currency,
    Scenario,
    commit,
    factor

----------------

CREATE TABLE default.shard_state_Liquidity_facet
(
    `Date` Date,
    `TreeId_LQ` UInt64,
    `AggregateId` UInt64,
    `ClientId` UInt64,
    `InstrumentId` UInt64,
    `Currency` String,
    `Scenario` String,
    `commit` Int64,
    `factor` Int8,
    `nb_aggregated` AggregateFunction(sum, UInt64)
)
ENGINE = ReplicatedAggregatingMergeTree('{zoo_prefix}/tables/{shard}/shard_state_Liquidity_facet', '{host}')
PARTITION BY Date
ORDER BY (commit, TreeId_LQ, ClientId, AggregateId, InstrumentId, Scenario)
SETTINGS index_granularity = 8192

正如您可能已经猜到的,nb_aggregated列表示为获得此结果而聚合的行数。
如果我使用许多筛选器对分布式查询进行查询,以便查找一行

select
       sum(nb_aggregated)               AS nb_aggregated
from Liquidity
where Date = '2022-10-17'
  and TreeId_LQ = 1129
  and AggregateId = 999999999999
  and ClientId = 1
  and InstrumentId = 593
  and Currency = 'AUD'
  and Scenario = 'BAU'
  and commit = -2695401333399944382
  and factor = 1;

--- Result
1

我最终只得到一行,因此,如果我使用相同的过滤器进行相同的查询,但使用物化视图创建的表的聚合版本,我最终也应该只得到一行,并得到nb_aggregated = 1,但我最终得到nb_aggregated = 2,就好像他将我的行与另一行聚合在一起一样,其他大多数值也是错误的。
我知道我的例子很难理解,但如果你有任何线索,这将是很好的。

dldeef67

dldeef671#

Well, I have ask the same question one the clickhouse repository on github and Denny Crane give me this answer that is working for me here: https://github.com/ClickHouse/ClickHouse/issues/43988#issuecomment-1339731917

  • 在大多数情况下,MatView group by应与存储表匹配 * ORDER BY
CREATE MATERIALIZED VIEW default.mv_Liquidity_facet:
GROUP BY Date, TreeId_LQ, AggregateId, ClientId, InstrumentId, Currency, Scenario, commit, factor

CREATE TABLE default.shard_state_Liquidity_facet
PARTITION BY Date
ORDER BY (commit, TreeId_LQ, ClientId, AggregateId, InstrumentId, Scenario)
Your ReplicatedAggregatingMergeTree "CORRUPTS" Currency / factor columns using ANY function
  • 解决方法是 *
ORDER BY (commit, TreeId_LQ, ClientId, AggregateId, InstrumentId, Scenario, Currency  , factor)

https://den-crane.github.io/Everything_you_should_know_about_materialized_views_commented.pdf

相关问题