如何计算配置单元中的cte(公共表表达式)

smtd7mpg  于 2021-06-26  发布在  Hive
关注(0)|答案(1)|浏览(288)

我的问题是关于性能和cte在运行时得到评估的方式。
我计划通过定义一个基本投影来重用代码,然后用不同的过滤器在这个基本投影之上定义多个cte。
这是否会导致任何性能问题。更具体地说,是否每次都会对基础投影进行评估。
例如:

WITH CTE_PERSON as (
   SELECT * FROM PersonTable
),

CTE_PERSON_WITH_AGE as (
   SELECT * FROM CTE_PERSON WHERE age > 24 
),

CTE_PERSON_WITH_AGE_AND_GENDER as (
  SELECT * FROM CTE_PERSON_WITH_AGE WHERE gender = 'm'
),

CTE_PERSON_WITH_NAME as (
  SELECT * FROM CTE_PERSON WHERE name = 'abc'
)

是否每次persontable中的所有条目都会加载到内存中,然后应用过滤器(或)
只有筛选后的结果集才会加载到内存中。

pinkon5k

pinkon5k1#

一次扫描。
注:
-单级
-单身汉 TableScan - predicate: (((i = 1) and (j = 2)) and (k = 3)) (type: boolean) ```
create table t (i int,j int,k int);

explain
with t1 as (select i,j,k from t where i=1)
,t2 as (select i,j,k from t1 where j=2)
,t3 as (select i,j,k from t2 where k=3)

select * from t3
;

Explain
STAGE DEPENDENCIES:
Stage-0 is a root stage

STAGE PLANS:
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
TableScan
alias: t
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Filter Operator
predicate: (((i = 1) and (j = 2)) and (k = 3)) (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Select Operator
expressions: 1 (type: int), 2 (type: int), 3 (type: int)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
ListSink

相关问题