How do I check whether a sort merge bucket (SMB) join is actually being used in Hive?

rqqzpn5f posted on 2021-06-04 in Hadoop

I want to verify that my SMB join is actually working. I can confirm a map join from the logs, but not an SMB join. I have also looked at the explain plan but did not find any clue there. Please help.


q5lcpyga1#

Below are the hints (settings) needed for an SMB map join:

set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set hive.auto.convert.sortmerge.join.noconditionaltask=true;
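
You can also confirm that these flags are in effect in your session: in the Hive CLI or Beeline, running set with just a property name (no value) prints its current value, for example:

set hive.auto.convert.sortmerge.join;
set hive.optimize.bucketmapjoin.sortedmerge;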

After setting the hints above, you also have to pick tables that qualify for an SMB map join: both tables must be bucketed and sorted on the same column, into the same number of buckets, and the join must be on that bucketed column.
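
As an illustration, here is a minimal DDL sketch of a pair of tables that satisfy these conditions. The table names are hypothetical; the columns and the join key mid are taken from the plan shown below.

-- Hypothetical tables, bucketed and sorted on the join column (mid)
-- with the same number of buckets, so Hive can convert the join to SMB.
-- On Hive 1.x, also set hive.enforce.bucketing=true and hive.enforce.sorting=true
-- when populating them, so the files really are bucketed and sorted.
create table events_a (
    realm string,
    role string,
    lid string,
    mid string,
    sid string,
    insert_date date
)
CLUSTERED BY (mid) SORTED BY (mid ASC) INTO 4 BUCKETS
STORED AS TEXTFILE;

create table events_b (
    realm string,
    role string,
    lid string,
    mid string,
    sid string,
    insert_date date
)
CLUSTERED BY (mid) SORTED BY (mid ASC) INTO 4 BUCKETS
STORED AS TEXTFILE;

-- join on the bucketed and sorted column, then inspect the plan
explain
select a.*, b.*
from events_a a
join events_b b on a.mid = b.mid;
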
The explain output for such a join query looks like this:

OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: a
            Statistics: Num rows: 13 Data size: 1289 Basic stats: COMPLETE Column stats: NONE
            Sorted Merge Bucket Map Join Operator
              condition map:
                   Inner Join 0 to 1
              condition expressions:
                0 {realm} {role} {lid} {mid} {sid} {insert_date}
                1 {realm} {role} {lid} {mid} {sid} {insert_date}
              keys:
                0 mid (type: string)
                1 mid (type: string)
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col8, _col9, _col10, _col11, _col12, _col13
              Select Operator
                expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: date), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col11 (type: string), _col12 (type: string), _col13 (type: date)
                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11
                File Output Operator
                  compressed: true
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1

Time taken: 0.131 seconds, Fetched: 35 row(s)

As you can see, the output clearly shows that an SMB map join will be executed (note the "Sorted Merge Bucket Map Join Operator").


o2rvlv0m2#

You can run explain extended on your query. So far I have only been able to get an SMB map join with the MapReduce execution engine. When Hive performs an SMB map join, you will see "Sorted Merge Bucket Map Join Operator" under STAGE PLANS in the explain output.
Here is the snippet that produced an SMB map join with MapReduce on my setup:

set hive.execution.engine=mr;
set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;
set hive.enforce.bucketing=true;
set hive.enforce.sorting=true;
set hive.auto.convert.join=true;

drop table key_value_large;
drop table key_value_small;

create table key_value_large  (
    key int,
    value string
)
partitioned by (ds string)
CLUSTERED BY (key) SORTED BY (key ASC) INTO 8 BUCKETS 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;

create table key_value_small (
    key int,
    value string
)
partitioned by (ds string)
CLUSTERED BY (key) SORTED BY (key ASC) INTO 4 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;

insert into table key_value_large partition(ds='2008-04-08') select key, value from key_value_large_src;
insert into table key_value_small partition(ds='2008-04-08') select key, value from key_value_small_src;

explain extended select count(*) from key_value_large a JOIN key_value_small b ON a.key = b.key;
select count(*) from key_value_large a JOIN key_value_small b ON a.key = b.key;
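
Note that the snippet reads from key_value_large_src and key_value_small_src, which are not shown in the answer. A minimal sketch of what such staging tables could look like (names kept from the snippet; the file paths are placeholders):

-- plain, unbucketed staging tables that feed the bucketed tables above
create table key_value_large_src (key int, value string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;

create table key_value_small_src (key int, value string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;

-- adjust the local paths to wherever your sample data lives
load data local inpath '/tmp/key_value_large.txt' into table key_value_large_src;
load data local inpath '/tmp/key_value_small.txt' into table key_value_small_src;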

Hope this helps.
