postgresql 带条件和限制postgres连接表

np8igboo  于 2022-12-12  发布在  PostgreSQL
关注(0)|答案(1)|浏览(199)

I need to join 6 tables by item_name, (The DB of each table is lower than 10k items except 1 table with 30k,table_1 is the samllest with 5k items)
I join tables by same id or name where them both are b-tree indeces, the join by name based on the pg_trgm similar enough.

SELECT * FROM 'TABLE_1' ta,
JOIN 'TABLE_2' tb ON tb.item_code = ta.item_code or tb.item_name % ta.item_name,
.
.
.
JOIN 'TABLE_6' tf ON tf.item_code = tf.item_name or tf.item_name 5 ta.item_name

In this example i'm using "%" for similar enogth, if instead similar enough i will use "=" operator it will take 36 ms only.
i have many misunderstandings with this example.my mention is to get the 1 result pet item if item codes are equal or 3-4 results if similar enogh (dont know how to limit it in my exapmle)

  1. why is it so slow ? (50% of the result the item_code comparison return true so it shuldnt use similar enogth for all the data... in other words if the left condition of or statment are true it doesnt check the right condition)
  2. how can i limit the result cames from similar enogth ? lets say
tableA : item_name = 'laptop'
tableB : item_name = 'laptop a"
tableB : item_name = 'laptop b"
tableB : item_name = 'laptop c"
iditem_name_aitem_name_b
1laptoplaptop a
1laptoplaptop b
1laptoplaptop c

it will join them all and if i want to limit it to 2 result, how can i do it?

gwo2fgha

gwo2fgha1#

Not an answer

It's some questions to your question: Based on your query, assume there're only two tables t1 and t2 :
Table t1 :

id|nm|
--+--+
 1|a |
 2|b |
 3|c |

Table t2 :

id|nm|
--+--+
 1|a |
11|ab|
12|ac|
 2|b |
21|bb|
22|bc|
 3|c |
31|cb|
32|cc|
 4|d |
41|db|
42|dd|

By apply your query logic:

select t1.id  as t1_id,
       t2.id  as t2_id,
       t1.nm  as t1_nm,
       t2.nm  as t2_nm
  from t1
  join t2
    on t1.id = t2.id
    or t2.nm like t1.nm || '%';

We get:

t1_id|t2_id|t1_nm|t2_nm|
-----+-----+-----+-----+
    1|    1|a    |a    |
    1|   11|a    |ab   |
    1|   12|a    |ac   |
    2|    2|b    |b    |
    2|   21|b    |bb   |
    2|   22|b    |bc   |
    3|    3|c    |c    |
    3|   31|c    |cb   |
    3|   32|c    |cc   |

You can see that even t1.id = t2.id , it doesn't stop the comparison t2.nm like t1.nm || '%' . Thus, the inner join in the query is actually a nested loop as:

for each row r1 in t1
   for each row r2 in t2
      compare r1 and r2
      if match then return (r1, r2)
               else next

Therefore, the join is like cross join (Cartesian product of rows from the tables in the join. ie. 5k x 10k = 50m) during execution because it tries to find every match between t1.nm and t2.nm regardless if t1.id=t2.id or not.
If the business rule is: when t1.id=t2.id , then remove the row from both tables and do the comparison for the rest. You may want to modify your query into a few steps:
Step 1. Do item_code join only and get matched_items
Step 2. Derive unmatched_items_t[1..6] by subtract matched_items from table_[1..6]
Step 3. Do string match comparison between unmatched_items_t1 and unmatched_items_t[2..6] : (t1 vs t2), (t1 vs t3), (t1 vs t4), (t1 vs t5), (t1 vs t6)
Step 4. Aggregate the 5 comparison results to t1.item_code level.

UPDATE: Could be an answer

Assume there are three tables t1 , t2 , t3 as below:

create table t1 (cd int primary key, nm varchar);
create table t2 (cd int primary key, nm varchar);
create table t3 (cd int primary key, nm varchar);

insert into t1
values
(1, 'a'),
(2, 'b'),
(3, 'c'),
(4, 'd'),
(5, 'e');

insert into t2
values
( 2, 'b'),
(21, 'a 2'),
(22, 'd 2'),
(23, 'x');

insert into t3
values
( 3, 'c'),
(31, 'a 3'),
(32, 'e 3'),
(33, 'y');

We want to do code match and then name match:

  • cte_universe : this is the universe of item's code and name from t1's perspective
  • cte_t1_cd_matched : code-matched data by comparing t1, t2, t3
  • cte_t1_cd_unmatched : code not matched in t1 from the above
  • cte_[t2|t3]_cd_unmatched : code not matched in t2, t3
  • cte_nm_matched_flat : name-matched by name by table (src)
  • cte_nm_matched : aggregate the above to code level
  • output code-matched data union all name-matched data
with cte_item_universe as (
select t1.cd,
       t1.nm,
       case when t2.cd is null then 0 else 1 end is_t2_cd_matched,
       case when t3.cd is null then 0 else 1 end is_t3_cd_matched
  from t1 
  left
  join t2 using (cd)
  left
  join t3 using (cd)),
cte_t1_cd_matched as (
select cd, nm from cte_item_universe where is_t2_cd_matched + is_t3_cd_matched > 0),
cte_t1_cd_unmatched as (
select cd, nm from cte_item_universe where is_t2_cd_matched + is_t3_cd_matched = 0),
cte_t2_cd_unmatched as (
select t.cd, t.nm from t2 t left join cte_t1_cd_matched m using (cd) where m.cd is null),
cte_t3_cd_unmatched as (
select t.cd, t.nm from t3 t left join cte_t1_cd_matched m using (cd) where m.cd is null),
cte_nm_matched_flat as (
select s.cd, t.nm, 't2' as src
  from cte_t1_cd_unmatched s, cte_t2_cd_unmatched t
 where t.nm like s.nm || '%'
union all
select s.cd, t.nm, 't3' as src
  from cte_t1_cd_unmatched s, cte_t3_cd_unmatched t
 where t.nm like s.nm || '%'),
cte_nm_matched as (
select cd, string_agg(nm || ' (' || src || ') ', ',') as nm
  from cte_nm_matched_flat
 group by cd)
select *, 'code' as reason from cte_t1_cd_matched
union all
select *, 'name' as reason from cte_nm_matched;

Output:

cd|nm                 |reason|
--+-------------------+------+
 2|b                  |code  |
 3|c                  |code  |
 1|a 2 (t2) ,a 3 (t3) |name  |
 4|d 2 (t2)           |name  |
 5|e 3 (t3)           |name  |

相关问题