I need to join 6 tables by item_name, (The DB of each table is lower than 10k items except 1 table with 30k,table_1 is the samllest with 5k items)
I join tables by same id or name where them both are b-tree indeces, the join by name based on the pg_trgm similar enough.
SELECT * FROM 'TABLE_1' ta,
JOIN 'TABLE_2' tb ON tb.item_code = ta.item_code or tb.item_name % ta.item_name,
.
.
.
JOIN 'TABLE_6' tf ON tf.item_code = tf.item_name or tf.item_name 5 ta.item_name
In this example i'm using "%" for similar enogth, if instead similar enough i will use "=" operator it will take 36 ms only.
i have many misunderstandings with this example.my mention is to get the 1 result pet item if item codes are equal or 3-4 results if similar enogh (dont know how to limit it in my exapmle)
- why is it so slow ? (50% of the result the item_code comparison return true so it shuldnt use similar enogth for all the data... in other words if the left condition of or statment are true it doesnt check the right condition)
- how can i limit the result cames from similar enogth ? lets say
tableA : item_name = 'laptop'
tableB : item_name = 'laptop a"
tableB : item_name = 'laptop b"
tableB : item_name = 'laptop c"
id | item_name_a | item_name_b |
---|---|---|
1 | laptop | laptop a |
1 | laptop | laptop b |
1 | laptop | laptop c |
it will join them all and if i want to limit it to 2 result, how can i do it?
1条答案
按热度按时间gwo2fgha1#
Not an answer
It's some questions to your question: Based on your query, assume there're only two tables
t1
andt2
:Table
t1
:Table
t2
:By apply your query logic:
We get:
You can see that even
t1.id
=t2.id
, it doesn't stop the comparisont2.nm like t1.nm || '%'
. Thus, theinner join
in the query is actually a nested loop as:Therefore, the
join
is likecross join
(Cartesian product of rows from the tables in the join. ie. 5k x 10k = 50m) during execution because it tries to find every match betweent1.nm
andt2.nm
regardless ift1.id=t2.id
or not.If the business rule is: when
t1.id=t2.id
, then remove the row from both tables and do the comparison for the rest. You may want to modify your query into a few steps:Step 1. Do
item_code
join only and getmatched_items
Step 2. Derive
unmatched_items_t[1..6]
by subtractmatched_items
fromtable_[1..6]
Step 3. Do string match comparison between
unmatched_items_t1
andunmatched_items_t[2..6]
: (t1 vs t2), (t1 vs t3), (t1 vs t4), (t1 vs t5), (t1 vs t6)Step 4. Aggregate the 5 comparison results to
t1.item_code
level.UPDATE: Could be an answer
Assume there are three tables
t1
,t2
,t3
as below:We want to do
code
match and thenname
match:cte_universe
: this is the universe of item's code and name from t1's perspectivecte_t1_cd_matched
: code-matched data by comparing t1, t2, t3cte_t1_cd_unmatched
: code not matched in t1 from the abovecte_[t2|t3]_cd_unmatched
: code not matched in t2, t3cte_nm_matched_flat
: name-matched by name by table (src)cte_nm_matched
: aggregate the above to code levelOutput: