A better way to optimize a MySQL query that feeds Tableau?

Posted by a1o7rhls on 2022-10-22 in MySQL

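I use Tableau as a dashboard that pulls its data from a MySQL database. It works well, but the query takes too long to execute. The data is limited to the previous 2 years, up to the most recent Saturday before today.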

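**Background:** I have sales data at a daily cadence, in a single table per account (client). There is a field called distributor_view that can take one of two values: Manufacturing or Sourcing. An item sometimes disappears from the Manufacturing view, but it is always present in Sourcing. The revenue figures of the two views are not equal, but for the sake of completeness, if an item is missing from the Manufacturing view (the preferred one), its revenue is pulled from Sourcing.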

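Note: if all of the data were queried, most of it would be duplicated at the item_id level. Only about 5% of items are missing from the Manufacturing view and need to be taken from the Sourcing view.
The query below pulls every item in the Manufacturing view and unions in every item that exists in the Sourcing view but is missing from the Manufacturing view.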

Query:

SELECT * FROM 
(
    SELECT 'Client1' as Client, date as 'Date', distributor_view as 'Distributor View', item_id as 'ItemID', product_title as 'Title', ordered_revenue as 'Ordered Revenue'
    FROM table1 
    WHERE 
        distributor_view = 'Manufacturing' 
        AND 
        YEAR(date) >= YEAR(CURDATE()) - 2 
        AND 
        date <= DATE_ADD(CURDATE(), INTERVAL(-DAYOFWEEK(CURDATE())) DAY)
    UNION ALL
    SELECT 'Client1' as Client, date as 'Date', distributor_view as 'Distributor View', item_id as 'ItemID', product_title as 'Title', ordered_revenue as 'Ordered Revenue'
    FROM table1 
    WHERE 
        distributor_view = 'Sourcing' 
        AND 
        (date, item_id) NOT IN (
            SELECT date, item_id FROM table1 WHERE distributor_view = 'Manufacturing' AND YEAR(date) >= YEAR(CURDATE()) - 2  AND date <= DATE_ADD(CURDATE(), INTERVAL(-DAYOFWEEK(CURDATE())) DAY)
        ) 
        AND 
        YEAR(date) >= YEAR(CURDATE()) - 2 
        AND 
        date <= DATE_ADD(CURDATE(), INTERVAL(-DAYOFWEEK(CURDATE())) DAY)
    UNION ALL
    SELECT 'Client2' as Client, date as 'Date', distributor_view as 'Distributor View', item_id as 'ItemID', product_title as 'Title', ordered_revenue as 'Ordered Revenue'
    FROM table2
    WHERE 
        distributor_view = 'Manufacturing' 
        AND 
        YEAR(date) >= YEAR(CURDATE()) - 2 
        AND 
        date <= DATE_ADD(CURDATE(), INTERVAL(-DAYOFWEEK(CURDATE())) DAY)
    UNION ALL
    SELECT 'Client2' as Client, date as 'Date', distributor_view as 'Distributor View', item_id as 'ItemID', product_title as 'Title', ordered_revenue as 'Ordered Revenue'
    FROM table2
    WHERE 
        distributor_view = 'Sourcing' 
        AND 
        (date, item_id) NOT IN (
            SELECT date, item_id FROM table2 WHERE distributor_view = 'Manufacturing' AND YEAR(date) >= YEAR(CURDATE()) - 2  AND date <= DATE_ADD(CURDATE(), INTERVAL(-DAYOFWEEK(CURDATE())) DAY)
        ) 
        AND 
        YEAR(date) >= YEAR(CURDATE()) - 2 
        AND 
        date <= DATE_ADD(CURDATE(), INTERVAL(-DAYOFWEEK(CURDATE())) DAY)
) AS sales 
ORDER BY date DESC
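
To make the date window in the query above concrete, here is a small Python sketch that reproduces the two date conditions for a given "today". The helper names `mysql_dayofweek` and `date_window` are made up for illustration; the only MySQL behavior assumed is that DAYOFWEEK() returns 1 for Sunday through 7 for Saturday.

```python
from datetime import date, timedelta
from typing import Tuple


def mysql_dayofweek(d: date) -> int:
    """Mimic MySQL DAYOFWEEK(): 1 = Sunday, 2 = Monday, ..., 7 = Saturday."""
    return d.isoweekday() % 7 + 1


def date_window(today: date) -> Tuple[date, date]:
    """Return the inclusive (start, end) dates kept by the WHERE clause."""
    # YEAR(date) >= YEAR(CURDATE()) - 2
    #   -> everything from January 1 of the calendar year two years back
    start = date(today.year - 2, 1, 1)
    # date <= DATE_ADD(CURDATE(), INTERVAL(-DAYOFWEEK(CURDATE())) DAY)
    #   -> the last Saturday strictly before today
    end = today - timedelta(days=mysql_dayofweek(today))
    return start, end


if __name__ == "__main__":
    # A Saturday and the following Monday (hypothetical run dates).
    for today in (date(2022, 10, 22), date(2022, 10, 24)):
        print(today, date_window(today))
```

Note that when the query runs on a Saturday, DAYOFWEEK() is 7, so the upper bound is the Saturday of the previous week rather than the current day.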

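In a previous iteration, I used a Python script that deleted the Sourcing items that also exist in Manufacturing. That made the query much simpler: a single SELECT that ignores distributor_view. I don't like that approach because it requires running a script every time the data changes (daily).
Another option I am brainstorming is to pull all of the data and try to reconcile the two views in Tableau with some calculations.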

but5z9lq #1

I built a solution using only Tableau. I imported all of the data, without the nested UNIONs and without checking distributor_view:

SELECT 'Client1' as Client, date as 'Date', distributor_view as 'Distributor View', item_id as 'ItemID', product_title as 'Title', ordered_revenue as 'Ordered Revenue'
    FROM table1 
    WHERE 
        YEAR(date) >= YEAR(CURDATE()) - 2 
        AND 
        date <= DATE_ADD(CURDATE(), INTERVAL(-DAYOFWEEK(CURDATE())) DAY)
UNION ALL
SELECT 'Client2' as Client, date as 'Date', distributor_view as 'Distributor View', item_id as 'ItemID', product_title as 'Title', ordered_revenue as 'Ordered Revenue'
    FROM table2
    WHERE 
        YEAR(date) >= YEAR(CURDATE()) - 2 
        AND 
        date <= DATE_ADD(CURDATE(), INTERVAL(-DAYOFWEEK(CURDATE())) DAY)

In Tableau, I created a calculated field named sourceing_filter:

IF {FIXED [itemID], [Date]: MIN([Distributor View])} = 'Manufacturing' AND [Distributor View] = 'Sourcing' THEN
    0
ELSE
    1
END

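I converted this calculated field to a dimension and then added it to the filters, keeping 1 (or excluding 0). This gives the same end result, and the query execution time dropped from ~70m to ~4m.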
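
The logic of that calculated field can be traced outside Tableau. The following Python sketch is only an emulation under assumptions: the function name `sourcing_flags`, the dictionary keys, and the sample rows are made up for illustration. It relies on the same string ordering the FIXED expression uses: 'Manufacturing' sorts before 'Sourcing', so the minimum per item and date is 'Manufacturing' whenever a Manufacturing row exists.

```python
def sourcing_flags(rows):
    """Return one 0/1 flag per row, mirroring the calculated field."""
    # Emulates {FIXED [itemID], [Date]: MIN([Distributor View])}:
    # the alphabetically smallest view name for each (item, date) pair.
    min_view = {}
    for r in rows:
        key = (r["item_id"], r["date"])
        if key not in min_view or r["view"] < min_view[key]:
            min_view[key] = r["view"]

    flags = []
    for r in rows:
        has_manufacturing = min_view[(r["item_id"], r["date"])] == "Manufacturing"
        # Flag 0 marks a Sourcing row that duplicates a Manufacturing row.
        flags.append(0 if has_manufacturing and r["view"] == "Sourcing" else 1)
    return flags


# Made-up sample data: item A appears in both views, item B only in Sourcing.
SAMPLE = [
    {"item_id": "A", "date": "2022-10-01", "view": "Manufacturing", "revenue": 100.0},
    {"item_id": "A", "date": "2022-10-01", "view": "Sourcing", "revenue": 90.0},
    {"item_id": "B", "date": "2022-10-01", "view": "Sourcing", "revenue": 50.0},
]

if __name__ == "__main__":
    kept = [r for r, f in zip(SAMPLE, sourcing_flags(SAMPLE)) if f == 1]
    print(kept)
```

Keeping only the rows flagged 1 leaves the Manufacturing row for item A and the Sourcing row for item B, which is the same selection the original UNION query was written to produce.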

2eafrhcq #2

(1) This is not sargable:

AND  YEAR(date) >= YEAR(CURDATE()) - 2

Use this instead:

AND  date >= CONCAT(YEAR(CURDATE()) - 2, '-01-01')

(This is ugly, but it should allow an index involving the date column to be used.)
(2) I am not sure about this syntax:

AND  date <= DATE_ADD(CURDATE(), INTERVAL(-DAYOFWEEK(CURDATE())) DAY)

Let's change it to

AND  date <= CURDATE() - INTERVAL DAYOFWEEK(CURDATE()) DAY

(3) Add this to both tables:

INDEX(distributor_view, date, item_id)

(4) There are two issues here, and neither optimizes well:

AND  (date, item_id) NOT IN ( SELECT ... )

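First, the NOT IN. Let's look at the EXPLAIN to see whether the Optimizer does anything useful with it.
Second, "row constructors" such as (date, item_id) should be avoided. Although this "works", it is unlikely to be efficient. Recent versions have made some improvements. Which version of MySQL or MariaDB are you using?
Change it to one of the following: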

AND  NOT EXISTS ( SELECT ... )

LEFT JOIN ( SELECT ... ) ... WHERE ... IS NULL

(5) Get rid of the outer SELECT *:

( SELECT ... )
UNION ALL
( SELECT ... )
UNION ALL
( SELECT ... )
UNION ALL
( SELECT ... )
ORDER BY date DESC

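Without the extra parens, the ORDER BY would [I think] apply only to the last SELECT. In general, when using UNION, extra parens are safe and may be necessary.
(6) If these suggestions cannot be applied to the framework that generates the query, you have my condolences.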
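
Suggestions (1) through (5) can be combined into one query shape. The sketch below is an assumption-laden stand-in, not the answer's own code: it uses Python's built-in sqlite3 in place of MySQL, made-up sample rows, a hypothetical index name, and date bounds passed in as the parameters `lo` and `hi` (in MySQL they would be the CURDATE() expressions from (1) and (2)). It shows plain range comparisons on date, NOT EXISTS in place of the row-constructor NOT IN, and no outer SELECT *.

```python
import sqlite3

# (date, distributor_view, item_id, ordered_revenue): made-up sample data
ROWS = [
    ("2022-10-08", "Manufacturing", "B", 70.0),
    ("2022-10-01", "Manufacturing", "A", 100.0),
    ("2022-10-01", "Sourcing", "A", 90.0),   # duplicate of a Manufacturing row
    ("2022-10-01", "Sourcing", "B", 50.0),   # missing from Manufacturing
    ("2019-12-31", "Sourcing", "C", 10.0),   # outside the date window
]

QUERY = """
SELECT date, distributor_view, item_id, ordered_revenue
FROM table1
WHERE distributor_view = 'Manufacturing'
  AND date >= :lo AND date <= :hi
UNION ALL
SELECT s.date, s.distributor_view, s.item_id, s.ordered_revenue
FROM table1 AS s
WHERE s.distributor_view = 'Sourcing'
  AND s.date >= :lo AND s.date <= :hi
  AND NOT EXISTS (
        SELECT 1 FROM table1 AS m
        WHERE m.distributor_view = 'Manufacturing'
          AND m.date = s.date
          AND m.item_id = s.item_id
      )
ORDER BY date DESC, item_id
"""


def run(lo, hi):
    """Load the sample rows into an in-memory database and run QUERY."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE table1 (date TEXT, distributor_view TEXT,"
        " item_id TEXT, ordered_revenue REAL)"
    )
    # Same column order as the index proposed in (3); the name is hypothetical.
    con.execute(
        "CREATE INDEX idx_view_date_item"
        " ON table1 (distributor_view, date, item_id)"
    )
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", ROWS)
    result = con.execute(QUERY, {"lo": lo, "hi": hi}).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in run("2020-01-01", "2022-10-15"):
        print(row)
```

The correlated subquery needs no date bounds of its own, because it matches on s.date, which the outer WHERE clause has already restricted. Whether MySQL actually picks the index for this shape depends on the server version and the data, so checking EXPLAIN is still worthwhile.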
