我们有4个表,以研究广告系列的电子邮件性能:
- 表A包含活动名称信息
- 表B包含活动ID交付性能(活动发送对象)
- 表C包含活动ID打开性能(谁打开了电子邮件)
- 表D包含了广告系列的点击表现(谁点击了电子邮件中的任何元素)
表结构的示例(注意:假设这是快照而不是完整的数据集):
-- Table A: Campaign Information
CREATE TABLE TableA (
campaign_name VARCHAR(255),
campaign_id INT
);
-- Table B: Email Delivery Information
CREATE TABLE TableB (
campaign_id INT,
delivery_date DATE,
user_id VARCHAR(50)
);
-- Table C: Email Open Information
CREATE TABLE TableC (
campaign_id INT,
open_date DATE,
user_id VARCHAR(50)
);
-- Table D: Email Click Information
CREATE TABLE TableD (
campaign_id INT,
click_date DATE,
user_id VARCHAR(50)
);
-- Inserts for TableA (Campaign Information)
INSERT INTO TableA (campaign_name, campaign_id) VALUES
('Instacash Promo 1', 112233),
('Instacash Promo 2', 112244),
('RoarMoney Balance 5', 112259);
-- Inserts for TableB (Email Delivery Information)
INSERT INTO TableB (campaign_id, delivery_date, user_id) VALUES
(112233, '2021-01-01', 'a'),
(112233, '2021-01-01', 'b'),
(112233, '2021-01-01', 'c'),
(112244, '2021-01-05', 'd'),
(112244, '2021-01-05', 'e');
-- Inserts for TableC (Email Opened Information)
INSERT INTO TableC (campaign_id, open_date, user_id) VALUES
(112233, '2021-01-03', 'a'),
(112233, '2021-01-05', 'b'),
(112244, '2021-01-07', 'd'),
(112244, '2021-01-10', 'e');
-- Inserts for TableD (Email Link Clicked Information)
INSERT INTO TableD (campaign_id, click_date, user_id) VALUES
(112233, '2021-01-03', 'a'),
(112244, '2021-01-11', 'e');
字符串
我需要写一个SQL查询之间的平均时间第二和第三电子邮件为每个单独的产品在整体水平?输出应包括:
| 活动类别|收到第二至第三封电子邮件的平均时间|
| --|--|
| | |
| | |
campaign_name应该被归类为Instacash、RoarMoney的campaign_category,或者任何不在Instacash或RoarMoney下的活动都可以被称为“其他”。
这是我的代码,但我认为下面的代码是错误的:
-- Calculate DATEDIFF between rn = 3 and rn = 2
WITH temp as (
SELECT
CASE
WHEN campaign_name REGEXP '^Instacash' THEN 'Instacash'
WHEN campaign_name REGEXP '^RoarMoney' THEN 'RoarMoney'
ELSE 'Others'
END AS campaign_category,
a.campaign_id,
a.campaign_name,
b.delivery_date
FROM TableA a
LEFT JOIN TableB b ON a.campaign_id = b.campaign_id
),
temp_1 as (
SELECT *, ROW_NUMBER() OVER (PARTITION BY campaign_name ORDER BY delivery_date) as rn
FROM temp
),
temp_2 as (
SELECT *
FROM temp_1
WHERE rn in (2, 3)
),
temp_3 as (
SELECT
t1.campaign_category,
t1.campaign_id,
t1.campaign_name,
DATEDIFF(day, t2.delivery_date, t3.delivery_date) as date_difference
FROM temp_2 t1
JOIN temp_2 t2 ON t1.campaign_id = t2.campaign_id AND t2.rn = 2
JOIN temp_2 t3 ON t1.campaign_id = t3.campaign_id AND t3.rn = 3
)
-- Select the calculated result
SELECT * FROM temp_3;
型
有人能帮我吗?
2条答案
按热度按时间inkz8wg91#
您的REGEXP for campaign_name似乎有点过多,可以用LIKE代替。如果您想要每个广告系列类别的第2和第3封电子邮件之间的平均天数,那么您可以使用这样的东西:
字符串
注意事项:我在CTE中将分区更改为
campaign_id
而不是campaign_name
,并将GROUP BY campaign_category
添加到最终选择中以获得每个campaign_category
的平均值。这里是db<>fiddle。
lg40wkob2#
字符串
我认为这段代码更接近问题的要求。谢谢@user1191247。