SQL Server How to calculate time distribution without subqueries

Posted by tkqqtvp1 on 2023-03-28 in Other

I have a table containing a list of cars, the time they spent driving at a certain speed per year, and the total time in that same year. What I'm looking for is a way to calculate the percentage of time spent driving at different speeds.

Data looks like this:

create table #stats (
    car varchar(30), yr INT, speed NUMERIC(5,1), hoursSpent INT, hoursInPeriod INT,
    PRIMARY KEY(car, yr, speed)
)

INSERT INTO #stats (
    car, yr, speed, hoursSpent, hoursInPeriod
)
VALUES  ('Volvo', 2019, 50, 20, 300)
,   ('Volvo', 2019, 65, 13, 300)
,   ('Volvo', 2019, 70, 30, 300)

,   ('Volvo', 2020, 50, 10, 250)
,   ('Volvo', 2020, 65, 25, 250)
,   ('Volvo', 2020, 70, 40, 250)

,   ('Volvo', 2021, 50, 5, 100)
,   ('Volvo', 2021, 70, 10, 100)

,   ('Tesla', 2019, 50, 5, 100)
,   ('Tesla', 2019, 65, 20, 100)
,   ('Tesla', 2019, 70, 10, 100)

,   ('Tesla', 2020, 50, 10, 100)
,   ('Tesla', 2020, 65, 20, 100)
,   ('Tesla', 2020, 70, 13, 100)

,   ('Tesla', 2021, 50, 30, 100)
,   ('Tesla', 2021, 65, 20, 100)
,   ('Tesla', 2021, 70, 50, 100)

The important point to consider is that the SUM of hoursSpent will never be equal to hoursInPeriod: one can think of hoursSpent as time spent moving, and hoursInPeriod as the total time the car "existed".

It's pretty easy to calculate the time percentage with the following query:

SELECT  speed
, SUM(hoursSpent) * 1.0 / (SELECT SUM(hoursInPeriod) FROM (SELECT DISTINCT car, yr, hoursInPeriod FROM #stats s) x) AS distribution
FROM    #stats ss
GROUP BY speed

Expected result:

speed	distribution
50	0.084210526315
65	0.103157894736
70	0.161052631578
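
The expected figures can be reproduced outside SQL as a sanity check. The following is a minimal Python sketch, not part of the original question: ROWS simply copies the sample data, and all variable names are illustrative. It counts each (car, yr) period once for the denominator, which is what the DISTINCT subquery does.

```python
# Sample rows copied from #stats: (car, yr, speed, hoursSpent, hoursInPeriod)
ROWS = [
    ("Volvo", 2019, 50, 20, 300), ("Volvo", 2019, 65, 13, 300), ("Volvo", 2019, 70, 30, 300),
    ("Volvo", 2020, 50, 10, 250), ("Volvo", 2020, 65, 25, 250), ("Volvo", 2020, 70, 40, 250),
    ("Volvo", 2021, 50, 5, 100), ("Volvo", 2021, 70, 10, 100),
    ("Tesla", 2019, 50, 5, 100), ("Tesla", 2019, 65, 20, 100), ("Tesla", 2019, 70, 10, 100),
    ("Tesla", 2020, 50, 10, 100), ("Tesla", 2020, 65, 20, 100), ("Tesla", 2020, 70, 13, 100),
    ("Tesla", 2021, 50, 30, 100), ("Tesla", 2021, 65, 20, 100), ("Tesla", 2021, 70, 50, 100),
]

# Denominator: each (car, yr) period counted once, like SELECT DISTINCT car, yr, hoursInPeriod
periods = {(car, yr): period for car, yr, _, _, period in ROWS}
total_period = sum(periods.values())

# Numerator: hoursSpent summed per speed, like GROUP BY speed
spent_by_speed = {}
for _, _, speed, spent, _ in ROWS:
    spent_by_speed[speed] = spent_by_speed.get(speed, 0) + spent

distribution = {speed: spent / total_period for speed, spent in sorted(spent_by_speed.items())}
for speed, share in distribution.items():
    print(speed, round(share, 6))
```

On this data the denominator is the sum of the six distinct periods, and the three ratios agree with the table above to the digits shown.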

The problem is that I can't use the above query.

I'm accessing this data through a very rudimentary API which only allows specifying: the select columns (any complex expressions, function calls or window functions are OK as long as they don't "SELECT FROM" some table), the table name, a WHERE condition, GROUP BY and ORDER BY. It doesn't allow derived tables, joins or correlated subqueries that select from other tables.

My question is: is it possible to calculate the time distribution with only the above constructs? I'm either missing something very simple, or it's very hard. Optionally, I want to be able to GROUP BY car / year as well.

A query I considered:

SELECT speed
    , SUM(hoursSpent) / (SELECT SUM(hoursInPeriod) * 1.0 / COUNT(*))
FROM #stats ss
GROUP BY speed

It looks pretty good; unfortunately, it doesn't return the correct result, because the COUNT isn't evenly distributed across the speed groups.
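
One way to see the uneven COUNT is to tally the rows per speed. The sketch below is an illustration in Python rather than SQL and is not from the original post. It copies the sample rows and assumes that the aggregates in the attempted denominator are evaluated once per speed group; the names are made up for the example.

```python
# Sample rows copied from #stats: (car, yr, speed, hoursSpent, hoursInPeriod)
ROWS = [
    ("Volvo", 2019, 50, 20, 300), ("Volvo", 2019, 65, 13, 300), ("Volvo", 2019, 70, 30, 300),
    ("Volvo", 2020, 50, 10, 250), ("Volvo", 2020, 65, 25, 250), ("Volvo", 2020, 70, 40, 250),
    ("Volvo", 2021, 50, 5, 100), ("Volvo", 2021, 70, 10, 100),
    ("Tesla", 2019, 50, 5, 100), ("Tesla", 2019, 65, 20, 100), ("Tesla", 2019, 70, 10, 100),
    ("Tesla", 2020, 50, 10, 100), ("Tesla", 2020, 65, 20, 100), ("Tesla", 2020, 70, 13, 100),
    ("Tesla", 2021, 50, 30, 100), ("Tesla", 2021, 65, 20, 100), ("Tesla", 2021, 70, 50, 100),
]

# Per speed group: row count, SUM(hoursSpent) and SUM(hoursInPeriod)
groups = {}
for _, _, speed, spent, period in ROWS:
    g = groups.setdefault(speed, {"rows": 0, "spent": 0, "period": 0})
    g["rows"] += 1
    g["spent"] += spent
    g["period"] += period

for speed in sorted(groups):
    g = groups[speed]
    avg_period = g["period"] / g["rows"]  # mirrors SUM(hoursInPeriod) * 1.0 / COUNT(*)
    print(speed, g["rows"], round(avg_period, 2), round(g["spent"] / avg_period, 4))
```

Volvo has no row for speed 65 in 2021, so that group has one row fewer than the others and ends up with a different denominator. Under the assumption above, each denominator is also an average period length rather than the total over all periods.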

sql-server

Source: https://stackoverflow.com/questions/75817464/how-to-calculate-time-distribution-without-subqueries

4 Answers

Answer 1 (juud5qan)

You can use window functions with aggregation. Starting from your existing query:

SELECT speed,
    1.0 * SUM(hoursSpent)
        / NULLIF(SUM(SUM(hoursInPeriod)) OVER(), 0) as ratio
FROM #stats ss
GROUP BY speed

The window sum computes the grand total of hoursInPeriod across all rows, which we can use as the denominator of the ratio; NULLIF guards against division-by-zero errors.
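
To see what this denominator evaluates to on the sample data, the following Python sketch (an illustration added for checking, with the rows copied from the question and hypothetical variable names) sums hoursInPeriod over every row, which is what SUM(SUM(hoursInPeriod)) OVER () adds up across the speed groups. It then compares that figure with the once-per-period total used in the question's expected result.

```python
# Sample rows copied from #stats: (car, yr, speed, hoursSpent, hoursInPeriod)
ROWS = [
    ("Volvo", 2019, 50, 20, 300), ("Volvo", 2019, 65, 13, 300), ("Volvo", 2019, 70, 30, 300),
    ("Volvo", 2020, 50, 10, 250), ("Volvo", 2020, 65, 25, 250), ("Volvo", 2020, 70, 40, 250),
    ("Volvo", 2021, 50, 5, 100), ("Volvo", 2021, 70, 10, 100),
    ("Tesla", 2019, 50, 5, 100), ("Tesla", 2019, 65, 20, 100), ("Tesla", 2019, 70, 10, 100),
    ("Tesla", 2020, 50, 10, 100), ("Tesla", 2020, 65, 20, 100), ("Tesla", 2020, 70, 13, 100),
    ("Tesla", 2021, 50, 30, 100), ("Tesla", 2021, 65, 20, 100), ("Tesla", 2021, 70, 50, 100),
]

# Every row contributes its hoursInPeriod, so repeated values are counted repeatedly
window_total = sum(period for *_, period in ROWS)

# Each (car, yr) period counted once, as in the question's DISTINCT subquery
distinct_total = sum({(car, yr): period for car, yr, _, _, period in ROWS}.values())

print(window_total, distinct_total)
```

Because hoursInPeriod repeats on every speed row of a car and year, the two totals differ on this data, so the ratios from the query above come out smaller than the expected ones.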

If hoursInPeriod is repeated on every row of a given year, then we probably need another level of aggregation:

SELECT speed, 
    1.0 * SUM(hoursSpent)
        / NULLIF(SUM(SUM(hoursInPeriod)) OVER(), 0) as ratio
FROM (
    SELECT speed, hoursInPeriod, SUM(hoursSpent) hoursSpent
    FROM #stats
    GROUP BY speed, hoursInPeriod
) s
GROUP BY speed

Answer 2 (zlhcx6iw)

Welcome to the wonderful world of windowed functions!

Give this a try:

SELECT DISTINCT Car, Speed, yr,
       SUM(.0+hoursSpent) OVER (PARTITION BY Car, yr, Speed) AS HoursAtSpeed,
       MAX(.0+hoursInPeriod) OVER (PARTITION BY Car, yr) AS TotalHours,
       -- the ratio uses the same (Car, yr) partition as TotalHours
       SUM(.0+hoursSpent) OVER (PARTITION BY Car, yr, Speed)
         / MAX(.0+hoursInPeriod) OVER (PARTITION BY Car, yr) AS PercentHours
  FROM #stats
 ORDER BY yr, Car, Speed

Car	Speed	yr	HoursAtSpeed	TotalHours	PercentHours
Tesla	50.0	2019	5.0	100.0	0.050000
Tesla	65.0	2019	20.0	100.0	0.200000
Tesla	70.0	2019	10.0	100.0	0.100000
Volvo	50.0	2019	20.0	300.0	0.066666
Volvo	65.0	2019	13.0	300.0	0.043333
Volvo	70.0	2019	30.0	300.0	0.100000
Tesla	50.0	2020	10.0	100.0	0.100000
Tesla	65.0	2020	20.0	100.0	0.200000
Tesla	70.0	2020	13.0	100.0	0.130000
Volvo	50.0	2020	10.0	250.0	0.040000
Volvo	65.0	2020	25.0	250.0	0.100000
Volvo	70.0	2020	40.0	250.0	0.160000
Tesla	50.0	2021	30.0	100.0	0.300000
Tesla	65.0	2021	20.0	100.0	0.200000
Tesla	70.0	2021	50.0	100.0	0.500000
Volvo	50.0	2021	5.0	100.0	0.050000
Volvo	70.0	2021	10.0	100.0	0.100000

You can adjust how the values are grouped by changing the columns included in the PARTITION BY for each of the windowed functions. I have it broken down by car, speed and year in this example.

Answer 3 (hfyxw5xn)

If you want to get the percentage for each row, per car per year, you just need to remove the aggregation and add a PARTITION BY with a window function:

SELECT
  s.car,
  s.yr,
  s.speed,
  s.hoursSpent * 1.0 / SUM(s.hoursSpent) OVER (PARTITION BY s.car, s.yr) 
FROM #stats s;
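
As a rough cross-check of what this partitioned window returns, here is a small Python sketch (added for illustration, with the rows copied from the question and hypothetical variable names). It groups hoursSpent by (car, yr) and divides each row by its group total, which is the same arithmetic as SUM(s.hoursSpent) OVER (PARTITION BY s.car, s.yr).

```python
# Sample rows copied from #stats: (car, yr, speed, hoursSpent, hoursInPeriod)
ROWS = [
    ("Volvo", 2019, 50, 20, 300), ("Volvo", 2019, 65, 13, 300), ("Volvo", 2019, 70, 30, 300),
    ("Volvo", 2020, 50, 10, 250), ("Volvo", 2020, 65, 25, 250), ("Volvo", 2020, 70, 40, 250),
    ("Volvo", 2021, 50, 5, 100), ("Volvo", 2021, 70, 10, 100),
    ("Tesla", 2019, 50, 5, 100), ("Tesla", 2019, 65, 20, 100), ("Tesla", 2019, 70, 10, 100),
    ("Tesla", 2020, 50, 10, 100), ("Tesla", 2020, 65, 20, 100), ("Tesla", 2020, 70, 13, 100),
    ("Tesla", 2021, 50, 30, 100), ("Tesla", 2021, 65, 20, 100), ("Tesla", 2021, 70, 50, 100),
]

# Partition total: hoursSpent summed per (car, yr)
totals = {}
for car, yr, _, spent, _ in ROWS:
    totals[(car, yr)] = totals.get((car, yr), 0) + spent

# Each row's share of its own partition total
shares = {
    (car, yr, speed): spent / totals[(car, yr)]
    for car, yr, speed, spent, _ in ROWS
}

for key in sorted(shares):
    print(key, round(shares[key], 4))
```

Note that the denominator here is the summed hoursSpent per car and year, not hoursInPeriod, so the shares within each car and year add up to 1.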

Answer 4 (bhmjp9jg)

Can you make two separate calls to the API?

Below is a pseudocode sample in Python that combines the results of two calls without subqueries (note: "api_call" is a placeholder for however you call your API):

Get the needed grand total of hoursInPeriod into a variable:

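# One row per (car, yr) via GROUP BY, so no derived table is needed
total_hours_in_period_query = "SELECT car, yr, MAX(hoursInPeriod) FROM #stats GROUP BY car, yr"
# Assumes api_call returns rows as tuples; the third column is added up on the client side
total_hours_in_period = sum(row[2] for row in api_call(total_hours_in_period_query))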

Then make a second call to get the sum of hours spent for each speed and divide it by the result of the first call:

# * 1.0 forces decimal division; an integer SUM divided by an integer literal would truncate
hours_spent_query = f"SELECT speed, SUM(hoursSpent) * 1.0 / {total_hours_in_period} FROM #stats GROUP BY speed"
results = api_call(hours_spent_query)
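
A runnable variant of the two-call idea might look like the sketch below. It makes several assumptions that are not in the original answer: an in-memory SQLite database stands in for the real API, the table is named stats because SQLite has no #stats temp-table syntax, and api_call is a hypothetical helper that returns rows as tuples. The first call uses GROUP BY car, yr rather than a derived table, on the assumption that the API rejects the latter.

```python
import sqlite3

# Sample rows copied from the question: (car, yr, speed, hoursSpent, hoursInPeriod)
ROWS = [
    ("Volvo", 2019, 50, 20, 300), ("Volvo", 2019, 65, 13, 300), ("Volvo", 2019, 70, 30, 300),
    ("Volvo", 2020, 50, 10, 250), ("Volvo", 2020, 65, 25, 250), ("Volvo", 2020, 70, 40, 250),
    ("Volvo", 2021, 50, 5, 100), ("Volvo", 2021, 70, 10, 100),
    ("Tesla", 2019, 50, 5, 100), ("Tesla", 2019, 65, 20, 100), ("Tesla", 2019, 70, 10, 100),
    ("Tesla", 2020, 50, 10, 100), ("Tesla", 2020, 65, 20, 100), ("Tesla", 2020, 70, 13, 100),
    ("Tesla", 2021, 50, 30, 100), ("Tesla", 2021, 65, 20, 100), ("Tesla", 2021, 70, 50, 100),
]

# Stand-in backend (assumption): an in-memory SQLite table instead of the real API
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stats (car TEXT, yr INTEGER, speed REAL, "
    "hoursSpent INTEGER, hoursInPeriod INTEGER, PRIMARY KEY (car, yr, speed))"
)
conn.executemany("INSERT INTO stats VALUES (?, ?, ?, ?, ?)", ROWS)


def api_call(query):
    """Hypothetical placeholder for the real API: run one query, return all rows."""
    return conn.execute(query).fetchall()


# Call 1: one row per (car, yr) using only GROUP BY; the periods are summed client-side
period_rows = api_call("SELECT car, yr, MAX(hoursInPeriod) FROM stats GROUP BY car, yr")
total_hours_in_period = sum(row[2] for row in period_rows)

# Call 2: per-speed ratio, with the total from call 1 inlined as a numeric literal
results = api_call(
    f"SELECT speed, SUM(hoursSpent) * 1.0 / {total_hours_in_period} "
    "FROM stats GROUP BY speed ORDER BY speed"
)
for speed, share in results:
    print(speed, round(share, 6))
```

The value interpolated into the second query is an integer computed locally, not user input, which is why plain string formatting is acceptable in this sketch. With a real API, a parameterized call would be preferable if one is available.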
