I have a table containing a list of cars, the time each spent driving at a certain speed per year, and the total time for that same year. I want to calculate the percentage of time spent driving at each speed.
Data looks like this:
create table #stats (
car varchar(30), yr INT, speed NUMERIC(5,1), hoursSpent INT, hoursInPeriod INT,
PRIMARY KEY(car, yr, speed)
)
INSERT INTO #stats (
car, yr, speed, hoursSpent, hoursInPeriod
)
VALUES ('Volvo', 2019, 50, 20, 300)
, ('Volvo', 2019, 65, 13, 300)
, ('Volvo', 2019, 70, 30, 300)
, ('Volvo', 2020, 50, 10, 250)
, ('Volvo', 2020, 65, 25, 250)
, ('Volvo', 2020, 70, 40, 250)
, ('Volvo', 2021, 50, 5, 100)
, ('Volvo', 2021, 70, 10, 100)
, ('Tesla', 2019, 50, 5, 100)
, ('Tesla', 2019, 65, 20, 100)
, ('Tesla', 2019, 70, 10, 100)
, ('Tesla', 2020, 50, 10, 100)
, ('Tesla', 2020, 65, 20, 100)
, ('Tesla', 2020, 70, 13, 100)
, ('Tesla', 2021, 50, 30, 100)
, ('Tesla', 2021, 65, 20, 100)
, ('Tesla', 2021, 70, 50, 100)
The important point to consider is that the SUM of hoursSpent is not meant to equal hoursInPeriod. One can think of hoursSpent as time spent moving, and hoursInPeriod as the total time the car "existed".
It's pretty easy to calculate the time percentage with the following query:
SELECT speed
, SUM(hoursSpent) * 1.0 / (SELECT SUM(hoursInPeriod) FROM (select distinct car, yr, hoursInPeriod FROM #stats s) x) AS distribution
FROM #stats ss
GROUP BY speed
Expected result:
speed | distribution |
---|---|
50 | 0.084210526315 |
65 | 0.103157894736 |
70 | 0.161052631578 |
The problem is that I can't use the above query.
I'm accessing this data through a very rudimentary API which only allows me to specify: select columns (any complex expressions, function calls or window functions are OK as long as they don't "SELECT FROM" some table), table name, where condition, group by and order by. It doesn't allow derived tables, joins or correlated subqueries that select from other tables.
My question is: is it possible to calculate the time distribution using only the constructs above? I'm either missing something very simple, or maybe it's very hard. Optionally, I want to be able to GROUP BY car / year as well.
A query I considered:
SELECT speed
, SUM(hoursSpent) / (SELECT SUM(hoursInPeriod) * 1.0 / COUNT(*))
FROM #stats ss
GROUP BY speed
It looks pretty good, but unfortunately it doesn't return the correct result, because the COUNT isn't evenly distributed.
4 answers
juud5qan1#
You can use window functions with aggregation. Starting from your existing query:
The window sum computes the grand total of hours spent, which we can use as a denominator to compute the ratio (with a safety against division-by-zero errors).
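The query this answer refers to did not survive here. As a rough sketch of the technique it describes, and not the answerer's exact code, a window sum over the grouped rows can be written as `SUM(SUM(hoursSpent)) OVER ()`, with `NULLIF` guarding the division. The snippet runs that against an in-memory SQLite copy of the sample data. Assumptions: SQLite 3.25+ for window functions, a table named `stats` instead of `#stats`, and a made-up alias `share_of_driving`. Note that the denominator here is the total hours driven (331 for the sample data), not the 950 hours of hoursInPeriod the question asks about.

```python
import sqlite3

# Sample rows from the question: (car, yr, speed, hoursSpent, hoursInPeriod)
ROWS = [
    ("Volvo", 2019, 50, 20, 300), ("Volvo", 2019, 65, 13, 300), ("Volvo", 2019, 70, 30, 300),
    ("Volvo", 2020, 50, 10, 250), ("Volvo", 2020, 65, 25, 250), ("Volvo", 2020, 70, 40, 250),
    ("Volvo", 2021, 50, 5, 100), ("Volvo", 2021, 70, 10, 100),
    ("Tesla", 2019, 50, 5, 100), ("Tesla", 2019, 65, 20, 100), ("Tesla", 2019, 70, 10, 100),
    ("Tesla", 2020, 50, 10, 100), ("Tesla", 2020, 65, 20, 100), ("Tesla", 2020, 70, 13, 100),
    ("Tesla", 2021, 50, 30, 100), ("Tesla", 2021, 65, 20, 100), ("Tesla", 2021, 70, 50, 100),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stats (car TEXT, yr INT, speed NUMERIC, hoursSpent INT, "
    "hoursInPeriod INT, PRIMARY KEY (car, yr, speed))"
)
conn.executemany("INSERT INTO stats VALUES (?, ?, ?, ?, ?)", ROWS)

# The inner SUM aggregates per speed; the outer SUM ... OVER () adds those
# per-speed totals into one grand total shared by every output row.
query = """
SELECT speed,
       SUM(hoursSpent) * 1.0
         / NULLIF(SUM(SUM(hoursSpent)) OVER (), 0) AS share_of_driving
FROM stats
GROUP BY speed
ORDER BY speed
"""
shares = dict(conn.execute(query).fetchall())
for speed, share in shares.items():
    print(speed, round(share, 4))
```

Because every row is divided by the same grand total, the three shares add up to 1, which is a quick way to tell this ratio apart from the one in the expected result.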
If you have repeated hoursInPeriod on each year, then we probably need another level of aggregation:
zlhcx6iw2#
Welcome to the wonderful world of windowed functions!
Give this a try:
You can adjust how the values are grouped by changing the columns included in the PARTITION BY for each of the windowed functions. I have it broken down by car, speed and year in this example.
hfyxw5xn3#
If you want to get the percentage for each row, per car per year, you just need to remove the aggregation and add a PARTITION BY with a window function:
db<>fiddle
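The original query is not reproduced here, so the following is only a sketch of what a per-row, per-car, per-year breakdown might look like. It is again run against an in-memory SQLite copy of the sample data; SQLite 3.25+, the table name `stats`, and the aliases `share_of_driving` and `share_of_period` are all assumptions. The first ratio uses `SUM(...) OVER (PARTITION BY car, yr)`. The second needs no window at all, because hoursInPeriod is already repeated on every row of a given car and year.

```python
import sqlite3

# Sample rows from the question: (car, yr, speed, hoursSpent, hoursInPeriod)
ROWS = [
    ("Volvo", 2019, 50, 20, 300), ("Volvo", 2019, 65, 13, 300), ("Volvo", 2019, 70, 30, 300),
    ("Volvo", 2020, 50, 10, 250), ("Volvo", 2020, 65, 25, 250), ("Volvo", 2020, 70, 40, 250),
    ("Volvo", 2021, 50, 5, 100), ("Volvo", 2021, 70, 10, 100),
    ("Tesla", 2019, 50, 5, 100), ("Tesla", 2019, 65, 20, 100), ("Tesla", 2019, 70, 10, 100),
    ("Tesla", 2020, 50, 10, 100), ("Tesla", 2020, 65, 20, 100), ("Tesla", 2020, 70, 13, 100),
    ("Tesla", 2021, 50, 30, 100), ("Tesla", 2021, 65, 20, 100), ("Tesla", 2021, 70, 50, 100),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stats (car TEXT, yr INT, speed NUMERIC, hoursSpent INT, "
    "hoursInPeriod INT, PRIMARY KEY (car, yr, speed))"
)
conn.executemany("INSERT INTO stats VALUES (?, ?, ?, ?, ?)", ROWS)

# No GROUP BY: each source row stays a row, and the window restarts
# for every (car, yr) partition.
query = """
SELECT car, yr, speed,
       hoursSpent * 1.0
         / NULLIF(SUM(hoursSpent) OVER (PARTITION BY car, yr), 0) AS share_of_driving,
       hoursSpent * 1.0 / NULLIF(hoursInPeriod, 0)                AS share_of_period
FROM stats
ORDER BY car, yr, speed
"""
per_row = {
    (car, yr, speed): (driving, period)
    for car, yr, speed, driving, period in conn.execute(query)
}
for key, (driving, period) in per_row.items():
    print(key, round(driving, 4), round(period, 4))
```

Changing the column list in `PARTITION BY` (for example to `car` alone) changes the group that each row's hours are compared against.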
bhmjp9jg4#
Can you make two separate calls to the API?
Below is a pseudo sample based on Python to combine the results of 2 calls without subqueries (nb "api_call" is a placeholder for how you call your api):
Get the needed grand total of hoursInPeriod into a variable:
Then make a second call to get the sum of hours spent for each speed, and divide it by the result of the first call:
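The two steps above could be sketched as follows. Everything about `api_call` here is hypothetical: its signature is invented, and it is backed by an in-memory SQLite table named `stats` only so the example runs. The first call is assumed to fetch one hoursInPeriod value per car and year, which a flat `GROUP BY` can do, and the summing is left to Python.

```python
import sqlite3

# Sample rows from the question: (car, yr, speed, hoursSpent, hoursInPeriod)
ROWS = [
    ("Volvo", 2019, 50, 20, 300), ("Volvo", 2019, 65, 13, 300), ("Volvo", 2019, 70, 30, 300),
    ("Volvo", 2020, 50, 10, 250), ("Volvo", 2020, 65, 25, 250), ("Volvo", 2020, 70, 40, 250),
    ("Volvo", 2021, 50, 5, 100), ("Volvo", 2021, 70, 10, 100),
    ("Tesla", 2019, 50, 5, 100), ("Tesla", 2019, 65, 20, 100), ("Tesla", 2019, 70, 10, 100),
    ("Tesla", 2020, 50, 10, 100), ("Tesla", 2020, 65, 20, 100), ("Tesla", 2020, 70, 13, 100),
    ("Tesla", 2021, 50, 30, 100), ("Tesla", 2021, 65, 20, 100), ("Tesla", 2021, 70, 50, 100),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stats (car TEXT, yr INT, speed NUMERIC, hoursSpent INT, "
    "hoursInPeriod INT, PRIMARY KEY (car, yr, speed))"
)
conn.executemany("INSERT INTO stats VALUES (?, ?, ?, ?, ?)", ROWS)


def api_call(select, table, group_by):
    """Hypothetical stand-in for the real API: one flat SELECT, no subqueries."""
    sql = f"SELECT {select} FROM {table} GROUP BY {group_by}"
    return conn.execute(sql).fetchall()


# Call 1: one row per (car, yr); MAX collapses the repeated hoursInPeriod.
period_rows = api_call("car, yr, MAX(hoursInPeriod)", "stats", "car, yr")
total_hours = sum(hours for _car, _yr, hours in period_rows)  # 950 for the sample data

# Call 2: hours spent per speed, divided client-side by the total from call 1.
speed_rows = api_call("speed, SUM(hoursSpent)", "stats", "speed")
distribution = {speed: spent / total_hours for speed, spent in speed_rows}

for speed in sorted(distribution):
    print(speed, distribution[speed])
```

To break the result down by car or year, the same grouping columns would be added to both calls and the division done per matching key.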