SQL Server: How to calculate time distribution without subqueries

tkqqtvp1  posted on 2023-03-28  in Other
Follows (0) | Answers (4) | Views (128)

I have a table containing a list of cars, the time they spent driving at a certain speed per year, and the total time spent driving in the same year. What I'm looking for is a way to calculate the percentage of time spent driving at different speeds.

Data looks like this:

create table #stats (
    car varchar(30), yr INT, speed NUMERIC(5,1), hoursSpent INT, hoursInPeriod INT,
    PRIMARY KEY(car, yr, speed)
)

INSERT INTO #stats (
    car, yr, speed, hoursSpent, hoursInPeriod
)
VALUES  ('Volvo', 2019, 50, 20, 300)
,   ('Volvo', 2019, 65, 13, 300)
,   ('Volvo', 2019, 70, 30, 300)

,   ('Volvo', 2020, 50, 10, 250)
,   ('Volvo', 2020, 65, 25, 250)
,   ('Volvo', 2020, 70, 40, 250)

,   ('Volvo', 2021, 50, 5, 100)
,   ('Volvo', 2021, 70, 10, 100)

,   ('Tesla', 2019, 50, 5, 100)
,   ('Tesla', 2019, 65, 20, 100)
,   ('Tesla', 2019, 70, 10, 100)

,   ('Tesla', 2020, 50, 10, 100)
,   ('Tesla', 2020, 65, 20, 100)
,   ('Tesla', 2020, 70, 13, 100)

,   ('Tesla', 2021, 50, 30, 100)
,   ('Tesla', 2021, 65, 20, 100)
,   ('Tesla', 2021, 70, 50, 100)

The important point to consider is that the SUM of hoursSpent will never equal hoursInPeriod; one can think of hoursSpent as time spent moving, and hoursInPeriod as the total time the car "existed".

It's pretty easy to calculate the time percentage with the following query:

SELECT  speed
, SUM(hoursSpent) * 1.0 / (SELECT SUM(hoursInPeriod) FROM (select distinct car, yr, hoursInPeriod FROM #stats s) x)
FROM    #stats ss
GROUP BY speed

Expected result:

speed  distribution
50     0.084210526315
65     0.103157894736
70     0.161052631578
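
As a cross-check of the expected numbers, here is a small Python sketch that reproduces the same calculation outside the database. The rows are copied from the sample data in the question; the variable names are illustrative only and not part of the original post.

```python
from collections import defaultdict

# (car, yr, speed, hoursSpent, hoursInPeriod), copied from the sample data
rows = [
    ("Volvo", 2019, 50, 20, 300), ("Volvo", 2019, 65, 13, 300), ("Volvo", 2019, 70, 30, 300),
    ("Volvo", 2020, 50, 10, 250), ("Volvo", 2020, 65, 25, 250), ("Volvo", 2020, 70, 40, 250),
    ("Volvo", 2021, 50, 5, 100), ("Volvo", 2021, 70, 10, 100),
    ("Tesla", 2019, 50, 5, 100), ("Tesla", 2019, 65, 20, 100), ("Tesla", 2019, 70, 10, 100),
    ("Tesla", 2020, 50, 10, 100), ("Tesla", 2020, 65, 20, 100), ("Tesla", 2020, 70, 13, 100),
    ("Tesla", 2021, 50, 30, 100), ("Tesla", 2021, 65, 20, 100), ("Tesla", 2021, 70, 50, 100),
]

# Denominator: count hoursInPeriod once per (car, yr), as the SELECT DISTINCT does
period_hours = {(car, yr): in_period for car, yr, _, _, in_period in rows}
total_in_period = sum(period_hours.values())

# Numerator: hoursSpent summed per speed (the GROUP BY speed)
spent_by_speed = defaultdict(int)
for _, _, speed, spent, _ in rows:
    spent_by_speed[speed] += spent

distribution = {
    speed: spent / total_in_period
    for speed, spent in sorted(spent_by_speed.items())
}
for speed, share in distribution.items():
    print(speed, share)
```

The denominator is 950 hours for this data (650 for Volvo plus 300 for Tesla), which is why the three shares do not add up to 1.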

The problem is that I can't use the above query.

I'm accessing this data through a very rudimentary API which only allows me to specify: select columns (any complex expressions, function calls or window functions are OK as long as they don't "SELECT FROM" some table), table name, where condition, group by and order by. It doesn't allow derived tables, joins or correlated subqueries that select from other tables.

My question is: is it possible to calculate the time distribution with only the above constructs? I'm either missing something very simple, or maybe it's very hard. Optionally, I want to be able to GROUP BY car / year as well.

A query I considered:

SELECT speed
    , SUM(hoursSpent) / (SELECT SUM(hoursInPeriod) * 1.0 / COUNT(*))
FROM #stats ss
GROUP BY speed

It looks pretty good, but unfortunately it doesn't return the correct result, because the COUNT isn't evenly distributed.

juud5qan #1

You can use window functions with aggregation. Starting from your existing query:

SELECT speed,
    1.0 * SUM(hoursSpent)
        / NULLIF(SUM(SUM(hoursInPeriod)) OVER(), 0) as ratio
FROM #stats ss
GROUP BY speed

The window sum computes the grand total of hoursInPeriod across all rows, which we can use as a denominator to compute the ratio (with a safety against division-by-zero errors).

If hoursInPeriod is repeated on every row of a year, then we probably need another level of aggregation:
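
To see what that windowed denominator evaluates to on the question's sample data, here is a rough Python sketch that mimics SUM(SUM(hoursInPeriod)) OVER () after GROUP BY speed. It is only an illustration of the arithmetic; the variable names are mine.

```python
# (car, yr, speed, hoursSpent, hoursInPeriod), copied from the question
rows = [
    ("Volvo", 2019, 50, 20, 300), ("Volvo", 2019, 65, 13, 300), ("Volvo", 2019, 70, 30, 300),
    ("Volvo", 2020, 50, 10, 250), ("Volvo", 2020, 65, 25, 250), ("Volvo", 2020, 70, 40, 250),
    ("Volvo", 2021, 50, 5, 100), ("Volvo", 2021, 70, 10, 100),
    ("Tesla", 2019, 50, 5, 100), ("Tesla", 2019, 65, 20, 100), ("Tesla", 2019, 70, 10, 100),
    ("Tesla", 2020, 50, 10, 100), ("Tesla", 2020, 65, 20, 100), ("Tesla", 2020, 70, 13, 100),
    ("Tesla", 2021, 50, 30, 100), ("Tesla", 2021, 65, 20, 100), ("Tesla", 2021, 70, 50, 100),
]

# Inner aggregate: SUM(hoursInPeriod) for each GROUP BY speed group
in_period_by_speed = {}
for _, _, speed, _, in_period in rows:
    in_period_by_speed[speed] = in_period_by_speed.get(speed, 0) + in_period

# Outer window aggregate: SUM(...) OVER () adds up every group's total
window_total = sum(in_period_by_speed.values())

# For comparison: hoursInPeriod counted once per (car, yr), as in the question
distinct_total = sum({(car, yr): p for car, yr, _, _, p in rows}.values())

print(window_total, distinct_total)
```

On this data the window total is larger than the per-(car, yr) total, because every speed row of a year carries its own copy of hoursInPeriod.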

SELECT speed, 
    1.0 * SUM(hoursSpent)
        / NULLIF(SUM(SUM(hoursInPeriod)) OVER(), 0) as ratio
FROM (
    SELECT speed, hoursInPeriod, SUM(hoursSpent) hoursSpent
    FROM #stats
    GROUP BY speed, hoursInPeriod
) s
GROUP BY speed

zlhcx6iw #2

Welcome to the wonderful world of windowed functions!

Give this a try:

SELECT DISTINCT Car, Speed, yr,
    SUM(.0+hoursSpent) OVER (PARTITION BY Car, yr, Speed ORDER BY Speed) AS HoursAtSpeed,
    MAX(.0+hoursInPeriod) OVER (PARTITION BY Car, Yr ORDER BY (SELECT 1)) AS TotalHours,
    SUM(.0+hoursSpent) OVER (PARTITION BY Car, yr, Speed ORDER BY Speed)
        / MAX(.0+hoursInPeriod) OVER (PARTITION BY Car, Yr ORDER BY (SELECT 1)) AS PercentHours
  FROM #stats
 ORDER BY Yr, Car, Speed

Car    Speed  yr    HoursAtSpeed  TotalHours  PercentHours
Tesla  50.0   2019  5.0           100.0       0.050000
Tesla  65.0   2019  20.0          100.0       0.200000
Tesla  70.0   2019  10.0          100.0       0.100000
Volvo  50.0   2019  20.0          300.0       0.066666
Volvo  65.0   2019  13.0          300.0       0.043333
Volvo  70.0   2019  30.0          300.0       0.100000
Tesla  50.0   2020  10.0          100.0       0.100000
Tesla  65.0   2020  20.0          100.0       0.200000
Tesla  70.0   2020  13.0          100.0       0.130000
Volvo  50.0   2020  10.0          250.0       0.040000
Volvo  65.0   2020  25.0          250.0       0.100000
Volvo  70.0   2020  40.0          250.0       0.160000
Tesla  50.0   2021  30.0          100.0       0.300000
Tesla  65.0   2021  20.0          100.0       0.200000
Tesla  70.0   2021  50.0          100.0       0.500000
Volvo  50.0   2021  5.0           100.0       0.050000
Volvo  70.0   2021  10.0          100.0       0.100000

You can adjust how the values are grouped by changing the columns included in the PARTITION BY for each of the windowed functions. I have it broken down by car, speed and year in this example.

hfyxw5xn #3

If you want to get the percentage for each row, per car per year, you just need to remove the aggregation and add a PARTITION BY with a window function:

SELECT
  s.car,
  s.yr,
  s.speed,
  s.hoursSpent * 1.0 / SUM(s.hoursSpent) OVER (PARTITION BY s.car, s.yr) 
FROM #stats s;
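
For anyone who wants to check the output by hand, this is a minimal Python sketch of the same per-row calculation, using the sample rows from the question. Note that it divides by the sum of hoursSpent within each (car, yr) partition, not by hoursInPeriod; the names are illustrative.

```python
from collections import defaultdict

# (car, yr, speed, hoursSpent, hoursInPeriod), copied from the question
rows = [
    ("Volvo", 2019, 50, 20, 300), ("Volvo", 2019, 65, 13, 300), ("Volvo", 2019, 70, 30, 300),
    ("Volvo", 2020, 50, 10, 250), ("Volvo", 2020, 65, 25, 250), ("Volvo", 2020, 70, 40, 250),
    ("Volvo", 2021, 50, 5, 100), ("Volvo", 2021, 70, 10, 100),
    ("Tesla", 2019, 50, 5, 100), ("Tesla", 2019, 65, 20, 100), ("Tesla", 2019, 70, 10, 100),
    ("Tesla", 2020, 50, 10, 100), ("Tesla", 2020, 65, 20, 100), ("Tesla", 2020, 70, 13, 100),
    ("Tesla", 2021, 50, 30, 100), ("Tesla", 2021, 65, 20, 100), ("Tesla", 2021, 70, 50, 100),
]

# SUM(hoursSpent) OVER (PARTITION BY car, yr): one total per partition
partition_total = defaultdict(int)
for car, yr, _, spent, _ in rows:
    partition_total[(car, yr)] += spent

# Each row keeps its identity and is divided by its partition's total
shares = {
    (car, yr, speed): spent / partition_total[(car, yr)]
    for car, yr, speed, spent, _ in rows
}
for key, share in shares.items():
    print(*key, share)
```

Within each car and year the shares add up to 1, so this measures the split of driving time between speeds rather than the share of the whole period.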


bhmjp9jg #4

Can you make two separate calls to the API?

Below is a pseudocode sample based on Python that combines the results of two calls (note: "api_call" is a placeholder for however you call your API):

Get the needed grand total of hoursInPeriod into a variable:

total_hours_in_period_query = "SELECT SUM(hoursInPeriod) FROM (SELECT DISTINCT car, yr, hoursInPeriod FROM #stats) AS subq"
total_hours_in_period = api_call(total_hours_in_period_query)

Then make a second call to get the sum of hours spent for each speed and divide it by the result of the first call (the * 1.0 avoids integer division):

hours_spent_query = f"SELECT speed, SUM(hoursSpent) * 1.0 / {total_hours_in_period} FROM #stats GROUP BY speed"
results = api_call(hours_spent_query)
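
A runnable sketch of the two-call idea follows, with some loud assumptions. The first query above still contains a derived table, which the question says the API rejects, so this sketch swaps in a plain GROUP BY car, yr query and adds the rows up on the client; that substitution is my assumption, not part of the answer. An in-memory SQLite database stands in for the real API, the table is called stats because SQLite has no #temp tables, and api_call is a hypothetical helper.

```python
import sqlite3

# (car, yr, speed, hoursSpent, hoursInPeriod), copied from the question
rows = [
    ("Volvo", 2019, 50, 20, 300), ("Volvo", 2019, 65, 13, 300), ("Volvo", 2019, 70, 30, 300),
    ("Volvo", 2020, 50, 10, 250), ("Volvo", 2020, 65, 25, 250), ("Volvo", 2020, 70, 40, 250),
    ("Volvo", 2021, 50, 5, 100), ("Volvo", 2021, 70, 10, 100),
    ("Tesla", 2019, 50, 5, 100), ("Tesla", 2019, 65, 20, 100), ("Tesla", 2019, 70, 10, 100),
    ("Tesla", 2020, 50, 10, 100), ("Tesla", 2020, 65, 20, 100), ("Tesla", 2020, 70, 13, 100),
    ("Tesla", 2021, 50, 30, 100), ("Tesla", 2021, 65, 20, 100), ("Tesla", 2021, 70, 50, 100),
]

# Stand-in backend (assumption): SQLite instead of SQL Server behind the API
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stats (car TEXT, yr INTEGER, speed REAL, "
    "hoursSpent INTEGER, hoursInPeriod INTEGER)"
)
conn.executemany("INSERT INTO stats VALUES (?, ?, ?, ?, ?)", rows)


def api_call(query):
    """Hypothetical placeholder for the real API: run a query, return all rows."""
    return conn.execute(query).fetchall()


# Call 1: one row per (car, yr), using only columns, table name and GROUP BY
periods = api_call("SELECT car, yr, MAX(hoursInPeriod) FROM stats GROUP BY car, yr")
total_hours_in_period = sum(hours for _, _, hours in periods)

# Call 2: per-speed totals divided by the constant from call 1
# (* 1.0 forces a non-integer division)
results = api_call(
    f"SELECT speed, SUM(hoursSpent) * 1.0 / {total_hours_in_period} "
    "FROM stats GROUP BY speed ORDER BY speed"
)
for speed, share in results:
    print(speed, share)
```

Interpolating the total into the query text is acceptable here only because it is a number computed by the script itself; if the real API supports parameters, those would be the safer choice.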
