I have a table in my database.The table has a field VehId(Int) and trackdt(Datetime)
My table has 3000 billion rows(yes, that's right 3000 billion). so i want to delete old data. but i want to delete data based on time interval.
I want to delete records of every 20 seconds for each vehicle.
Below is the table I have
VehId Trackdt
1 2017-05-20 00:00:30.000
2 2017-05-20 00:00:32.000
2 2017-05-20 00:00:42.000
1 2017-05-20 00:00:40.000
2 2017-05-20 00:00:52.000
1 2017-05-20 00:00:50.000
1 2017-05-20 00:01:00.000
2 2017-05-20 00:01:02.000
1 2017-05-20 00:01:10.000
1 2017-05-20 00:01:20.000
2 2017-05-20 00:01:12.000
1 2017-05-20 00:01:30.000
2 2017-05-20 00:01:22.000
2 2017-05-20 00:01:32.000
After delete data should be like below
VehId TRackdt
1 2017-05-20 00:00:30.000
2 2017-05-20 00:00:32.000
1 2017-05-20 00:01:00.000
2 2017-05-20 00:01:02.000
1 2017-05-20 00:01:30.000
2 2017-05-20 00:01:32.000
I tried below query but it is taking too much time
ALTER PROCEDURE [dbo].[DELETEINTERVALDATA]
@FROMDATE DATETIME,
@TODATE DATETIME,
@INTERVAL INT,
@FLAG INT
AS
BEGIN
SET NOCOUNT ON;
DECLARE @TRACKDT DATETIME
DECLARE @I INT =1
DECLARE @V INT =1
DECLARE @COUNT INT
DECLARE @VCOUNT INT
DECLARE @STARTDATE DATETIME = ''
DECLARE @VEHID INT
DECLARE @TIMEDIFF INT
CREATE TABLE #TEMPVEHICLE
(
SNO INT IDENTITY(1,1),
VEHID INT
)
CREATE TABLE #TEMPLOG
(
SNO INT IDENTITY(1,1),
TRACKDT DATETIME
)
IF (@FLAG = 1 )
BEGIN
INSERT INTO #TEMPVEHICLE (VEHID) SELECT VEHID FROM VEHICLEMASTER ORDER BY VEHID
SELECT @VCOUNT = COUNT(SNO) FROM #TEMPVEHICLE
WHILE (@V <= @VCOUNT)
BEGIN
SELECT @VEHID = VEHID FROM #TEMPVEHICLE WHERE SNO = @V
INSERT INTO #TEMPLOG(TRACKDT) SELECT TRACKDT
FROM TRACKINGLOG WITH(NOLOCK)
WHERE TRACKDT BETWEEN @FROMDATE AND @TODATE AND VEHID = @VEHID
ORDER BY TRACKDT ASC
SELECT @COUNT = COUNT(SNO) FROM #TEMPLOG
WHILE (@I <= @COUNT)
BEGIN
SELECT @TRACKDT=TRACKDT FROM #TEMPLOG WHERE SNO = @I
IF (@I = 1)
BEGIN
SELECT @STARTDATE = @TRACKDT
END
ELSE
BEGIN
SELECT @TIMEDIFF = DATEDIFF(SECOND,@STARTDATE,@TRACKDT)
IF @TIMEDIFF <= 20
BEGIN
DELETE FROM TRACKINGLOG WHERE TRACKDT = @TRACKDT AND VEHID = @VEHID
END
ELSE
BEGIN
SELECT @STARTDATE = @TRACKDT
END
END
SELECT @I = @I + 1
END
TRUNCATE TABLE #TEMPLOG
SELECT @V = @V + 1,@STARTDATE= '',@I=1
END
DROP TABLE #TEMPLOG
DROP TABLE #TEMPVEHICLE
END
END
How can I write query which delete data based on time interval and should be quick??
Thanks in advance
4条答案
按热度按时间6qqygrtg1#
Your desired output corresponds to a 30 second interval.
You can retrieve previous values in a result set using window functions. For example, LAG(trackdt,1) will return the previous value.
LEAD
the next.FIRST_VALUE
will return the first value in a set.The query :
Will return the first
trackdt
value per vehicle, when the rows are ordered bytrackdt
.This way we can calculate the interval between the current and first value in the partition with
datediff
. The resulting query is a bit ugly:Dividing by 30 would give us the interval bucket to which each row belongs.
Calculating the remainder though, would return 0 for the first row in every 30-second bucket:
We can use one or more CTEs to tidy-up this query and perform the deletion:
This query depends on having values every 30 seconds.
In general, you can use the
ROW_NUMBER()
function to identify records within an interval bucket and simply pick the first row in a bucket. Since we want to retain the first row, we pick anything that has a ROW_NUMBER>1:Even though this query is fast, I wouldn't use it on a 3 trillion row table. Windowing often results in spooling - temporary results are stored in tempdb to allow window calculations. It would be better to select the IDs of the rows that have to be deleted, insert them into a temporary table and then use a DELETE with a join clause:
Even so, you probably want to batch deletions. In this case you could use the NTILE function to calculate a batch number for each row between 1 and N:
This will calculate a batch number between 1 and 100. You can store this in the temporary table and delete one batch of rows at a time.
bis0qfac2#
You want to delete a record if the time was 20 seconds or less after the previous time. But if the previous time gets deleted you than you want to compare with the time before that etc. You will need some kind of loping for that. A recursive table expression or regular looping.
pgvzfuti3#
With 30TBs of records it will take a week or more even with the fastest method.
You could try
I'd first do the first 3 steps on one 10-million-record CSV file and calculate how long it will take to move the whole thing this way
yeotifhr4#
Please read about Parameter sniffing . Assuming your stored procedure is slow BUT correct, I have updated your SP. Kindly Try :