I have a query where I iterate through a table -> for each entry I iterate through another table and then compute some results. I use a cursor for iterating through the table. This query takes ages to complete. Always more than 3 minutes. If I do something similar in C# where the tables are arrays or dictionaries it doesn't even take a second. What am I doing wrong and how can I improve the efficiency?
DELETE FROM [QueryScores]
GO
INSERT INTO [QueryScores] (Id)
SELECT Id FROM [Documents]
DECLARE @Id NVARCHAR(50)
DECLARE myCursor CURSOR LOCAL FAST_FORWARD FOR
SELECT [Id] FROM [QueryScores]
OPEN myCursor
FETCH NEXT FROM myCursor INTO @Id
WHILE @@FETCH_STATUS = 0
BEGIN
DECLARE @Score FLOAT = 0.0
DECLARE @CounterMax INT = (SELECT COUNT(*) FROM [Query])
DECLARE @Counter INT = 0
PRINT 'Document: ' + CAST(@Id AS VARCHAR)
PRINT 'Score: ' + CAST(@Score AS VARCHAR)
WHILE @Counter < @CounterMax
BEGIN
DECLARE @StemId INT = (SELECT [Query].[StemId] FROM [Query] WHERE [Query].[Id] = @Counter)
DECLARE @Weight FLOAT = (SELECT [tfidf].[Weight] FROM [TfidfWeights] AS [tfidf] WHERE [tfidf].[StemId] = @StemId AND [tfidf].[DocumentId] = @Id)
PRINT 'WEIGHT: ' + CAST(@Weight AS VARCHAR)
IF(@Weight > 0.0)
BEGIN
DECLARE @QWeight FLOAT = (SELECT [Query].[Weight] FROM [Query] WHERE [Query].[StemId] = @StemId)
SET @Score = @Score + (@QWeight * @Weight)
PRINT 'Score: ' + CAST(@Score AS VARCHAR)
END
SET @Counter = @Counter + 1
END
UPDATE [QueryScores] SET Score = @Score WHERE Id = @Id
FETCH NEXT FROM myCursor INTO @Id
END
CLOSE myCursor
DEALLOCATE myCursor
The logic is that i have a list of docs. And I have a question/query. I iterate through each and every doc and then have a nested iteration through the query terms/words to find if the doc contains these terms. If it does then I add/multiply pre-calculated scores.
3条答案
按热度按时间bwntbbo31#
The problem is that you're trying to use a set-based language to iterate through things like a procedural language. SQL requires a different mindset. You should almost never be thinking in terms of loops in SQL.
From what I can gather from your code, this should do what you're trying to do in all of those loops, but it does it in a single statement in a set-based manner, which is what SQL is good at.
ulydmbyx2#
The query I came up with is very similar to the one from Tom H.
There's a lot of unknowns about the problem OP code is trying to solve. Is there a particular reason the code only checks for rows in the
Query
table where theId
value is between 0 and one less than the number of rows in the table? Or is the intent really just to get all of the rows fromQuery
?Here's my version:
Processing RBAR (row by agonizing row) is almost always going to be slower than processing as a set. SQL is designed to operate on sets of data. There is overhead for each individual SQL statement, and for each context switch between the procedure and the SQL engine. Sure, there might be room to improve performance of individual parts of the procedure, but the big gain is going to be doing an operation on the entire set, in a single SQL statement.
If there's some reason you need to process one document at a time, using a cursor, then get rid of the loops and individual selects and all those PRINT, and just use a single query to get the score for the document.
ecfdbz9o3#
You might not even need documents