I have a very large database, over 3 million rows.
It contains a list of users, their details, what applications they use (so may be several lines of same user 1 for each application, there are 70 separate applications)
Our customer has gone through the list and decided they only want to keep about a tenth of the users and has requested we delete the rest.
My SQL is basic, I can delete where conditions are met etc.
However I am not skilled enough to create a script that will delete each row where the conditions are not met by the use of lists.
for example:
| UserId | Username | ApplicationID | FirstName | Lastname |
| ------------ | ------------ | ------------ | ------------ | ------------ |
| 1 | JB1 | 50122 | John | Brown |
| 2 | SH2 | 201 | Sarah | Harness |
| 3 | TT1 | 2699 | Tom | Thwaite |
| 4 | TT1 | 50122 | Tom | Thwaite |
| 5 | JB1 | 201 | John | Brown |
| 6 | SH1 | 2699 | Sally | Holmes |
| 7 | PE1 | 201 | Paul | East |
| 8 | MP1 | 2699 | Mike | Peterson |
unfortunately the list we got back did not include the userid as this would have made things simple.
So I am looking for a script that will go through and delete each row where the username = xxxx and the applicationid = xxxx, once its done that move onto next deletion, different user and applicationID and so on.
I did think of running multiple scripts, basically one for each application, so delete from xxx where applicationID = xxxx and username in (a,b,c,d)
However have been told by change board that this is not acceptable and needs to be done in one hit
hope someone can help
have increased the info on the table so larger version below. (table has 31 columns btw)
| UserId | Username | ApplicationID | FirstName | Lastname | PortalID | Orgunit | country |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| 1 | JB1 | 122 | John | Brown | 123 | AFr | UK |
| 2 | SH2 | 122 | Sarah | Harness | 12 | Ml23 | US |
| 3 | TT1 | 122 | Tom | Thwaite | 122 | JJ30 | Uk |
| 4 | JB1 | 125 | John | Brown | 125 | Afr | UK |
| 5 | LL1 | 125 | Lesley | Leeson | 125 | ML222 | US |
| 6 | PM1 | 125 | Paul | Mackenzie | 125 | AS239 | EIRE |
| 7 | TT1 | 126 | Tom | Thwaite | 126 | Grlf | EIRE |
| 8 | SH1 | 188 | Sally | Holmes | 188 | Grlf | US |
| 9 | SH2 | 188 | Sarah | Harness | 188 | Ml23 | US |
| 10 | JB1 | 188 | John | Brown | 188 | ML222 | UK |
| 11 | Ol1 | 188 | Oliver | Leeson | 188 | ST34 | JPN |
| 12 | MK1 | 201 | Mike | Kendle | 201 | HJJF | UK |
| 13 | PM1 | 201 | Paul | Mackenzie | 201 | Grlf | UK |
| 14 | MK1 | 203 | Mike | Keown | 203 | Grlf | UK |
2条答案
按热度按时间fdx2calv1#
there is my concept of done this task:
Having extra table should help you to organize and determine which user/app you want to del by single query without condition mess
oaxa6hgo2#
hey guys the Db is so big I resolved it by utilising a mix of excel and sql.
Using the excel file supplied by the customer I created a temp db and importedit as a flat file creating a table with the information in it. I then joined the tables using the columns that I want to use to filter by
so I used following scripts
select * from person.person P with (nolock) -- Thisgave me my full list to check the column names
I then created person.upload using the flat file/
select * from person.person P with (nolock) join person.upload u with (nolock) on p.username = U.username and P.applicationid = I.applicationID
This allowed me to check all the data was matched.
I then ran the update:
update person join person.upload on p.username = U.username and P.applicationid = I.applicationID set Enabled = 0 and passwordreminder = 'Disabled to delete'
I can then delete the users simply using
delete from person.person where enaled = 0 and passwordreminder='Disabled to delete'
took awhile but 2.67 million rows deleted.