SQL Server Shuffling data across rows in a table

rn0zuynd  于 9个月前  发布在  其他
关注(0)|答案(2)|浏览(81)

I want to update a table to have the data shuffled around. Let's say I have a table with 5 columns. column 1 is the primary key so that'll remain unchanged. columns 2 and 3 will be shuffled to different rows. columns 4 and 5 will also be shuffled but as pairs (they must be moved together).

How would I write the script for this? The primary key is a random unique identifier so it doesn't correlate with the row number.

Before:
| c1 | c2 | c3 | c4 | c5 |
| ------------ | ------------ | ------------ | ------------ | ------------ |
| 10 | 1 | 2 | 3 | 4 |
| 20 | 5 | 6 | 7 | 8 |
| 30 | 9 | 10 | 11 | 12 |
| 40 | 13 | 14 | 15 | 16 |

After:

c1c2c3c4c5
105141112
20961516
301234
40131078

To provide an update on why I am doing this, this isn't a production database or anything. I'm shuffling the data around to anonymize the data for testing purposes, that's about it.

To provide additional details, say this is a people table. I might shuffle the address or phone number to anonymize the person.

ccgok5k5

ccgok5k51#

If temporary tables are allowed:

  • Create a temporary table that contains row_number based keys

  • One key acts as a secondary key

  • Other keys will consist secondary key values but sorted randomly

select *
     , sk = row_number() over (order by c1)
     , k1 = row_number() over (order by newid())
     , k2 = row_number() over (order by newid())
     , k3 = row_number() over (order by newid())
into #temp
from t
c1c2c3c4c5skk1k2k3
1012341431
2056782342
3091011123123
40131415164214

You can use an update-with-join query:

update t set
t.c2 = a.c2,
t.c3 = b.c3,
t.c4 = c.c4,
t.c5 = c.c5
from t
join #temp as x on t.c1 = x.c1
join #temp as a on x.sk = a.k1
join #temp as b on x.sk = b.k2
join #temp as c on x.sk = c.k3

DB<>Fiddle

z4iuyo4d

z4iuyo4d2#

You appear to be under the impression that all these values are unrelated; they aren't. 10 is related directly to 1 , 2 , 3 and 4 , and those values to each other as well. If you "shuffle" the value of 10 (in c1 ) then the values 1 , 2 , 3 and 4 go with it as they are a single row.

If you need to do this type of thing then firstly I'd suggest that all these values are on different rows and denote what "c" they belong to. So your data would look more like this:
| c | v |
| ------------ | ------------ |
| 1 | 10 |
| 2 | 1 |
| 3 | 2 |
| 4 | 3 |
| 5 | 4 |
| 1 | 20 |
| 2 | 5 |
| 3 | 6 |
| 4 | 7 |
| 5 | 8 |

etc...

If your data looks like that, it's far easier to get the results you want. You can order you data on something "random" and the use conditional aggregation to get the values into their respective columns:

WITH CTE AS(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY C ORDER BY NEWID()) AS I
    FROM (VALUES(1,10),
                (2,1),
                (3,2),
                (4,3),
                (5,4),
                (1,20),
                (2,5),
                (3,6),
                (4,7),
                (5,8))YT(C,V))
SELECT MAX(CASE C WHEN 1 THEN V END) AS C1,
       MAX(CASE C WHEN 2 THEN V END) AS C2,
       MAX(CASE C WHEN 3 THEN V END) AS C3,
       MAX(CASE C WHEN 4 THEN V END) AS C4,
       MAX(CASE C WHEN 5 THEN V END) AS C5
FROM CTE
GROUP BY I;

If you "can't" get your data in a better format, you could unpivot it before you repivot it:

WITH CTE AS(
    SELECT V.C,
           V.V,
           ROW_NUMBER() OVER (PARTITION BY V.C ORDER BY NEWID()) AS I
    FROM (VALUES(10, 1, 2, 3, 4),
                (20, 5, 6, 7, 8),
                (30, 9,10,11,12),
                (40,13,14,15,16))YT(C1,C2,C3,C4,C5)
         CROSS APPLY (VALUES(1,C1),
                            (2,C2),
                            (3,C3),
                            (4,C4),
                            (5,C5))V(C,V))
SELECT MAX(CASE C WHEN 1 THEN V END) AS C1,
       MAX(CASE C WHEN 2 THEN V END) AS C2,
       MAX(CASE C WHEN 3 THEN V END) AS C3,
       MAX(CASE C WHEN 4 THEN V END) AS C4,
       MAX(CASE C WHEN 5 THEN V END) AS C5
FROM CTE
GROUP BY I;

相关问题