位置内的伪匿名用户ID

x4shl7ld  于 2021-08-01  发布在  Java
关注(0)|答案(2)|浏览(426)

假设我有下表

|   City |   Name |
|--------|--------|
|  Paris |    Tim |
|  Paris |  Alice |
|  Paris | Sandra |
|  Paris | Sandra |
|  Paris |   Lisa |
| London |    Joe |
| London |    Joe |
| London |    Tim |
| London |  Peter |

我想为每个城市的唯一用户生成一个(伪)“匿名化”的名字(例如 city 每个城市的名称后缀为 _n 其中n是每个城市的每个唯一名称的简单计数)。
我可以做以下事情

SELECT City
       , Name
       , CONCAT(City, '_', ROW_NUMBER() OVER (PARTITION BY City ORDER BY Name)) AS anon_name
  FROM employees
 GROUP BY City
          , Name;

这给了我想要的:

|   City |   Name | anon_name |
|--------|--------|-----------|
| London |    Joe |  London_1 |
| London |  Peter |  London_2 |
| London |    Tim |  London_3 |
|  Paris |  Alice |   Paris_1 |
|  Paris |   Lisa |   Paris_2 |
|  Paris | Sandra |   Paris_3 |
|  Paris |    Tim |   Paris_4 |

但是,我对这个解决方案不是很满意,因为用户可以派生出来,因为数字只是按字母顺序分配的。
而不是我可以用的序列 NEWID() :

SELECT City
       , Name
       , CONCAT(City, '_', NEWID()) AS anon_name
  FROM employees
 GROUP BY City
          , Name;
|   City |   Name |                                   anon_name |
|--------|--------|---------------------------------------------|
| London |    Joe | London_A06BFFC9-E122-4CB2-AFDE-B1980EC50A1F |
| London |  Peter | London_B153D072-F75A-4004-B922-189D64656AC1 |
| London |    Tim | London_AF40F949-5D55-4896-907F-17EE0165C569 |
|  Paris |  Alice |  Paris_9F779CFA-B799-409A-801B-9CC044295B13 |
|  Paris |   Lisa |  Paris_CFA8AB45-CCEF-4223-B86B-8A71A8335D13 |
|  Paris | Sandra |  Paris_1FEE5266-8FFF-431E-A76D-CB490FE77BB4 |
|  Paris |    Tim |  Paris_A7BAF19F-1B97-470D-BF41-81C5B6BFFCBA |

这会生成相当长的标签,但无法将其与名称关联起来。
因此,我的问题是:
我能得到一个简短的版本(只是一个数字后缀),它是随机的,这样名称就不能很容易地从数字中派生出来,比如

|   City |   Name | anon_name |
|--------|--------|-----------|
| London |    Joe |  London_2 |
| London |  Peter |  London_1 |
| London |    Tim |  London_3 |
|  Paris |  Alice |   Paris_1 |
|  Paris |   Lisa |   Paris_2 |
|  Paris | Sandra |   Paris_4 |
|  Paris |    Tim |   Paris_3 |

示例数据生成者:

CREATE TABLE employees
    ([City] varchar(8), [Name] varchar(8))
;

INSERT INTO employees
    ([City], [Name])
VALUES
    ('Paris', 'Tim'),
    ('Paris', 'Alice'),
    ('Paris', 'Sandra'),
    ('Paris', 'Sandra'),
    ('London', 'Joe'),
    ('London', 'Joe'),
    ('London', 'Tim'),
    ('London', 'Peter')
;
rdlzhqv9

rdlzhqv91#

你可以用 newid() 在第一个表达式中:

SELECT City, Name,
       CONCAT(City, '_', ROW_NUMBER() OVER (PARTITION BY City ORDER BY NEWID())) AS anon_name
FROM employees
GROUP BY City, Name;

如果你只是想要一个随机后缀,你可以使用 checksum() :

SELECT City, Name,
       CONCAT(City, '_', CHECKSUM(Name)) AS anon_name
FROM employees
GROUP BY City, Name;

然而,存在冲突的风险——两个同名的用户有不同的校验和。如果你不需要顺序ID,有很多方法可以解决这个问题。

k3bvogb1

k3bvogb12#

随机排序的怎么样 row_number() ?

SELECT 
    City,
    Name,
    CONCAT(City, '_', ROW_NUMBER() OVER (PARTITION BY City ORDER BY newid())) AS anon_name
 FROM employees
 GROUP BY City, Name;

相关问题