Active Directory / PowerShell: partial or near match on name and/or username

twh00eeo  posted on 2023-06-29  in  Shell

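Our users sometimes give us misspelled names or usernames, and I would like to search Active Directory for close matches, sorted by closeness (any algorithm will do). For example, if I try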
Get-Aduser -Filter {GivenName -like "Jack"}
I can find the user Jack, but not if I search for "Jacck" or "ack".
Is there a simple way to do this?


jutyujz01#

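You can compute the Levenshtein distance (LD) between two strings and require it to be below some threshold (perhaps 1 or 2). There is a PowerShell example here: Levenshtein distance in powershell
Examples:

  • Jack and Jacck have an LD of 1.
  • Jack and ack have an LD of 1.
  • Palle and Havnefoged have an LD of 8.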
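The distances listed above can be reproduced with a short function. What follows is only a minimal sketch of the standard dynamic-programming algorithm, not the code from the linked post. The name Get-EditDistance and its parameters are made up for illustration, and it lowercases both inputs by default on the assumption that name lookups should ignore case.

```powershell
# Hypothetical helper (not from the linked post): Levenshtein distance via the
# classic dynamic-programming table, keeping only two rows in memory.
function Get-EditDistance {
    param(
        [string]$First,
        [string]$Second,
        [switch]$CaseSensitive
    )

    if (-not $CaseSensitive) {
        $First  = $First.ToLowerInvariant()
        $Second = $Second.ToLowerInvariant()
    }

    # $previous[$j] = distance between an empty prefix of $First
    # and the first $j characters of $Second.
    $previous = [int[]](0..$Second.Length)

    for ($i = 1; $i -le $First.Length; $i++) {
        $current = New-Object 'int[]' ($Second.Length + 1)
        $current[0] = $i
        for ($j = 1; $j -le $Second.Length; $j++) {
            $cost = if ($First[$i - 1] -ceq $Second[$j - 1]) { 0 } else { 1 }
            $deletion     = $previous[$j] + 1
            $insertion    = $current[$j - 1] + 1
            $substitution = $previous[$j - 1] + $cost
            $current[$j]  = [Math]::Min([Math]::Min($deletion, $insertion), $substitution)
        }
        $previous = $current
    }

    $previous[$Second.Length]
}

Get-EditDistance 'Jack' 'Jacck'         # 1
Get-EditDistance 'Jack' 'ack'           # 1
Get-EditDistance 'Palle' 'Havnefoged'   # 8

# Keep only candidates within a threshold of 2, closest first.
$candidates = 'Jack', 'Jacob', 'Zack', 'Mary'
$close = $candidates |
    Select-Object @{n = 'Name'; e = { $_ }},
                  @{n = 'Distance'; e = { Get-EditDistance 'Jacck' $_ }} |
    Where-Object { $_.Distance -le 2 } |
    Sort-Object Distance, Name
$close
```

The candidate list here is invented. Against a real directory you would feed in names you have already retrieved, since the distance has to be computed client-side.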

hfwmuf9z2#

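Interesting question and answers. A possibly simpler solution, though, is to search on several attributes at once, since I would hope most people spell at least one of their names correctly :)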

Get-ADUser -Filter {GivenName -like "FirstName" -or SurName -Like "SecondName"}

ax6ht2ek3#

The Soundex algorithm was designed for exactly this kind of situation. Here is some PowerShell code that may help:
Get-Soundex.ps1
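Since only the link title survives here, the following is a rough sketch of American Soundex and not the contents of Get-Soundex.ps1. The function name Get-SoundexCode is hypothetical, and the sketch assumes plain A-Z names (any other character is dropped before encoding).

```powershell
# Hypothetical helper (not the linked Get-Soundex.ps1): American Soundex.
# Assumption: only the letters A-Z matter; everything else is stripped.
function Get-SoundexCode {
    param([string]$Name)

    $map = @{
        B = '1'; F = '1'; P = '1'; V = '1'
        C = '2'; G = '2'; J = '2'; K = '2'; Q = '2'; S = '2'; X = '2'; Z = '2'
        D = '3'; T = '3'
        L = '4'
        M = '5'; N = '5'
        R = '6'
    }

    $letters = $Name.ToUpperInvariant() -replace '[^A-Z]', ''
    if ($letters.Length -eq 0) { return '' }

    $result   = [string]$letters[0]           # the first letter is kept as-is
    $lastCode = $map[[string]$letters[0]]     # $null when the first letter is a vowel, H, W or Y

    for ($i = 1; $i -lt $letters.Length -and $result.Length -lt 4; $i++) {
        $letter = [string]$letters[$i]
        if ($letter -eq 'H' -or $letter -eq 'W') { continue }   # H and W do not separate equal codes
        $code = $map[$letter]                                   # $null for vowels and Y
        if ($code -and $code -ne $lastCode) { $result += $code }
        $lastCode = $code                                       # a vowel resets the "same code" check
    }

    $result.PadRight(4, '0')
}

Get-SoundexCode 'Jack'     # J200
Get-SoundexCode 'Jacck'    # J200
Get-SoundexCode 'ack'      # A200
Get-SoundexCode 'Robert'   # R163
Get-SoundexCode 'Rupert'   # R163
```

Note what this implies for the question's examples: "Jacck" gets the same code as "Jack", but "ack" does not, because Soundex always keeps the first letter. It is a yes/no match on sound, not a ranking.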


pb3s4cty4#

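OK, based on the great answers I received (thanks @boxdog and @Palle Due), I am posting a more complete one.
Main source: https://github.com/gravejester/Communary.PASM - PowerShell Approximate String Matching. A great module on this topic.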

1) The FuzzyMatchScore function

Source: https://github.com/gravejester/Communary.PASM/tree/master/Functions

# download functions to the temp folder
$urls = 
"https://raw.githubusercontent.com/gravejester/Communary.PASM/master/Functions/Get-CommonPrefix.ps1"    ,
"https://raw.githubusercontent.com/gravejester/Communary.PASM/master/Functions/Get-LevenshteinDistance.ps1" ,
"https://raw.githubusercontent.com/gravejester/Communary.PASM/master/Functions/Get-LongestCommonSubstring.ps1"  ,
"https://raw.githubusercontent.com/gravejester/Communary.PASM/master/Functions/Get-FuzzyMatchScore.ps1" 

$paths = $urls | %{$_.split("\/")|select -last 1| %{"$env:TEMP\$_"}}

[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
for($i=0;$i -lt $urls.count;$i++){
Invoke-WebRequest -Uri $urls[$i] -OutFile $paths[$i]
}

# concatenating the functions so we don't have to deal with source permissions
foreach($path in $paths){
cat $path | Add-Content "$env:TEMP\Fuzzy_score_functions.ps1"
}

# to save for later, open the temp folder with: Invoke-Item $env:TEMP 
# then copy "Fuzzy_score_functions.ps1" somewhere else

# source Fuzzy_score_functions.ps1
. "$env:TEMP\Fuzzy_score_functions.ps1"

A quick test:

Get-FuzzyMatchScore "a" "abc" # 98

Create the scoring function:

## start function
function get_score{
param($searchQuery,$searchData,$nlist,[switch]$levd)

if($nlist -eq $null){$nlist = 10}

$scores = foreach($string in $searchData){
    Try{
    if($levd){    
        $score = Get-LevenshteinDistance $searchQuery $string }
    else{
        $score = Get-FuzzyMatchScore -Search $searchQuery -String $string }
    Write-Output (,([PSCustomObject][Ordered] @{
                        Score = $score
                        Result = $string
                    }))
    $I = $searchData.indexof($string)/$searchData.count*100
    $I = [math]::Round($I)
    Write-Progress -Activity "Search in Progress" -Status "$I% Complete:" -PercentComplete $I
    }Catch{Continue}
}

if($levd) { $scores | Sort-Object Score,Result |select -First $nlist }
else {$scores | Sort-Object Score,Result -Descending |select -First $nlist }
} ## end function

Examples

get_score "Karolin" @("Kathrin","Jane","John","Cameron")

# check the difference between Fuzzy and LevenshteinDistance mode
$names = "Ferris","Cameron","Sloane","Jeanie","Edward","Tom","Katie","Grace"
"Fuzzy"; get_score "Cam" $names
"Levenshtein"; get_score "Cam" $names -levd

Testing performance on a large dataset

## download baby-names

$url = "https://github.com/hadley/data-baby-names/raw/master/baby-names.csv"
$output = "$env:TEMP\baby-names.csv"
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
Invoke-WebRequest -Uri $url -OutFile $output
$babynames = import-csv "$env:TEMP\baby-names.csv"
$babynames.count # 258000 lines

$babynames[0..3] # year, name, percent, sex

$searchdata = $babynames.name[0..499]

$query = "Waren" # missing letter
"Fuzzy"; get_score $query $searchdata
"Levenshtein"; get_score $query $searchdata -levd

$query = "Jon" # missing letter
"Fuzzy"; get_score $query $searchdata
"Levenshtein"; get_score $query $searchdata -levd

$query = "Howie" # lookalike
"Fuzzy"; get_score $query $searchdata;
"Levenshtein"; get_score $query $searchdata -levd

Timing test

$query = "John"

$res = for($i=1;$i -le 10;$i++){
    $searchdata = $babynames.name[0..($i*100-1)]
    $meas = measure-command{$res = get_score $query $searchdata}
    write-host $i
    Write-Output (,([PSCustomObject][Ordered] @{
        N = $i*100
        MS = $meas.Milliseconds
        MS_per_line = [math]::Round($meas.Milliseconds/$searchdata.Count,2)
                    }))
}
$res

+------+-----+-------------+
| N    | MS  | MS_per_line |
| -    | --  | ----------- |
| 100  | 696 | 6.96        |
| 200  | 544 | 2.72        |
| 300  | 336 | 1.12        |
| 400  | 6   | 0.02        |
| 500  | 718 | 1.44        |
| 600  | 452 | 0.75        |
| 700  | 224 | 0.32        |
| 800  | 912 | 1.14        |
| 900  | 718 | 0.8         |
| 1000 | 417 | 0.42        |
+------+-----+-------------+

These timings are pretty crazy; if anyone understands why, please comment.
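One possible explanation, offered as a guess and not a confirmed diagnosis: the timing loop reads $meas.Milliseconds, and on a TimeSpan that property is only the millisecond component (0 to 999), not the whole duration. Any run longer than a second would wrap around, which would be consistent with values such as 6 ms for 400 names. The 1696 ms duration below is invented purely to show the difference.

```powershell
# Illustrative value only: pretend a run took 1.696 seconds.
$elapsed = [TimeSpan]::FromMilliseconds(1696)

$elapsed.Seconds             # 1     (whole-seconds component)
$elapsed.Milliseconds        # 696   (component only, always 0-999)
$elapsed.TotalMilliseconds   # 1696  (the full duration)

# Measure-Command returns a TimeSpan, so the same applies to its result.
$meas = Measure-Command { Start-Sleep -Milliseconds 50 }
$meas.TotalMilliseconds
```

If this is the cause, swapping Milliseconds for TotalMilliseconds in the loop should give timings that grow with N. The per-item Write-Progress call and the IndexOf lookup inside get_score probably add overhead of their own as well.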

2) Building a table of names from Active Directory

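The best approach depends on how your AD is organized. Here we have many OUs, but ordinary users are in Users and DisabledUsers. The domain and DC will differ too (I have replaced ours with <domain> and <DC> here).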

# One way to get a List of OUs
Get-ADOrganizationalUnit -Filter * -Properties CanonicalName | 
  Select-Object -Property CanonicalName

You can then filter by OU with Where-Object -FilterScript {}

# example, saving on the temp folder
Get-ADUser -f * |
 Where-Object -FilterScript {
    ($_.DistinguishedName -match "CN=\w*,OU=DisabledUsers,DC=<domain>,DC=<DC>" -or
    $_.DistinguishedName -match "CN=\w*,OU=Users,DC=<domain>,DC=<DC>") -and
    $_.GivenName -ne $null #remove users without givenname, like test users
    } | 
    select @{n="Fullname";e={$_.GivenName+" "+$_.Surname}},
    GivenName,Surname,SamAccountName |
    Export-CSV -Path "$env:TEMP\all_Users.csv" -NoTypeInformation
# you can open the file to inspect 
Invoke-Item "$env:TEMP\all_Users.csv"
# import
$allusers = Import-Csv "$env:TEMP\all_Users.csv"
$allusers.Count # number of lines
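One caveat about the DistinguishedName pattern above: \w* does not match spaces or hyphens, so an account whose CN is a display name such as "Jane Doe" would be skipped without any warning. The sketch below compares the original pattern with a more permissive one on made-up distinguished names. The example.com domain and all four entries are invented, and [^,]+ is itself an assumption that no CN contains an escaped comma.

```powershell
# Made-up distinguished names, for illustration only.
$sampleDns = @(
    'CN=jdoe,OU=Users,DC=example,DC=com',
    'CN=Jane Doe,OU=Users,DC=example,DC=com',
    'CN=Mary-Ann Lee,OU=DisabledUsers,DC=example,DC=com',
    'CN=svc_backup,OU=Services,DC=example,DC=com'
)

# Same idea as the filter above: \w* stops at the first space or hyphen.
$strict  = '^CN=\w*,OU=(Users|DisabledUsers),'
# Permissive variant: anything up to the first comma counts as the CN.
$relaxed = '^CN=[^,]+,OU=(Users|DisabledUsers),'

$strictHits  = @($sampleDns | Where-Object { $_ -match $strict })
$relaxedHits = @($sampleDns | Where-Object { $_ -match $relaxed })

"strict:  $($strictHits.Count) match(es)"
"relaxed: $($relaxedHits.Count) match(es)"
```

If the accounts all live under one or two known OUs, the -SearchBase parameter of Get-ADUser may be a simpler way to restrict the query than matching on the DN text, at the cost of one call per OU.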

Usage:

get_score "Jane Done" $allusers.fullname 15 # return the 15 first
get_score "jdoe" $allusers.samaccountname 15

toe950275#

This works to some extent as fuzzy name resolution across several attributes, but not for the "Jacck" misspelling. I get five results.

get-aduser -filter 'anr -eq "ack"' -ResultSetSize 5
