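The script works fine and outputs exactly what I need.
My problem starts when I have a large CSV file to process (about 500 MB, roughly 6 million lines).
The script takes a very long time to run. I know that processing this much data takes a while, but I would like to know whether there is a way to speed it up. Here is the condensed script: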
## Param() must be the first statement in the script, so the default path is set here
Param([string]$DnsFilePath = "C:\dns.log")
If (Test-Path $DnsFilePath)
{
    $FileInfo = Get-ChildItem -Path $DnsFilePath
    $Ans = Read-Host "Do you want to continue(y/n)?"
    If ($Ans -eq 'y')
    {
        If (!($SkipLines)) { Write-Host "Processing..."; }
        $i = 0; ## Set to count the number of records;
        $Timer= [Diagnostics.Stopwatch]::StartNew() ## Start the timer
        $ArrayOfStrings = [System.Collections.ArrayList]@()
        Switch -regex ([System.IO.File]::ReadLines($FileInfo.fullname)) {
            ' UDP Rcv ' {
                $Datetime = [regex]::matches($switch.current,'\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{1,2}:\d{1,2} (AM|PM)').Value
                $IP = [regex]::matches($switch.current,'\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b').Value
                $FQDN = [regex]::matches($switch.current,"\)[A-z0-9-_]*\(").Value -replace "\)|\(","" -join "."
                [void]$ArrayOfStrings.Add("$Datetime,$IP,$FQDN")
                $i++;
            }
        }
        $OutFilePath = "$($FileInfo.DirectoryName)\$($FileInfo.BaseName)_Parsed.txt"
        [System.IO.File]::WriteAllLines($OutFilePath, $ArrayOfStrings)
        $Timer.stop()
        Write-host "Total time elapsed: $($Timer.Elapsed.ToString('hh\:mm\:ss\.ff'))"
        Write-Host "Number of Record Processed: $i"
        Write-Host "Parsed File created successfully at $OutFilePath"
    }
    else
    { Write-Host "Script exits." }
}
Else
{
    Write-Host -fore Red "File does not exist in the following location: $DnsFilePath. Script exits."
}
Sample DNS log:
DNS Server log file creation at 7/10/2023 10:55:42 AM
Log file wrap at 7/10/2023 10:55:42 AM
Message logging key (for packets - other items use a subset of these fields):
Field #  Information           Values
-------  -----------           ------
   1     Date
   2     Time
   3     Thread ID
   4     Context
   5     Internal packet identifier
   6     UDP/TCP indicator
   7     Send/Receive indicator
   8     Remote IP
   9     Xid (hex)
  10     Query/Response        R = Response
                               blank = Query
  11     Opcode                Q = Standard Query
                               N = Notify
                               U = Update
                               ? = Unknown
  12     [ Flags (hex)
  13     Flags (char codes)    A = Authoritative Answer
                               T = Truncated Response
                               D = Recursion Desired
                               R = Recursion Available
  14     ResponseCode ]
  15     Question Type
  16     Question Name
7/10/2023 10:55:42 AM 1B7C PACKET 000001D9D88C68D0 UDP Rcv 8.8.8.8 5fb1 R Q [8381 DR NXDOMAIN] A (3)www(12)autodiscover(5)st1ad(4)emea(15)microsoftonline(3)com(0)
7/10/2023 10:55:42 AM 1B7C PACKET 000001D9D775F890 UDP Snd 10.x.x.x 92cb R Q [8381 DR NXDOMAIN] A (3)www(12)autodiscover(5)st1ad(4)emea(15)microsoftonline(3)com(0)
7/10/2023 10:55:42 AM 1B7C PACKET 000001D9E4E338D0 UDP Rcv 10.x.x.x a9bd Q [0001 D NOERROR] A (18)addinsinstallation(5)store(6)office(3)com(0)
7/10/2023 10:55:42 AM 1B7C PACKET 000001D9D775F890 UDP Snd 8.8.8.8 afda Q [0001 D NOERROR] A (23)prod-addinsinstallation(15)omexexternallfb(6)office(3)net(6)akadns(3)net(0)
7/10/2023 10:55:42 AM 1B78 PACKET 000001D9E182BB80 UDP Rcv 10.x.x.x d229 Q [0001 D NOERROR] SOA (15)pc_host01(7)contoso(5)local(0)
7/10/2023 10:55:42 AM 1B78 PACKET 000001D9E182BB80 UDP Snd 10.x.x.x d229 R Q [8085 A DR NOERROR] SOA (15)pc_host02(7)contoso(5)local(0)
7/10/2023 10:55:42 AM 1B78 PACKET 000001D9E2A2D670 UDP Rcv 8.8.8.8 c95c R Q [8081 DR NOERROR] A (9)dtr-a-ncu(2)na(8)azurerms(3)com(0)
7/10/2023 10:55:42 AM 1B78 PACKET 000001D9E1998D80 UDP Snd 10.x.x.x 2047 R Q [8081 DR NOERROR] A (6)portal(8)azurerms(3)com(0)
7/10/2023 10:55:42 AM 1B78 PACKET 000001D9E2D07D00 UDP Rcv 10.x.x.x 788e Q [0001 D NOERROR] A (2)tr(11)c1182306347(12)ip4-58f0802d(4)wgcs(7)skyhigh(5)cloud(0)
7/10/2023 10:55:42 AM 1B78 PACKET 000001D9E1998D80 UDP Snd 8.8.8.8 1c22 Q [0001 D NOERROR] A (2)tr(11)c1182306347(12)ip4-58f0802d(4)wgcs(7)skyhigh(5)cloud(0)
1 Answer
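Instead of collecting all the results in $ArrayOfStrings, write each result to the output file as soon as it is produced. To avoid closing and reopening the file handle for every line, reuse a single handle for the whole run.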
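A minimal sketch of that idea, assuming PowerShell 5 or later and full (not relative) file paths. The function name Convert-DnsLog and its parameters are hypothetical and chosen only for illustration. The regular expressions follow the ones in the question, except that the FQDN character class is written as [A-Za-z0-9_-] and only the first date and IP match per line is used. One System.IO.StreamWriter is opened before the loop and disposed once afterwards, so nothing accumulates in memory:

```powershell
# Hypothetical helper: streams matching lines straight to the output file.
function Convert-DnsLog {
    param(
        [Parameter(Mandatory)][string]$InPath,   # full path to the DNS log
        [Parameter(Mandatory)][string]$OutPath   # full path to the parsed output
    )

    $DatePattern = '\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{1,2}:\d{1,2} (AM|PM)'
    $IpPattern   = '\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b'
    $NamePattern = '\)[A-Za-z0-9_-]*\('

    $i = 0
    # One writer (one file handle) is reused for every output line.
    $Writer = [System.IO.StreamWriter]::new($OutPath)
    try {
        switch -Regex ([System.IO.File]::ReadLines($InPath)) {
            ' UDP Rcv ' {
                $Datetime = [regex]::Match($_, $DatePattern).Value
                $IP       = [regex]::Match($_, $IpPattern).Value
                # Collect the labels between ")" and "(" and join them with dots.
                $FQDN     = [regex]::Matches($_, $NamePattern).Value -replace '\)|\(', '' -join '.'
                $Writer.WriteLine("$Datetime,$IP,$FQDN")
                $i++
            }
        }
    }
    finally {
        # Flushes buffered text and releases the handle, even if an error occurs.
        $Writer.Dispose()
    }
    $i   # number of records written
}
```

It could be called as `$Count = Convert-DnsLog -InPath 'C:\dns.log' -OutPath 'C:\dns_Parsed.txt'`, after which $Count holds the number of records written. Whether this is noticeably faster than the ArrayList version depends on the machine and the data, so it is worth timing both on a sample of the real log.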