PowerShell: remove all comments from a script (regex)

Posted by 5anewei6 on 2022-12-14 in Shell


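I am looking for a way to strip all comments from a file. There are various ways to write comments, but I am only interested in the simple # form. The reason is that I only use <# #> for the .SYNOPSIS inside functions, which is functional code rather than just a comment, so I want to keep those.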

Edit: I have updated this question using the helpful answers below.

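So I only need to handle a couple of cases:
a) A full-line comment with # at the start of the line (possibly with whitespace before it). The regex ^\s*# seems to work for this.
b) Some code at the start of a line, followed by a comment at the end of that line. I want to avoid stripping lines that contain e.g. Write-Host "#####", but I think my code already covers this.
I remove end-of-line comments with a split, because I could not work out how to do it with a regex. Does anyone know a way to do this with a regex?
The split was not ideal, because a <# within a line would be removed by -split, but I have fixed that by splitting on " #". This is not perfect, but it might be good enough. Maybe a more reliable regex approach exists?
When I run the code below on my 7,000-line script, it works (!) and strips a huge number of comments, but the output file almost doubles in size (!?), from 400 KB to about 700 KB. Does anyone know why this happens and how to prevent it? (Is it something to do with the BOM or Unicode or the like? Out-File really seems to balloon the file size!)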

$x = Get-Content ".\myscript.ps1"   # $x is an array, not a string
$out = ".\myscript.ps1"
$x = $x -split "[\r\n]+"               # Remove all consecutive line-breaks, in any format '-split "\r?\n|\r"' would just do line by line
$x = $x | ? { $_ -notmatch "^\s*$" }   # Remove empty lines
$x = $x | ? { $_ -notmatch "^\s*#" }   # Remove all lines starting with #, including with whitespace before
$x = $x | % { ($_ -split " #")[0] }    # Remove end of line comments
$x = $x | % { $_.Trim() }              # Remove whitespace only at start and end of line
$x | Out-File $out
# $x | more


Source: https://stackoverflow.com/questions/60996992/powershell-remove-all-comments-from-a-script

4 Answers


jw5wzhpr1#

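Honestly, the best way to identify and process all comments is to use PowerShell's language parser or one of the Ast classes. What follows is an uglier way that filters out both block and line comments.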

$code = Get-Content file.txt -Raw
$comments = [System.Management.Automation.PSParser]::Tokenize($code,[ref]$null) |
    Where Type -eq 'Comment' | Select -Expand Content
$regex = ( $comments |% { [regex]::Escape($_) } ) -join '|'

# Output to remove all empty lines
$code -replace $regex -split '\r?\n' -notmatch '^\s*$'

# Output that Removes only Beginning and Ending Blank Lines
($code -replace $regex).Trim()


0ve6wy6x2#

Do the opposite of what your example does: emit only the lines that do not match:

## Output to console
Get-Content .\file.ps1 | Where-Object { $_ -notmatch '#' }

## Output to file
Get-Content .\file.ps1 | Where-Object { $_ -notmatch '#' } | Out-file .\newfile.ps1 -Append


ni65a41a3#

Building on @AdminOfThings' helpful answer, this uses the Abstract Syntax Tree (AST) Class parser approach but avoids regular expressions entirely:

$Code = $Code.ToString() # Prepare any ScriptBlock for the substring method
$Tokens = [System.Management.Automation.PSParser]::Tokenize($Code, [ref]$null)
-Join $Tokens.Where{ $_.Type -ne 'Comment' }.ForEach{ $Code.Substring($_.Start, $_.Length) }


dly7yett4#

As for the *incidental* problem of the output file being roughly twice the size of the input file:

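As AdminOfThings points out, Out-File in Windows PowerShell defaults to UTF-16 LE ("Unicode") encoding, in which characters are represented by (at least) *two* bytes, whereas ANSI encoding (used by default by Set-Content in Windows PowerShell) uses only *one* byte for all (supported) characters. Similarly, UTF-8-encoded files use only *one* byte for characters in the ASCII range (note that PowerShell (Core) 7+ now consistently defaults to (BOM-less) UTF-8). Use the -Encoding parameter as needed.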
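
A minimal sketch of the size difference, assuming a throwaway temp directory and hypothetical file names (neither is part of the original script). It writes the same line once as UTF-16 LE and once as UTF-8 and compares the byte counts:

```powershell
# Hypothetical sample text and temp paths, used only for illustration.
$text = 'Write-Host "hello"'
$dir  = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
$null = New-Item -ItemType Directory -Path $dir

$utf16Path = Join-Path $dir 'utf16.ps1'
$utf8Path  = Join-Path $dir 'utf8.ps1'

# 'Unicode' means UTF-16 LE, the Out-File default in Windows PowerShell.
$text | Out-File -FilePath $utf16Path -Encoding Unicode
$text | Out-File -FilePath $utf8Path  -Encoding utf8

$sizeUtf16 = (Get-Item $utf16Path).Length
$sizeUtf8  = (Get-Item $utf8Path).Length
"UTF-16 LE: $sizeUtf16 bytes, UTF-8: $sizeUtf8 bytes"

# Clean up the temp directory.
Remove-Item -Recurse -Force $dir
```

For ASCII-only content the UTF-16 LE file should come out roughly twice as large. In the question's script, the equivalent change would be `$x | Out-File $out -Encoding utf8`.
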
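A regex-based solution to your problem will *never be fully robust*, even if you try to limit comment removal to single-line comments.
For full robustness, you must use PowerShell's language parser, as noted in the other answers.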
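
As a rough illustration of why the parser is more reliable than a regex, the sketch below tokenizes a small, made-up sample (the sample text is an assumption, not taken from the question's script) and lists only the tokens the parser classifies as comments. The # characters inside the string literal are not reported as a comment:

```powershell
# Hypothetical sample input: an end-of-line comment, a full-line comment,
# and a string literal that merely contains # characters.
$sample = "Get-Date # show the date`n# a full-line comment`nWrite-Host '#####'"

$tokens = $null
$null = [System.Management.Automation.Language.Parser]::ParseInput(
  $sample,
  [ref] $tokens,
  [ref] $null
)

# Keep only the tokens the parser identified as comments.
$commentTokens = $tokens | Where-Object Kind -eq 'Comment'

# Show each comment with its line and column position.
$commentTokens | ForEach-Object {
  '{0},{1}: {2}' -f $_.Extent.StartLineNumber, $_.Extent.StartColumnNumber, $_.Text
}
```
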

However, care must be taken when reconstructing the original source code with the comments removed:

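AdminOfThings's answer risks removing *too much*, because of the subsequent global, regex-based processing with -replace: while unlikely, if a comment's text is repeated *inside a string*, it would mistakenly be removed from that string as well.
iRon's answer risks syntax errors by joining the tokens *without whitespace*, so that . .\foo.ps1 would become ..\foo.ps1, for example. Blindly putting a space between tokens is *not* an option, because property-access syntax would break (e.g. $host.Name would become $host . Name, but no whitespace is allowed between a value and the . operator).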
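
Both risks can be reproduced with small, made-up inputs. The sketch below applies the two techniques described above to hypothetical sample strings (the samples are assumptions chosen for illustration):

```powershell
# Risk 1: the comment's text also occurs inside a string literal.
$code = @'
# foo
Write-Host "# foo"
'@
$comments = [System.Management.Automation.PSParser]::Tokenize($code, [ref]$null) |
    Where-Object Type -eq 'Comment' | Select-Object -ExpandProperty Content
$regex = ($comments | ForEach-Object { [regex]::Escape($_) }) -join '|'
$stripped = $code -replace $regex
$stripped   # the text inside the string literal is removed as well

# Risk 2: tokens joined without the whitespace that separated them.
$dotSource = '. .\foo.ps1'
$tokens = [System.Management.Automation.PSParser]::Tokenize($dotSource, [ref]$null)
$joined = -join $tokens.ForEach{ $dotSource.Substring($_.Start, $_.Length) }
$joined     # the space after the dot-sourcing operator is gone
```
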

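The following solution avoids these problems while trying to preserve the formatting of the original code as much as possible, but this has limitations, because the parser does not report intra-line whitespace: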

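This means you cannot tell whether the whitespace between tokens on a given line is made up of tabs, spaces, or a mix of both. The solution below replaces any tab characters with 2 spaces before processing; adjust as needed.
To somewhat compensate for the removal of comments that occupy their own line, more than two consecutive blank or empty lines are folded into a single empty line. It is possible to remove blank/empty lines altogether, but that would probably hurt readability.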

# Tokenize the file content.
# Note that tabs, if any, are replaced by 2 spaces first; adjust as needed.
$tokens = $null
$null = [System.Management.Automation.Language.Parser]::ParseInput(    
  ((Get-Content -Raw .\myscript.ps1) -replace '\t', '  '), 
  [ref] $tokens,
  [ref] $null
)  

# Loop over all tokens while omitting comments, and rebuild the source code 
# without them, trying to preserve the original formatting as much as possible.
$sb = [System.Text.StringBuilder]::new() 
$prevExtent = $null; $numConsecNewlines = 0
$tokens.
  Where({ $_.Kind -ne 'Comment' }).
  ForEach({ 
    $startColumn = if ($_.Extent.StartLineNumber -eq $prevExtent.StartLineNumber) { $prevExtent.EndColumnNumber }
                   else { 1 }
    if ($_.Kind -eq 'NewLine') {
      # Fold multiple blank or empty lines into a single empty one.
      if (++$numConsecNewlines -ge 3) { return }
    } else {
      $numConsecNewlines = 0
      $null = $sb.Append(' ' * ($_.Extent.StartColumnNumber - $startColumn))
    }
    $null = $sb.Append($_.Text)
    $prevExtent = $_.Extent
  })

# Output the result.
# Pipe to Set-Content as needed.
$sb.ToString()
