regex: In PowerShell, a regular expression matches more than it should

Posted by qcbq4gxm on 2023-03-20 in Shell

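Hi, I have a script that tries to replace a <p></p> block containing the text image002.jpg or image002.png. But it always replaces more than it should. Can someone help improve the regex? I asked ChatGPT, but no luck :=). Whoever finds the answer must be smarter than the AI.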

# Assign the input text to a variable using single quotes
$content = 
@'
<p class=xmsonormal style="background:white">to be kept 1</p>

<p class=xmsonormal style="background:white"><span style="font-family:"Source Sans Pro",sans-serif;
color:black"><!--[if gte vml 1]><v:shape id="Picture_x0020_2" o:spid="_x0000_i1028"
 type="#_x0000_t75" alt="" style="width:75pt;height:75pt">
 <v:imagedata src="test_files/image002.png" o:href="cid:98eba38d-7414-46ec-a87a-8eca235a6e5c"/>
</v:shape><![endif]--><![if !vml]><img border=0 width=100 height=100
src="test_files/image002.png" style="height:1.041in;width:1.041in" v:shapes="Picture_x0020_2"><![endif]></span><span
style="color:black"><o:p></o:p></span></p>

<p class=xmsonormal style="background:white">to be kept 2</p>
'@

$pattern = "(?s)<p\b[^>]*>.*?\bimage002\.(png|jpg)\b.*?</p>"
#"<p[^>]*>(.*?)(image002\.(png|jpg))(.*?)<\/p>"
$replacement = "<p>replaced</p>"

[regex]::Replace($content, $pattern, $replacement, [System.Text.RegularExpressions.RegexOptions]::Singleline)
# Use -replace operator to find and replace all matches
$content = $content -replace $pattern,$replacement
# Write the modified content to the console or a file
Write-Host $content

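I want the <p> elements that contain "to be kept" above to stay as they are.
With my current approach, the first <p> element is also replaced, which I don't want.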

yzxexxkh1#

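  • You need to make sure that no *additional* opening <p> tag occurs between the opening <p> tag being matched and the subsequent image002.png or image002.jpg match.
  • You can ensure that with the following subexpression, which uses a negative lookahead assertion: (?:.(?!<p\b[^>]*>))+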
# Note: (?:…) is a *non*-capturing group.
#       (?!…) is a negative lookahead assertion.
$pattern = '(?s)<p\b[^>]*>(?:.(?!<p\b[^>]*>))+\bimage002\.(?:png|jpg)\b.*?</p>'
$replacement = '<p>replaced</p>'
# Assign the result back; -replace returns a new string and leaves $content untouched.
$content = $content -replace $pattern, $replacement

$content # Output the result

For an explanation of the regex and the ability to experiment with it, see this regex101.com page.
Output:

<p class=xmsonormal style="background:white">to be kept 1</p>

<p>replaced</p>

<p class=xmsonormal style="background:white">to be kept 2</p>
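
Why the question's pattern over-matches can be seen on a smaller input. The following is only a sketch: the shortened `$sample` and the variable names are made up for illustration, and the two patterns are the ones from the question and from this answer. A lazy `.*?` still starts at the leftmost `<p>` and simply keeps expanding across `</p>` until it reaches the image name, whereas the lookahead version cannot step over another opening `<p>` tag.

```powershell
# Hypothetical, shortened sample: only the second paragraph references the image.
# The lines are joined with "`n" so the result does not depend on file line endings.
$sample = '<p>to be kept 1</p>',
          '<p><img src="test_files/image002.png"></p>',
          '<p>to be kept 2</p>' -join "`n"

# Pattern from the question: the lazy .*? may still run past </p>.
$lazy     = '(?s)<p\b[^>]*>.*?\bimage002\.(?:png|jpg)\b.*?</p>'
# Pattern from this answer: no consumed character may be followed by an opening <p> tag.
$tempered = '(?s)<p\b[^>]*>(?:.(?!<p\b[^>]*>))+\bimage002\.(?:png|jpg)\b.*?</p>'

$lazyMatch     = [regex]::Match($sample, $lazy)
$temperedMatch = [regex]::Match($sample, $tempered)

$lazyMatch.Index       # 0, i.e. the match begins at the first <p>
$temperedMatch.Value   # <p><img src="test_files/image002.png"></p>
```

Note that the lookahead only guards the stretch before the image name; the trailing `.*?</p>` is still a plain lazy match up to the next `</p>`.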

s4n0splo2#

As per my comment.
Assuming you want to find all the text between the HTML paragraph tags, it looks like this:

(?<=>).*(?=<\/p>)

So, if you use the above pattern on the text you posted, you get this:

Clear-Host
[RegEX]::Matches('<p class=xmsonormal style="background:white">to be kept 1</p>
    
<p class=xmsonormal style="background:white"><span style="font-family:"Source Sans Pro",sans-serif;
color:black"><!--[if gte vml 1]><v:shape id="Picture_x0020_2" o:spid="_x0000_i1028"
    type="#_x0000_t75" alt="" style="width:75pt;height:75pt">
    <v:imagedata src="test_files/image002.png" o:href="cid:98eba38d-7414-46ec-a87a-8eca235a6e5c"/>
</v:shape><![endif]--><![if !vml]><img border=0 width=100 height=100
src="test_files/image002.png" style="height:1.041in;width:1.041in" v:shapes="Picture_x0020_2"><![endif]></span><span
style="color:black"><o:p></o:p></span></p>
    
<p class=xmsonormal style="background:white">to be kept 2</p>
', '(?<=>).*(?=<\/p>)').Value
# Results
<#
to be kept 1
<o:p></o:p></span>
    
to be kept 2
#>
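
The second result above is only the tail of the multi-line paragraph. That follows from `.` not matching a newline unless the Singleline option is set. A small sketch of the difference, using a made-up two-line `$sample`:

```powershell
# Hypothetical paragraph spanning two lines ("`n" is an explicit newline).
$sample = "<p>first line`n<b>second</b> line</p>"

# Default options: '.' stops at the newline, so only text on the last line qualifies.
$perLine = [regex]::Matches($sample, '(?<=>).*(?=</p>)').Value
$perLine    # second</b> line

# Inline (?s) option: '.' also matches newlines, so the whole paragraph body is returned.
$wholeBody = [regex]::Matches($sample, '(?s)(?<=>).*(?=</p>)').Value
$wholeBody
```

With `(?s)` the greedy `.*` runs to the last `</p>` in the input, so on text with several paragraphs it would need to be made lazy or otherwise restricted.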

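Of course, depending on what you are really after, there is more cleanup to consider; you have not shown us the before and after you are aiming for.
If you are only after this image string (jpg or png), where only the name should be replaced and not the extension: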

"test_files/image002.png"

Then the same approach applies, for example:

(?<=src="test_files.).*(?=\(.jpg|.png)

So...

Clear-Host
[RegEX]::Matches('<p class=xmsonormal style="background:white">to be kept 1</p>
    
<p class=xmsonormal style="background:white"><span style="font-family:"Source Sans Pro",sans-serif;
color:black"><!--[if gte vml 1]><v:shape id="Picture_x0020_2" o:spid="_x0000_i1028"
    type="#_x0000_t75" alt="" style="width:75pt;height:75pt">
    <v:imagedata src="test_files/image002.png" o:href="cid:98eba38d-7414-46ec-a87a-8eca235a6e5c"/>
</v:shape><![endif]--><![if !vml]><img border=0 width=100 height=100
src="test_files/image002.png" style="height:1.041in;width:1.041in" v:shapes="Picture_x0020_2"><![endif]></span><span
style="color:black"><o:p></o:p></span></p>
    
<p class=xmsonormal style="background:white">to be kept 2</p>
', '(?<=src="test_files.).*(?=\(.jpg|.png)').Value
# Results
<#
image002
image002
#>
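
One caveat about the lookahead `(?=\(.jpg|.png)`: `\(` is an escaped literal parenthesis, so the first alternative only matches text such as `(xjpg`. The sample still works because of the second alternative, `.png`, but a `.jpg` file name would not be found. A possible variant, sketched with made-up `$samples`, escapes the dot and groups the extensions:

```powershell
# Hypothetical attribute values, one per extension.
$samples = 'src="test_files/image002.png"', 'src="test_files/image002.jpg"'

# Pattern as posted above: it finds nothing in the .jpg value.
$asPosted = '(?<=src="test_files.).*(?=\(.jpg|.png)'
$samples[1] -match $asPosted    # False

# Variant: \. is a literal dot, (?:jpg|png) groups the two extensions.
$pattern = '(?<=src="test_files/).*(?=\.(?:jpg|png))'

# -replace is applied to each array element in turn.
$result = $samples -replace $pattern, 'ReplacementImage'
$result
```

The greedy `.*` is kept from the original; on a line that holds several image references it would stretch to the last one.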

Of course, at this point, the replacement is a simple matter.

Clear-Host
[regex]::replace('
<p class=xmsonormal style="background:white">to be kept 1</p>
    
<p class=xmsonormal style="background:white"><span style="font-family:"Source Sans Pro",sans-serif;
color:black"><!--[if gte vml 1]><v:shape id="Picture_x0020_2" o:spid="_x0000_i1028"
    type="#_x0000_t75" alt="" style="width:75pt;height:75pt">
    <v:imagedata src="test_files/image002.png" o:href="cid:98eba38d-7414-46ec-a87a-8eca235a6e5c"/>
</v:shape><![endif]--><![if !vml]><img border=0 width=100 height=100
src="test_files/image002.png" style="height:1.041in;width:1.041in" v:shapes="Picture_x0020_2"><![endif]></span><span
style="color:black"><o:p></o:p></span></p>
    
<p class=xmsonormal style="background:white">to be kept 2</p>
', 
'(?<=src="test_files.).*(?=\(.jpg|.png)', 'ReplacemenImage')
# Results
<#
<p class=xmsonormal style="background:white">to be kept 1</p>
    
<p class=xmsonormal style="background:white"><span style="font-family:"Source Sans Pro",sans-serif;
color:black"><!--[if gte vml 1]><v:shape id="Picture_x0020_2" o:spid="_x0000_i1028"
    type="#_x0000_t75" alt="" style="width:75pt;height:75pt">
    <v:imagedata src="test_files/ReplacemenImage.png" o:href="cid:98eba38d-7414-46ec-a87a-8eca235a6e5c"/>
</v:shape><![endif]--><![if !vml]><img border=0 width=100 height=100
src="test_files/ReplacemenImage.png" style="height:1.041in;width:1.041in" v:shapes="Picture_x0020_2"><![endif]></span><span
style="color:black"><o:p></o:p></span></p>
    
<p class=xmsonormal style="background:white">to be kept 2</p>
#>
