Using a regex in PowerShell to find a URL in web page content

Posted by niknxzdl 12 months ago in Shell

I need to use PowerShell to find the URL https://cdn.windwardstudios.com/Archive/23.X/23.3.0/JavaRESTfulEngine-23.3.0.32.zip on the page https://www.windwardstudios.com/version/version-downloads.
In other words, I need https://<anything>/JavaRESTfulEngine<anything>.zip
First I tried $regexPattern = 'https://cdn\.windwardstudios\.com/Archive/\d{2}\.X/\d+\.\d+\.\d+/JavaRESTfulEngine-.*?\.zip', which works and gives me the URL I want.
To generalize it further, I tried $regexPattern = 'https://cdn\.windwardstudios\.com/Archive/([^/]+)/JavaRESTfulEngine-.*?\.zip', but now it no longer matches anything.
Here is my PowerShell script:

# URL of the website to scrape
$websiteUrl = 'https://www.windwardstudios.com/version/version-downloads'

# Use Invoke-WebRequest to fetch the web page content
$response = Invoke-WebRequest -Uri $websiteUrl

# Check if the request was successful
if ($response.StatusCode -eq 200) {
    # Parse the HTML content to find the zip file URL using a regular expression
    $htmlContent = $response.Content
    $regexPattern = 'https://cdn\.windwardstudios\.com/Archive/([^/]+)/JavaRESTfulEngine-.*?\.zip'
    $zipFileUrls = [regex]::Matches($htmlContent, $regexPattern) | ForEach-Object { $_.Value }

    if ($zipFileUrls.Count -gt 0) {
        Write-Host "Found zip file URLs:"
        $zipFileUrls | ForEach-Object { Write-Host $_ }
    } else {
        Write-Host "Zip file URLs not found on the page."
    }
} else {
    Write-Host "Failed to fetch the web page. Status code: $($response.StatusCode)"
}

Output:

Zip file URLs not found on the page.

Desired output:

https://cdn.windwardstudios.com/Archive/23.X/23.3.0/JavaRESTfulEngine-23.3.0.32.zip

Any suggestions?

Answer #1, by p8ekf7hl

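The generalized pattern fails because [^/] cannot match a slash, so ([^/]+) covers only a single path segment, while the URL has two segments (23.X/23.3.0) between Archive/ and /JavaRESTfulEngine-. You can use a group that allows slashes instead: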

https://cdn\.windwardstudios\.com/Archive/(\S+?)/JavaRESTfulEngine-.*?\.zip

See the regex demo.
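The same comparison can be run locally in PowerShell. This minimal sketch tests both patterns against the target URL from the question on its own (not against the live page), so it only demonstrates the slash issue; it says nothing about what the page's HTML actually contains.

```powershell
# Sample input: the target URL from the question
$url = 'https://cdn.windwardstudios.com/Archive/23.X/23.3.0/JavaRESTfulEngine-23.3.0.32.zip'

# Question's pattern: [^/]+ stops at the first '/', so it can only cover "23.X"
$segmentPattern = 'https://cdn\.windwardstudios\.com/Archive/([^/]+)/JavaRESTfulEngine-.*?\.zip'

# Answer's pattern: \S+? may include '/', so it can cover "23.X/23.3.0"
$answerPattern = 'https://cdn\.windwardstudios\.com/Archive/(\S+?)/JavaRESTfulEngine-.*?\.zip'

$segmentResult = $url -match $segmentPattern
$answerResult = $url -match $answerPattern

"Question pattern matches: $segmentResult"   # Question pattern matches: False
"Answer pattern matches:   $answerResult"    # Answer pattern matches:   True
```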

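  • Details:
  • https://cdn\.windwardstudios\.com/Archive/ - the literal string https://cdn.windwardstudios.com/Archive/
  • (\S+?) - Group 1: one or more non-whitespace characters (slashes included), as few as possible
  • /JavaRESTfulEngine- - the literal string /JavaRESTfulEngine-
  • .*? - zero or more characters other than line break characters, as few as possible
  • \.zip - the literal string .zip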
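Dropping the pattern into the question's script is the only change needed. [regex]::Matches returns Match objects whose Value is the whole URL and whose Groups[1] holds what (\S+?) captured. The sketch below runs it against a small hypothetical HTML fragment (an assumption; the real page markup is not shown in the question) so it works without network access.

```powershell
# Hypothetical HTML fragment standing in for $response.Content;
# the real page markup is not shown in the question
$htmlContent = @'
<a href="https://cdn.windwardstudios.com/Archive/23.X/23.3.0/JavaRESTfulEngine-23.3.0.32.zip">Java RESTful Engine</a>
<a href="https://cdn.windwardstudios.com/Archive/23.X/23.3.0/OtherProduct-23.3.0.32.zip">Other</a>
'@

$regexPattern = 'https://cdn\.windwardstudios\.com/Archive/(\S+?)/JavaRESTfulEngine-.*?\.zip'

foreach ($m in [regex]::Matches($htmlContent, $regexPattern)) {
    # Value is the whole URL; Groups[1] is the captured archive path
    "URL:          $($m.Value)"
    "Archive path: $($m.Groups[1].Value)"   # Archive path: 23.X/23.3.0
}
```

One caveat: \S also matches quotes and angle brackets, so if the real page places several links next to each other with no whitespace between them, (\S+?) can run from one href into the next. A narrower class such as ([^"\s]+?) keeps the capture inside a single attribute value.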
