I need to find the URL https://cdn.windwardstudios.com/Archive/23.X/23.3.0/JavaRESTfulEngine-23.3.0.32.zip on the page https://www.windwardstudios.com/version/version-downloads using PowerShell.
So I need to match https://<anything>/JavaRESTfulEngine<anything>.zip
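As a rough sketch of that literal requirement (not something the question itself tried), the pattern below assumes the URLs sit inside HTML attributes, so it treats whitespace, quotes and angle brackets as the characters that end a URL. The sample HTML and the variable names are made up for illustration.

```powershell
# Hypothetical sample HTML; the real content would come from Invoke-WebRequest.
$html = @'
<a href="https://cdn.windwardstudios.com/Archive/23.X/23.3.0/JavaRESTfulEngine-23.3.0.32.zip">Java RESTful Engine</a>
<a href="https://cdn.windwardstudios.com/Archive/23.X/23.3.0/SomeOtherEngine-23.3.0.32.zip">Other</a>
'@

# https://, then characters that cannot end an attribute value (no whitespace, quotes, < or >),
# then /JavaRESTfulEngine, then more such characters, then .zip
# ('' inside a single-quoted string is one literal single quote)
$generalPattern = 'https://[^\s"''<>]+/JavaRESTfulEngine[^\s"''<>]*\.zip'

$urls = [regex]::Matches($html, $generalPattern) | ForEach-Object { $_.Value }
$urls
```

Because the character class excludes `"` and whitespace, a match cannot run past the end of one href into the next link.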
First, I tried $regexPattern = 'https://cdn\.windwardstudios\.com/Archive/\d{2}\.X/\d+\.\d+\.\d+/JavaRESTfulEngine-.*?\.zip', and it works and gives me the URL I want.
To generalize further, I tried $regexPattern = 'https://cdn\.windwardstudios\.com/Archive/([^/]+)/JavaRESTfulEngine-.*?\.zip', but now it doesn't work.
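The second pattern fails because [^/]+ matches exactly one path segment, while the URL has two folders (23.X and 23.3.0) between Archive/ and the file name. A minimal check, using a hypothetical $url variable and an alternative pattern (my own, not from the thread) that repeats a "segment/" group:

```powershell
$url = 'https://cdn.windwardstudios.com/Archive/23.X/23.3.0/JavaRESTfulEngine-23.3.0.32.zip'

# [^/]+ can cover "23.X" but never "23.X/23.3.0", because it cannot consume a "/".
$oneSegment  = 'https://cdn\.windwardstudios\.com/Archive/([^/]+)/JavaRESTfulEngine-.*?\.zip'

# (?:[^/]+/)+ matches one or more "segment/" pieces, so any folder depth works.
$anySegments = 'https://cdn\.windwardstudios\.com/Archive/((?:[^/]+/)+)JavaRESTfulEngine-.*?\.zip'

$url -match $oneSegment    # False
$url -match $anySegments   # True
$Matches[1]                # 23.X/23.3.0/
```

The repeated group is one option; the answer below uses (\S+?) instead, which also allows / inside the captured part.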
Below is my PowerShell script.
# URL of the website to scrape
$websiteUrl = 'https://www.windwardstudios.com/version/version-downloads'
# Use Invoke-WebRequest to fetch the web page content
$response = Invoke-WebRequest -Uri $websiteUrl
# Check if the request was successful
if ($response.StatusCode -eq 200) {
    # Parse the HTML content to find the zip file URL using a regular expression
    $htmlContent = $response.Content
    $regexPattern = 'https://cdn\.windwardstudios\.com/Archive/([^/]+)/JavaRESTfulEngine-.*?\.zip'
    $zipFileUrls = [regex]::Matches($htmlContent, $regexPattern) | ForEach-Object { $_.Value }
    if ($zipFileUrls.Count -gt 0) {
        Write-Host "Found zip file URLs:"
        $zipFileUrls | ForEach-Object { Write-Host $_ }
    } else {
        Write-Host "Zip file URLs not found on the page."
    }
} else {
    Write-Host "Failed to fetch the web page. Status code: $($response.StatusCode)"
}
Output:
Zip file URLs not found on the page.
Desired output:
https://cdn.windwardstudios.com/Archive/23.X/23.3.0/JavaRESTfulEngine-23.3.0.32.zip
Can you suggest a fix?
1 Answer
You can use
https://cdn\.windwardstudios\.com/Archive/(\S+?)/JavaRESTfulEngine-.*?\.zip
See the regex demo.
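Details:
- https://cdn\.windwardstudios\.com/Archive/ - the literal string https://cdn.windwardstudios.com/Archive/
- (\S+?) - Group 1: one or more non-whitespace characters, as few as possible (unlike [^/]+, this can include /, so it spans 23.X/23.3.0)
- /JavaRESTfulEngine- - the literal string /JavaRESTfulEngine-
- .*? - zero or more characters other than line break characters, as few as possible
- \.zip - the literal string .zip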
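Putting the answer's pattern into a reusable form might look like the sketch below. The function name Get-JavaRestfulEngineZipUrl and the sample HTML are hypothetical, and Select-Object -Unique is added on the assumption that the page may link the same file more than once.

```powershell
function Get-JavaRestfulEngineZipUrl {
    # Hypothetical helper: returns each distinct JavaRESTfulEngine zip URL found in $Html.
    param([Parameter(Mandatory)][string]$Html)

    # Pattern from the answer: (\S+?) may contain "/", so it spans "23.X/23.3.0".
    $pattern = 'https://cdn\.windwardstudios\.com/Archive/(\S+?)/JavaRESTfulEngine-.*?\.zip'

    [regex]::Matches($Html, $pattern) |
        ForEach-Object { $_.Value } |
        Select-Object -Unique
}

# Offline check with a made-up HTML fragment.
$sampleHtml = '<a href="https://cdn.windwardstudios.com/Archive/23.X/23.3.0/JavaRESTfulEngine-23.3.0.32.zip">Download</a>'
Get-JavaRestfulEngineZipUrl -Html $sampleHtml

# Against the live page (requires network access):
# $page = Invoke-WebRequest -Uri 'https://www.windwardstudios.com/version/version-downloads' -UseBasicParsing
# Get-JavaRestfulEngineZipUrl -Html $page.Content
```

Keeping the matching in a function that takes a string makes it easy to test the regex without hitting the website.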