PowerShell: How do I web scrape using links contained in a text file?

Posted by bxjv4tth 12 months ago in Shell


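I am currently trying to create a PowerShell script that extracts CVE numbers from websites. The links to the websites are specified in a text file that looks like this: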

Application   : Microsoft.Office.Interop.Excel.ApplicationClass
Creator       : 1480803660
Parent        : System.__ComObject
Name          : https://www.cisa.gov/uscert/ics/advisories/icsa-22-006-01
Range         : System.__ComObject
Shape         : 
SubAddress    : 
Address       : https://www.cisa.gov/uscert/ics/advisories/icsa-22-006-01

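My current code runs into an error about a "null-valued expression", and I can't seem to get it working. I suspect the problem may be the way I am trying to read the text file.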

$Path = "C:\Users\Windows\Downloads\Links.txt"
$values = Get-Content $Path | Where-Object {$_ -like '*Name*'}
$URI = $values

ForEach ($URI in $Path){
$HTML = Invoke-WebRequest -Uri $URI -UseBasicParsing
($HTML.ParsedHtml.getElementsByTagName("a") | Where{ $_.href -eq 'http://web.nvd.nist.gov/view/vuln/detail?vulnId' } ).innerText | Out-File -FilePath 'C:\Users\Windows\Downloads\CVEList'
}
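
A note on the snippet above: the loop iterates over $Path (the file name) rather than over the filtered lines, and each filtered line still carries the "Name          : " label in front of the URL. In addition, a response fetched with -UseBasicParsing has no ParsedHtml property, which is a likely source of the "null-valued expression" error. Below is a minimal sketch of pulling the bare URLs out of the dump, assuming every link line looks like the sample shown earlier. `Get-LinkFromDump` is a hypothetical helper name used only for illustration.

```powershell
# Hypothetical helper: returns the URL from every "Name : <url>" line.
# Assumes the format of the sample file; "Address : <url>" lines are
# skipped so that each link is returned only once.
function Get-LinkFromDump {
    param([string[]]$Line)

    foreach ($l in $Line) {
        # Capture everything from "http(s)://" up to the next whitespace
        if ($l -match '^\s*Name\s*:\s*(https?://\S+)') {
            $Matches[1]
        }
    }
}

# Usage sketch (path taken from the question):
# $URIs = Get-LinkFromDump -Line (Get-Content 'C:\Users\Windows\Downloads\Links.txt')
# ForEach ($URI in $URIs) { ... }
```

The loop can then run over $URIs instead of $Path, so that Invoke-WebRequest receives one clean URL per iteration.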


powershell

Source: https://stackoverflow.com/questions/72719751/how-do-i-web-scrape-using-links-contained-in-a-text-file

1 Answer


ccgok5k51#

To complement the comment from @Sage, here is a possible solution for newer PowerShell versions that uses the HTMLDocument class in Microsoft Windows:

function ParseHtml($String) {
    # The COM document expects the markup as UTF-16 (Unicode) bytes
    $Unicode = [System.Text.Encoding]::Unicode.GetBytes($String)
    $Html = New-Object -Com 'HTMLFile'
    # Depending on the system, the write method is exposed under one of two names
    if ($Html.PSObject.Methods.Name -Contains 'IHTMLDocument2_Write') {
        $Html.IHTMLDocument2_Write($Unicode)
    }
    else {
        $Html.write($Unicode)
    }
    $Html.Close()
    $Html
}

$Uri = 'https://stackoverflow.com/a/72720158/1701026'
$Html = ParseHtml (Invoke-WebRequest -Uri $Uri).Content
# List the target of every anchor element in the parsed document
$Html.body.getElementsByTagName('a') | ForEach-Object { $_.href }
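
To tie this back to the CVE numbers asked about in the question, the identifiers can also be pulled straight from the response text. Note that this is a different technique from the HTMLDocument approach above: it is a plain regular expression over the page content, sketched under the assumption that the advisory pages contain identifiers in the usual CVE-YYYY-NNNN form. `Get-CveId` is a hypothetical helper name, not something from the original answer.

```powershell
# Hypothetical helper: returns the distinct CVE identifiers found in a string.
# Assumes IDs follow the pattern "CVE-<4-digit year>-<4 or more digits>".
function Get-CveId {
    param([string]$Html)

    [regex]::Matches($Html, 'CVE-\d{4}-\d{4,}') |
        ForEach-Object { $_.Value } |
        Sort-Object -Unique
}

# Usage sketch (needs network access; URL and output path taken from the question):
# $page = Invoke-WebRequest -Uri 'https://www.cisa.gov/uscert/ics/advisories/icsa-22-006-01' -UseBasicParsing
# Get-CveId -Html $page.Content |
#     Out-File -FilePath 'C:\Users\Windows\Downloads\CVEList' -Append
```

Using -Append keeps the results from earlier pages when the call runs inside a loop over several URLs; without it, each iteration would overwrite the file.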

