Downloading HTML page source with VBS, Chrome, CMD and regular expressions

Posted by s8vozzvw on 2023-09-28 in Go

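I am trying to use VBS and Chrome to download the HTML source of a page, save it to the local disk, and use a regular expression to extract the text between two words (startString - endString). This is what I have: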

'run cmd command
Set oShell = WScript.CreateObject("WScript.Shell")
oShell.Run "cmd c: & cd Program Files\Google\Chrome\Application>chrome.exe --headless --dump-dom --enable-logging --disable-gpu https://google.com >C:\temp\source.txt"
'read txt
Dim objFile, fso
Set fso = CreateObject("Scripting.FileSystemObject")
Set objFile = fso.OpenTextFile("C:\temp\source.txt", ForReading)
'RegEx 
Dim objRegExp
Set objRegExp = New RegExp 'Set our pattern
objRegExp.Pattern = "(^.*;startString=)(.*)(;endString.*)"
objRegExp.IgnoreCase = True
objRegExp.Global = True 
Do Until objFile.AtEndOfStream 
 strSearchString = objFile.ReadLine
 Dim objMatches
 Set objMatches = objRegExp.Execute(strSearchString)
 If objMatches.Count > 0 Then
  out = out & objMatches(0) &vbCrLf
  WScript.Echo "found"
 End If
Loop
WScript.Echo out
objFile.Close

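Issue 1: I have trouble running the command through CMD from VBS. If I open a console myself and navigate to C:\Program Files\Google\Chrome\Application, the chrome.exe command runs fine. Issue 2: the echoed output is always empty.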

qkf9rpyu


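The code can be simplified by using InStr instead of RegExp. Also, as noted in the comments, the Run command in the original code is missing /c, changes directory unnecessarily, and is missing bWaitOnReturn. Note as well that if the string to display is longer than 64K, WScript.Echo displays nothing at all, whereas MsgBox always shows the first 1023 characters of the string. Here is the code rewritten to use InStr: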

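' /c runs the command and then exits; the Chrome path is quoted because it contains spaces;
' the final True (bWaitOnReturn) makes the script wait until source.txt has been written
Set oWSH = WScript.CreateObject("WScript.Shell")
oWSH.Run "Cmd.exe /c ""C:\Program Files\Google\Chrome\Application\chrome.exe"" --headless --dump-dom --enable-logging --disable-gpu https://google.com >C:\temp\source.txt",,True
' Read the whole dumped DOM into a single string
Set oFSO = CreateObject("Scripting.FileSystemObject")
Contents = oFSO.OpenTextFile("C:\temp\source.txt").ReadAll
' Text that marks the start and the end of the section to extract
StartString = "https://store.google.com"
EndString = "https://mail.google.com"
' Find the start marker, then search for the end marker from that position onward
StartPos = InStr(Contents,StartString)
FoundText = ""
If StartPos>0 Then
  EndPos = InStr(StartPos,Contents,EndString)
  ' The result includes both markers; FoundText stays empty if either marker is missing
  If EndPos > StartPos Then FoundText = Mid(Contents,StartPos,EndPos-StartPos+Len(EndString))
End If
WScript.Echo FoundText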
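Since the question asked specifically about regular expressions, here is a minimal sketch of the same extraction done with VBScript's RegExp object instead of InStr. It assumes VBScript 5.5 or later (needed for the lazy `*?` quantifier and for SubMatches). The function names ExtractBetween and EscapeRegex are hypothetical helpers introduced only for illustration. Unlike the InStr version above, it returns just the text between the markers, without the markers themselves.

```vbscript
' Hypothetical helper: escape regex metacharacters so a marker is matched literally
Function EscapeRegex(s)
    Dim re
    Set re = New RegExp
    re.Global = True
    ' Character class of the metacharacters: \ ^ $ . | ? * + ( ) [ ] { }
    re.Pattern = "([\\\^\$\.\|\?\*\+\(\)\[\]\{\}])"
    ' Prefix each metacharacter with a backslash
    EscapeRegex = re.Replace(s, "\$1")
End Function

' Hypothetical helper: return the text between the first startStr and the
' next endStr after it (markers excluded), or "" if no such pair exists
Function ExtractBetween(text, startStr, endStr)
    Dim re, matches
    Set re = New RegExp
    re.IgnoreCase = True
    re.Global = False
    ' [\s\S] also matches line breaks; *? stops at the first endStr found
    re.Pattern = EscapeRegex(startStr) & "([\s\S]*?)" & EscapeRegex(endStr)
    Set matches = re.Execute(text)
    If matches.Count > 0 Then
        ExtractBetween = matches(0).SubMatches(0)
    Else
        ExtractBetween = ""
    End If
End Function

WScript.Echo ExtractBetween("xx startString=hello;endString yy", "startString=", ";endString")
```

With the variables from the answer's script, it would be called as FoundText = ExtractBetween(Contents, StartString, EndString). Matching against the whole string from ReadAll, rather than line by line as in the question, lets the markers sit on different lines of the dumped DOM.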
