Extract all href links from Google's first results page for keywords listed in an Excel sheet

Posted by v8wbuo2f on 2023-10-22 in Other


I am trying to get all the href links once Google's results page has loaded for a specific keyword.

Option Explicit

Private chr As Selenium.ChromeDriver

Sub Test()
Dim i As Long
Dim lastrow As Long
lastrow = Sheet1.Cells(Rows.Count, "A").End(xlUp).row
For i = 2 To lastrow
    Dim mykeyword As String
    mykeyword = Sheet1.Cells(i, 1).Value
    Set chr = New Selenium.ChromeDriver
    chr.Start
    chr.Get "https://www.google.com/search?q=" & mykeyword
    chr.Wait 1000

    Dim Mylinks As Selenium.WebElements
    Dim Mylink As Selenium.WebElement

    Set Mylinks = chr.FindElementsByCss("div.yuRUbf.a")

    For Each Mylink In Mylinks
        If LCase(Mylink.Attribute("data-ved")) = "2ahUKEwjSuvfI1MP9AhWu-TgGHRNrAB4QFnoECAkQAQ" Then
            Debug.Print Mylink.Attribute("href")
            Exit For
        End If
    Next Mylink
    If i = lastrow Then
        chr.Quit
    End If
Next i
End Sub

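The macro does read all the keywords from the Excel sheet, but it does not fetch the href links as expected.
I tried running it with a single sample keyword, "608-2Z site:fagbearing.cc AND filetype:pdf", and it opened the link, i.e.: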

"https://www.google.com/search?q=608-2Z+site%3Afagbearing.cc+AND+filetype%3Apdf&rlz=1C1JJTC_enIN1044IN1044&oq=608-2Z+site%3Afagbearing.cc+AND+filetype%3Apdf&aqs=chrome.0.69i59.735j0j7&sourceid=chrome&ie=UTF-8"

After that, it should fetch the two links that appear once the page has loaded, but it does not.

excel

Source: https://stackoverflow.com/questions/75639935/extract-googles-first-pages-all-href-links-of-any-keywords-listed-in-excel-she

1 Answer

fsi0uk1n1#

This code works, as far as it goes...

Option Explicit

Sub Test()
    Dim i As Long
    Dim lastrow As Long
    Dim driver As Selenium.WebDriver   ' named "driver" so it does not shadow VBA's Chr function
    Dim mykeyword As String
    Dim MyLinks As Selenium.WebElements
    Dim MyLink As Selenium.WebElement
    Dim sFilename As String
    Dim fnum As Integer

    ' Log to a text file instead of the Immediate window.
    sFilename = "C:\Users\<username>\Downloads\log-" & Format(Now(), "yyyymmddHHMMSS") & ".txt"
    fnum = FreeFile
    Open sFilename For Output As #fnum
    Print #fnum, "START"
    Print #fnum, Now()

    lastrow = Sheet1.Cells(Sheet1.Rows.Count, "A").End(xlUp).Row

    ' Start one browser session and reuse it for every keyword.
    Set driver = New Selenium.WebDriver
    driver.Start "edge"
    driver.Get "https://www.google.com"
    driver.FindElementById("L2AGLb").Click ' L2AGLb = "Accept all" button on the cookie banner
    driver.Window.Maximize
    driver.Wait 1000

    For i = 2 To lastrow
        mykeyword = Sheet1.Cells(i, 1).Value
        driver.Get "https://www.google.com/search?q=" & mykeyword
        driver.Wait 1000 ' give the results page time to render before collecting links
        Set MyLinks = driver.FindElementsByTag("a")
        Print #fnum, "Start"
        Print #fnum, mykeyword
        Print #fnum, MyLinks.Count
        For Each MyLink In MyLinks
            ' The data-ved filter from the question is left out: that value never matched.
            Print #fnum, "href = " & MyLink.Attribute("href")
            Print #fnum, "data-ved = " & MyLink.Attribute("data-ved")
        Next MyLink
        driver.Wait 1000
    Next i
    ' driver.Quit ' uncomment to close the browser when the run finishes

    Print #fnum, "End ***"
    Close #fnum
End Sub

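I ran into a strange problem when using Debug.Print, so I switched to a text file for the output.
Before searching for a keyword, I first had to click "Accept all" on the cookie banner.
The code then finds a large number of "a" tags with an href attribute, but not all of them have a data-ved attribute.
MyLink.Attribute("data-ved") = "2ahUKEwjSuvfI1MP9AhWu-TgGHRNrAB4QFnoECAkQAQ" never matched on any of the pages I looked at.
I searched with the keywords "a", "b" and "c".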
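
One way to narrow the output to the result links the question asks for, and to put them into the worksheet rather than a log file, might look like the sketch below. It rests on several assumptions: it reuses the `div.yuRUbf` class from the question, which Google can rename at any time; the procedure name `ListResultLinks` and the choice of columns B onward are made up for illustration; and it assumes the cookie banner is not shown or has already been accepted.

```vba
Option Explicit

' Sketch only: writes the result links for each keyword into columns B, C, ... of its row.
' Assumptions: Google still wraps organic result links in <div class="yuRUbf">,
' the Selenium Type Library is referenced, Edge and a matching driver are installed,
' and no cookie banner blocks the results page.
Sub ListResultLinks()
    Dim driver As Selenium.WebDriver
    Dim links As Selenium.WebElements
    Dim link As Selenium.WebElement
    Dim href As Variant
    Dim i As Long, lastrow As Long, col As Long

    lastrow = Sheet1.Cells(Sheet1.Rows.Count, "A").End(xlUp).Row

    Set driver = New Selenium.WebDriver
    driver.Start "edge"

    For i = 2 To lastrow
        driver.Get "https://www.google.com/search?q=" & Sheet1.Cells(i, 1).Value
        driver.Wait 1000

        ' "div.yuRUbf a" (with a space) selects <a> elements inside the div.
        ' "div.yuRUbf.a" would instead select a div carrying both classes "yuRUbf" and "a".
        Set links = driver.FindElementsByCss("div.yuRUbf a")

        col = 2
        For Each link In links
            href = link.Attribute("href")
            ' Skip anchors without an href; & "" guards against an empty return value.
            If Len(href & "") > 0 Then
                Sheet1.Cells(i, col).Value = href
                col = col + 1
            End If
        Next link
    Next i

    driver.Quit
End Sub
```

The selector in the question, `div.yuRUbf.a`, has no space before the `a`, so it looks for a class named "a" instead of anchor elements. That may be one reason the original loop never printed anything, independent of the data-ved comparison.
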
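I used the Selenium type library together with the Microsoft-supported driver from:
https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver/
I am using MS Edge. The driver version must match the version of Edge you have installed.
Edge version 110.0.1587.57
You will need to browse to the type library that matches your machine, either 32-bit or 64-bit.
Tools > References > Selenium Type Library (Selenium64.tlb)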
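
Two steps in the code above are fragile: the cookie banner click raises an error when the banner is not shown, and the keyword is appended to the URL unencoded. A minimal sketch of how both could be hardened follows. The helper names `AcceptConsentIfPresent` and `BuildSearchUrl` are hypothetical, the 3-second timeout is an arbitrary choice, and `WorksheetFunction.EncodeURL` assumes Excel 2013 or later on Windows.

```vba
Option Explicit

' Sketch only: both helper names are hypothetical and not part of the original answer.

' Clicks Google's "Accept all" button if the consent banner is shown; does nothing otherwise.
' Assumes the button still has the id "L2AGLb" used in the answer.
Sub AcceptConsentIfPresent(ByVal driver As Selenium.WebDriver)
    Dim btn As Selenium.WebElement
    ' Wait up to 3000 ms; passing False as the third argument (Raise)
    ' returns Nothing instead of raising an error when the element is missing.
    Set btn = driver.FindElementById("L2AGLb", 3000, False)
    If Not btn Is Nothing Then btn.Click
End Sub

' Builds a search URL with the keyword percent-encoded, so characters such as
' ":", "&" or "+" in the keyword are not misread as URL syntax.
Function BuildSearchUrl(ByVal keyword As String) As String
    BuildSearchUrl = "https://www.google.com/search?q=" & _
                     Application.WorksheetFunction.EncodeURL(keyword)
End Function
```

In the loop, `driver.Get BuildSearchUrl(mykeyword)` would then replace the plain string concatenation, and `AcceptConsentIfPresent driver` would replace the unconditional click after the first `driver.Get`.
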
