Excel: Extract XPath data from websites listed in a table

Posted by cyvaqqii on 2023-11-20 in Other

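I am trying to pull certain values from different websites. I have a table with all the XPaths, and I want to be able to pull the value at each XPath from its website. Here is the table with the links and XPaths; the last column (red in my original screenshot) is the output I want: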
| Website | XPath | Record (output) |
| -- | -- | -- |
| https://cbcmustangs.com/sports/mbkb/2023-24/schedule | /html/body/main/div/div[1]/div[3]/div[1]/div[2] | 2-0 |
| https://columbiacougars.com/sports/mens-basketball/schedule/2023-24 | /html/body/form/main/article/div[3]/div/div[2]/ul/li[1]/span[2] | 0-0 |
| https://mbuspartans.com/sports/mens-basketball/schedule/2023-24 | /html/body/form/main/article/div[3]/div/div[2]/ul/li[1]/span[2] | 0-0 |
| https://wwuowls.com/sports/mens-basketball/schedule/2023-24 | /html/body/form/main/article/div[2]/ul/li[1]/span | 1-0 |
| https://wbueagles.com/sports/mbkb/2023-24/schedule | /html/body/main/div/div[1]/div[2]/div[1]/div[2] | 0-0 |
I did this in Google Sheets, but there are so many sheets that it is very slow to load. I would like to be able to do the same thing in Excel.
Below is something I found on SpreadsheetGuru for pulling the title from a web page, but I want to do it with XPaths.

Function GetWebpageTitle(cell As Range)
'PURPOSE: Excel Function to pull-in Webpage Title

'SOURCE: www.thespreadsheetguru.com

Dim URL As String
Dim Domain As String
Dim ReceivedHTML As String
Dim TitleTag_Start As Long
Dim TitleTag_End As Long
Dim Title As String
Dim objHttp As Object

'Domain Address to Prefix URL input (optional)
  Domain = "https://www.TheSpreadsheetGuru.com"

'Full URL Path
  URL = Domain & cell.Value

'Create HTTP Call
  On Error GoTo InvalidDomain
    Set objHttp = CreateObject("MSXML2.ServerXMLHTTP")
    objHttp.Open "GET", URL, False
    objHttp.Send ""
    DoEvents
  On Error GoTo 0

'Store URL Page HTML Code
  ReceivedHTML = objHttp.ResponseText

'Search for Title Tags
  TitleTag_Start = InStr(1, UCase(ReceivedHTML), "<TITLE>")
  TitleTag_End = InStr(1, UCase(ReceivedHTML), "</TITLE>")

'Isolate Text Within Title Tags (if applicable)
  If TitleTag_Start > 0 And TitleTag_End > 0 Then
    TitleTag_Start = TitleTag_Start + Len("<TITLE>")
    Title = Mid(ReceivedHTML, TitleTag_Start, TitleTag_End - TitleTag_Start)
  End If

'Title Text Cleanup
  Title = Replace(Title, "&amp;", "&")
  Title = Replace(Title, "&#39;", "'")
  Title = Replace(Title, "&quot;", Chr(34))

'Output Result
  GetWebpageTitle = Title

'Reset Object Variable(s)
  Set objHttp = Nothing

Exit Function

'ERROR MESSAGES
InvalidDomain:
  'Output Error Value
    GetWebpageTitle = "Invalid Domain or URL Address"
  
  'Reset Object Variable(s)
    Set objHttp = Nothing

End Function


Answer #1 by xhv8bpkk

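This is the simplest way, using Selenium to scrape the results websites. While I was developing this code, the websites timed out at certain times of the day. Download the Selenium Edge driver from https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver/, then in the Excel VBE set Tools > References > Selenium Type Library (this reference is available once SeleniumBasic is installed).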
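Before running the scraping macro, it can help to confirm that the reference and driver actually work. This is a minimal sketch, assuming SeleniumBasic is installed and the Edge driver sits where SeleniumBasic looks for it (typically its install folder); the sub name `CheckSeleniumSetup` is hypothetical and the URL is simply one row from the question's table.

```vba
' Hypothetical setup check. Assumes SeleniumBasic is installed, the
' "Selenium Type Library" reference is ticked, and an Edge driver matching
' the installed Edge version is where SeleniumBasic expects it.
Sub CheckSeleniumSetup()
    Dim drv As Selenium.WebDriver
    Set drv = New Selenium.WebDriver

    drv.Start "edge"                                   ' launch Edge through its driver
    drv.Get "https://wbueagles.com/sports/mbkb/2023-24/schedule"
    Debug.Print "Page title: " & drv.Title             ' printed to the Immediate window (Ctrl+G)
    drv.Quit                                           ' close the browser and end the driver
End Sub
```

If Edge opens and a title appears in the Immediate window, the setup is working; an error at `drv.Start` most likely means the driver is missing or does not match the installed Edge version.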

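Option Explicit
    
    Sub sbCBCMustangs2()
        ' Sheet2 holds the table: URL in A2, XPath in B2, result goes to C2
        Dim drv As Selenium.WebDriver          ' not named "chr", which would shadow VBA's Chr()
        Set drv = New Selenium.WebDriver
        Dim sURL As String
        sURL = Sheet2.Range("A2").Value
        Dim sXPath As String
        sXPath = Sheet2.Range("B2").Value
        Sheet2.Range("C2").Value = "Not found"
        drv.Start "edge"                       ' use the Edge driver downloaded above
        drv.Get sURL
        drv.Window.Maximize
        drv.Wait 10000                         ' wait 10 s (value is in ms) for the page to load
        ' Prefix an apostrophe to force a text value, or Excel will interpret 2-1 as 01-Feb...
        Sheet2.Range("C2").Value = "'" & drv.FindElementByXPath(sXPath).Text
        drv.Quit                               ' close the browser when done
    End Sub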

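The macro above handles only row 2, while the question has a whole table of sites. Below is a sketch that loops over every row and reuses one browser session. It assumes the same layout as the answer (header in row 1, URLs in column A, XPaths in column B, output in column C of `Sheet2`); the sub name `ScrapeAllRecords` is hypothetical, and the 10-second wait is copied from the answer rather than tuned.

```vba
' Hypothetical sub: loops over Sheet2 rows 2..last, URL in A, XPath in B, result in C.
Sub ScrapeAllRecords()
    Dim drv As Selenium.WebDriver
    Dim lastRow As Long, r As Long
    Dim sText As String

    ' Last filled row in column A
    lastRow = Sheet2.Cells(Sheet2.Rows.Count, "A").End(xlUp).Row

    Set drv = New Selenium.WebDriver
    drv.Start "edge"                          ' one browser session for all rows

    For r = 2 To lastRow
        sText = "Not found"
        On Error Resume Next                  ' a failed load or unmatched XPath leaves "Not found"
        Err.Clear
        drv.Get CStr(Sheet2.Cells(r, "A").Value)
        If Err.Number = 0 Then
            drv.Wait 10000                    ' same fixed 10 s wait as the answer (ms)
            sText = drv.FindElementByXPath(CStr(Sheet2.Cells(r, "B").Value)).Text
        End If
        On Error GoTo 0
        Sheet2.Cells(r, "C").Value = "'" & sText   ' apostrophe keeps "2-0" as text, not a date
    Next r

    drv.Quit
End Sub
```

Starting the browser once and visiting each URL in turn avoids paying Edge's start-up time for every site, and a row whose page fails to load or whose XPath matches nothing is marked `Not found` instead of stopping the macro.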
