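I'm trying to pull certain values from different websites. I have all the XPaths in a table, and I want to be able to pull the value at each XPath from the given website. I've attached a picture of what I want.
Table with link and XPath (the red column is the output):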
| Website | XPath | Website Record |
| --- | --- | --- |
| https://cbcmustangs.com/sports/mbkb/2023-24/schedule | /html/body/main/div/div[1]/div[3]/div[1]/div[2] | 2-0 |
| https://columbiacougars.com/sports/mens-basketball/schedule/2023-24 | /html/body/form/main/article/div[3]/div/div[2]/ul/li[1]/span[2] | 0-0 |
| https://mbuspartans.com/sports/mens-basketball/schedule/2023-24 | /html/body/form/main/article/div[3]/div/div[2]/ul/li[1]/span[2] | 0-0 |
| https://wwuowls.com/sports/mens-basketball/schedule/2023-24 | /html/body/form/main/article/div[2]/ul/li[1]/span | 1-0 |
| https://wbueagles.com/sports/mbkb/2023-24/schedule | /html/body/main/div/div[1]/div[2]/div[1]/div[2] | 0-0 |
I've done this with Google Sheets, but there are too many sheets and it is very slow to load. I'm hoping I can do the same thing in Excel.
Below is something I found on spreadsheetguru for pulling the title from a page, but I want to do it with XPaths instead.
```vba
Function GetWebpageTitle(cell As Range)
'PURPOSE: Excel Function to pull-in Webpage Title
'SOURCE: www.thespreadsheetguru.com

    Dim URL As String
    Dim Domain As String
    Dim ReceivedHTML As String
    Dim TitleTag_Start As Long
    Dim TitleTag_End As Long
    Dim Title As String
    Dim objHttp As Object

    'Domain Address to Prefix URL input (optional)
    Domain = "https://www.TheSpreadsheetGuru.com"

    'Full URL Path
    URL = Domain & cell.Value

    'Create HTTP Call
    On Error GoTo InvalidDomain
    Set objHttp = CreateObject("MSXML2.ServerXMLHTTP")
    objHttp.Open "GET", URL, False
    objHttp.Send ""
    DoEvents
    On Error GoTo 0

    'Store URL Page HTML Code
    ReceivedHTML = objHttp.ResponseText

    'Search for Title Tags (case-insensitive via UCase)
    TitleTag_Start = InStr(1, UCase(ReceivedHTML), "<TITLE>")
    TitleTag_End = InStr(1, UCase(ReceivedHTML), "</TITLE>")

    'Isolate Text Within Title Tags (if applicable)
    If TitleTag_Start > 0 And TitleTag_End > 0 Then
        TitleTag_Start = TitleTag_Start + Len("<TITLE>")
        Title = Mid(ReceivedHTML, TitleTag_Start, TitleTag_End - TitleTag_Start)
    End If

    'Title Text Cleanup (decode common HTML entities)
    Title = Replace(Title, "&amp;", "&")
    Title = Replace(Title, "&#39;", "'")
    Title = Replace(Title, "&quot;", Chr(34))

    'Output Result
    GetWebpageTitle = Title

    'Reset Object Variable(s)
    Set objHttp = Nothing

    Exit Function

'ERROR MESSAGES
InvalidDomain:
    'Output Error Value
    GetWebpageTitle = "Invalid Domain or URL Address"

    'Reset Object Variable(s)
    Set objHttp = Nothing

End Function
```
1 Answer
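The simplest way to do this on these results websites is to use Selenium for the web scraping. Note that while I was developing this code, the websites timed out at certain times of day.

Download the Selenium Edge driver from https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver/ and then, in the Excel VBE, set Tools > References > Selenium Type Library.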
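A minimal sketch of the Selenium approach, not a definitive implementation. It assumes the SeleniumBasic type library is installed and referenced, that the Edge driver matches the installed Edge version, and that the active sheet holds URLs in column A and XPaths in column B starting at row 2, with results written to column C. The sheet layout, the 5-second element wait, and the name `PullXPathValues` are all assumptions made for illustration.

```vba
Option Explicit

' Sketch only. Assumptions (not from the question):
'   - SeleniumBasic is installed and "Selenium Type Library" is referenced
'   - column A = URL, column B = XPath, column C = output, data from row 2
Public Sub PullXPathValues()
    Dim driver As Selenium.EdgeDriver
    Dim el As Selenium.WebElement
    Dim ws As Worksheet
    Dim lastRow As Long
    Dim r As Long

    Set ws = ActiveSheet
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    ' Store results as text so Excel does not turn a record like 2-0 into a date
    ws.Columns("C").NumberFormat = "@"

    Set driver = New Selenium.EdgeDriver

    For r = 2 To lastRow
        Set el = Nothing

        ' A site may time out; record the error for this row and move on
        On Error Resume Next
        driver.Get CStr(ws.Cells(r, "A").Value)
        If Err.Number = 0 Then
            ' Wait up to 5000 ms; the third argument (Raise) set to False
            ' makes the call return Nothing when no node matches
            Set el = driver.FindElementByXPath(CStr(ws.Cells(r, "B").Value), 5000, False)
        End If

        If Err.Number <> 0 Then
            ws.Cells(r, "C").Value = "Error: " & Err.Description
        ElseIf el Is Nothing Then
            ws.Cells(r, "C").Value = "XPath not found"
        Else
            ws.Cells(r, "C").Value = el.Text
        End If

        Err.Clear
        On Error GoTo 0
    Next r

    driver.Quit
End Sub
```

Run `PullXPathValues` from the Macros dialog with the table sheet active. A real browser is used here rather than `MSXML2.ServerXMLHTTP` because these schedule pages may build part of their content with JavaScript, in which case the raw HTML would not contain the node the XPath points to. Absolute XPaths such as `/html/body/main/div/...` break whenever a site changes its layout, so a row reporting "XPath not found" may simply need an updated path.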