excel 使用Google Translate提取div的内容

v1uwarro  于 2023-11-20  发布在  Go
关注(0)|答案(3)|浏览(152)

我有一个功能来翻译语言:

Public Function Translate(rng As Range, Optional translateFrom As String = "nl", Optional translateTo As String = "en")
    Dim getParam As String, Trans As String, objHTTP As Object, URL As String
    Set objHTTP = CreateObject("MSXML2.ServerXMLHTTP")
    getParam = ConvertToGet(rng.Value)
    URL = "https://translate.google.com/m?hl=" & translateFrom & "&sl=" & translateFrom & "&tl=" & translateTo & "&ie=UTF-8&prev=_m&q=" & getParam
    objHTTP.Open "GET", URL, False
    objHTTP.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
    objHTTP.send ("")
    If InStr(objHTTP.responseText, "div dir=""ltr""") > 0 Then
        Trans = RegexExecute(objHTTP.responseText, "div[^""]*?""ltr"".*?>(.+?)</div>")
        Translate = CleanA(Trans)
    Else
        Translate = CVErr(xlErrValue)
    End If
End Function

字符串
中文翻译:

A1 = Hello

B1 = Translate(A1,"en","zh-cn")


结果是“Nho”,正确的结果是“你好”
友情链接:https://translate.google.com/m?hl=en&sl=en&tl=zh-CN&ie=UTF-8&prev=_m&q=hello
我想要结果:

B1 = 你好

C1 = Nǐ hǎo


我想我需要修改这个代码:

Trans = RegexExecute(objHTTP.responseText, "div[^""]*?""ltr"".*?>(.+?)</div>")


请帮帮我,谢谢!

z0qdvdin

z0qdvdin1#

页面返回两个<div>,如下所示:

<div dir="ltr" class="o1">Nǐ hǎo</div>
<div dir="ltr" class="t0">你好</div>

字符串
建议您不要尝试使用正则表达式解析HTML,因为解析HTML很容易遇到困难-您可以使用Microsoft HTML对象库从Excel中获得类似的结果。
要获取这两个<div>标签的内容,您可以按照下面的示例使用此代码:

' o1 has Anglicised translation, t0 as tranlsation in target language
Set objDivs = objHTML.getElementsByTagName("div")
For Each objDiv In objDivs
    If objDiv.className = "o1" Then
        strTranslatedO1 = objDiv.innerText
    End If
    If objDiv.className = "t0" Then
        strTranslatedT0 = objDiv.innerText
    End If
Next objDiv


这基本上是在返回的HTML中遍历所有<div>标签,并检查类名o1t0,然后获取innerText属性。使用这种技术,您可以获得翻译后的值并将其写回工作表,例如:
x1c 0d1x的数据
完整代码:

Option Explicit

Public Sub Test()

    Dim ws As Worksheet
    
    ' testing worksheet
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    ws.Cells.Delete
    
    ' test inputs
    ws.Range("A1:E1") = Array("Input", "From", "To", "T0", "O1")
    ws.Range("A2:A4") = "hello"
    ws.Range("B2:B4") = "English"
    ws.Range("C2:C4") = Application.Transpose(Array("Chinese", "Spanish", "Russian"))
    
    ' test
    ws.Range("D2") = Translate("hello", "en", "zh-cn", True)
    ws.Range("E2") = Translate("hello", "en", "zh-cn", False)
    ws.Range("D3") = Translate("hello", "en", "es", True)
    ws.Range("E3") = Translate("hello", "en", "es", False) 'Spanish uses latin alphabet
    ws.Range("D4") = Translate("hello", "en", "ru", True)
    ws.Range("E4") = Translate("hello", "en", "ru", False)

End Sub

Public Function Translate(strInput As String, strFromLanguageCode As String, strToLanguageCode As String, blnTargetAlphabet As Boolean) As String
    
    Dim strURL As String
    Dim objHTTP As Object
    Dim objHTML As Object
    Dim objDivs As Object, objDiv
    Dim strTranslatedT0 As String
    Dim strTranslatedO1 As String
   
    ' send query to web page
    strURL = "https://translate.google.com/m?hl=" & strFromLanguageCode & _
        "&sl=" & strFromLanguageCode & _
        "&tl=" & strToLanguageCode & _
        "&ie=UTF-8&prev=_m&q=" & strInput

    Set objHTTP = CreateObject("MSXML2.ServerXMLHTTP")
    objHTTP.Open "GET", strURL, False
    objHTTP.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
    objHTTP.send ""
    
    ' create a html document
    Set objHTML = CreateObject("htmlfile")
    With objHTML
        .Open
        .Write objHTTP.responseText
        .Close
    End With
    
    ' o1 has Anglicised translation, t0 as tranlsation in target language
    Set objDivs = objHTML.getElementsByTagName("div")
    For Each objDiv In objDivs
        If objDiv.className = "o1" Then
            strTranslatedO1 = objDiv.innerText
        End If
        If objDiv.className = "t0" Then
            strTranslatedT0 = objDiv.innerText
        End If
    Next objDiv
    
    ' choose which to return
    If blnTargetAlphabet Then
        Translate = strTranslatedT0
    Else
        Translate = strTranslatedO1
    End If
    
CleanUp:
    Set objHTML = Nothing
    Set objHTTP = Nothing
    
End Function

2020年12月更新

看来这种方法将不再工作,也许要追溯到11月中旬。
看看这些React

  • div类名已更改为更模糊的名称
  • 有一些深奥的c-wiz元素做一些奇妙的.
  • 另外,我怀疑某些客户端脚本在检索文档后调用实际翻译

选项:Selenium、Microsoft Translate、Google翻译API的免费和付费层;)

i7uq4tfw

i7uq4tfw2#

@Robin麦肯齐解决方案可与以下修复程序配合使用:
Youtube Automate Language Translations Using Excel VBA by Dinesh Kumar Takyar类似

  1. "t0"已更改。
    If objDiv.className = "t0" Then替换为
If objDiv.className = "result-container" Then

字符串
1.对于编码问题(例如,希伯来语),请在开头添加(Office 2013及更高版本):
strInput = WorksheetFunction.EncodeURL(strInput)
注意事项:不要看谷歌翻译普通网页的HTML,“/m?”代表“移动的”使用谷歌翻译移动网页,有一个不同的和更简单的HTML代码。

ubbxdtey

ubbxdtey3#

大家好
我的主题是旧的,但通过学习JavaScript编程的过程,我取得了一些进展,并编写了一个基于Google翻译的翻译插件。所以现在我分享我的应用程序。
插件目标:

1.异步HTTP转换。
1.翻译充分利用谷歌翻译的5,000翻译字符在一个请求。
1.使用UDF转换函数转换Excel单元格和单元格范围。
1.使用键盘快捷键进行翻译。
我在Github上分享了这个项目:TranslateXL

相关问题