Java: how to get the "next page" of Google results using HtmlUnit

Posted by i7uq4tfw on 2023-05-12 in Java

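I am using the code below to fetch the first two pages of Google search results, but I only ever get the first page (when I request page 2, the output is identical to page 1).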

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlElement;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;

/**
 * A simple Google search test using HtmlUnit.
 *
 * @author Rahul Poonekar
 * @since Apr 18, 2010
 */
public class Author_search {
    static final WebClient browser;

    static {
        browser = new WebClient();
        browser.setJavaScriptEnabled(false);
    }

    public static void main(String[] arguments) {
        searchTest();
    }

    private static void searchTest() {
        HtmlPage currentPage = null;

        try {
            currentPage = (HtmlPage) browser.getPage("http://www.google.com");
        } catch (Exception e) {
            System.out.println("Could not open browser window");
            e.printStackTrace();
            return; // without a loaded page, the search below would fail on null
        }
        System.out.println("Simulated browser opened.");

        try {
            ((HtmlTextInput) currentPage.getElementByName("q")).setValueAttribute("xxoo");
            currentPage = currentPage.getElementByName("btnG").click();
            System.out.println("contents: " + currentPage.asText());
            HtmlElement next = (HtmlElement)currentPage.getByXPath("//span[contains(text(), 'Next')]").get(0);
            currentPage = next.click();
            System.out.println("contents: " + currentPage.asText());
        } catch (Exception e) {
            System.out.println("Could not search");
            e.printStackTrace();
        }
    } 
}

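Can anyone tell me how to fix this?
By the way:
1. How do I change the language setting on Google using HtmlUnit? Is there a convenient way?
2. Does HtmlUnit treat the HTML the way "Firebug" in Firefox does, or like the plain text you get from "File -> Save"? I believe it treats the page the way a browser would; am I right?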
Answer #1 (flseospp)

I replaced:

HtmlElement next = (HtmlElement)currentPage.getByXPath("//span[contains(text(),'Next')]").get(0);
currentPage = next.click();

with:

// Requires: import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
HtmlAnchor nextAnchor = currentPage.getAnchorByText("Next");
currentPage = nextAnchor.click();
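
A possible alternative to clicking the link, which also touches on the language question above: request each result page by URL. The sketch below assumes that Google's search URL accepts `q` (query), `start` (zero-based result offset) and `hl` (interface language), and that a page holds 10 results by default. Google may change this behavior or answer automated clients with a consent or block page, so treat it as a starting point. `GoogleSearchUrl` and `build` are hypothetical names introduced for this example and are not part of HtmlUnit.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of HtmlUnit): builds a Google search URL for a given page. */
class GoogleSearchUrl {
    // Assumption: Google returns 10 results per page by default.
    static final int RESULTS_PER_PAGE = 10;

    /**
     * @param query     raw search terms
     * @param pageIndex zero-based page number (0 = first page)
     * @param language  interface language code for the hl parameter, e.g. "en"
     */
    static String build(String query, int pageIndex, String language) {
        if (pageIndex < 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0");
        }
        // URLEncoder applies form encoding (a space becomes '+'), which suits a query string.
        // The Charset overload used here needs Java 10 or later.
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        String hl = URLEncoder.encode(language, StandardCharsets.UTF_8);
        return "https://www.google.com/search?q=" + q
                + "&start=" + (pageIndex * RESULTS_PER_PAGE)
                + "&hl=" + hl;
    }

    public static void main(String[] args) {
        // Print the URLs for the first two result pages of the question's query.
        for (int page = 0; page < 2; page++) {
            System.out.println(build("xxoo", page, "en"));
        }
    }
}
```

Each URL could then be passed to `browser.getPage(...)` in the question's code, so that the second page no longer depends on finding the "Next" element in the markup.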
