JavaScript: browser.close() does not close the browser in Puppeteer

hpxqektj posted on 2023-01-29 in Java

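I'm running a program that uses the puppeteer library to scrape website URLs from Google search results for any song by any artist. I recursively extract the URLs from every results page. Everything works fine, but when I call browser.close(), it only takes effect after I close the browser manually. My data is logged to the terminal only once I close the browser by hand; otherwise it never appears. I have waited nearly 15 minutes and the browser still stays open. Here is my code: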

const puppeteer = require("puppeteer");

(async () => {
  const getData = async (url, start = 0) => {
    try {
      const page = await browser.newPage();
      await page.setViewport({ width: 1366, height: 768 });

      const query = `${url}&start=${start}`;
      await page.goto(query, { waitUntil: "load", timeout: 0 });

      await page.waitForSelector('div[class="yuRUbf"] >a', { timeout: 0 });
      const links = await page.evaluate(() =>
        Array.from(document.querySelectorAll('div[class="yuRUbf"] >a')).map(
          (a) => a.href
        )
      );
      await page.close();
      if (links.length < 1) {
        // return if no link exists
        return links;
      } else {
        return links.concat(await getData(url, (start = start + 10)));
      }
    } catch (error) {
      if (error) console.log(error);
    }
  }; //end get data function

  const browser = await puppeteer.launch({ headless: false });

  const url =
    "https://www.google.com/search?q=Let+You+Love+Me+by+Rita+Ora&sxsrf=ALeKk02Hp5Segi8ShvyrREw3NLZ6p7_BKw:1622526254457&ei=Lsm1YPSzG9WX1fAPvdqTgAg&sa=N&ved=2ahUKEwj0gqSo3fXwAhXVSxUIHT3tBIAQ8tMDegQIARA7&biw=1517&bih=694";

  const allLinks = await getData(url);
  await browser.close();
  console.log(allLinks);
})(); //end musicCrawler function

// getData Function

nnt7mjpx1#

The cause is this line:

await page.waitForSelector('div[class="yuRUbf"] >a', { timeout: 0 });

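When you reach the last results page, which has no links, this line waits indefinitely, because timeout: 0 disables the timeout. Try setting a finite (if generous) timeout and wrapping the call in an inner try-catch block: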

try {
  await page.waitForSelector('div[class="yuRUbf"] > a', { timeout: 60_000 });
} catch (error) {
  console.log(error);
}
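
To see why a bounded wait lets the script reach browser.close(), here is a minimal sketch that runs without Puppeteer. All names in it are hypothetical: `fetchPage(start)` stands in for the `page.goto` / `page.waitForSelector` / `page.evaluate` sequence in the question, and `browser` is any object with an async `close()` method. A wait that times out is treated as "no more results", so the recursion ends, and `finally` makes sure the browser is closed either way.

```javascript
// Rejects if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  // Clear the timer once either side settles so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Mirrors the recursion in the question. `fetchPage(start)` is an assumed
// callback that resolves to an array of links for the given result offset.
async function collectLinks(fetchPage, timeoutMs, start = 0) {
  let links;
  try {
    links = await withTimeout(fetchPage(start), timeoutMs);
  } catch (error) {
    return []; // the wait timed out or failed: treat this as the last page
  }
  if (links.length < 1) return links;
  return links.concat(await collectLinks(fetchPage, timeoutMs, start + 10));
}

// `browser` is assumed to expose an async close(), as a Puppeteer browser does.
async function crawl(browser, fetchPage, timeoutMs) {
  try {
    return await collectLinks(fetchPage, timeoutMs);
  } finally {
    await browser.close(); // runs whether collection succeeded or threw
  }
}
```

Returning an empty array from the catch branch, rather than nothing, also keeps `concat` from appending `undefined` to the result.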

ldioqlga2#

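If you can do this without puppeteer, I'd suggest not using browser automation at all to scrape Google search pages. You can get what you need from a plain HTTP request, which takes far fewer resources.
For example, you can make the request with axios and parse the HTML with cheerio, which uses jQuery-style syntax. Here is how that can be done: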

const cheerio = require("cheerio");
const axios = require("axios");

const searchString = "Let You Love Me by Rita Ora";

const AXIOS_OPTIONS = {
  headers: {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36",
  },
  params: { q: `${searchString}`, hl: "en", gl: "us", start: 0 },
};

async function getLinks() {
  return axios.get(`http://www.google.com/search`, AXIOS_OPTIONS).then(async function ({ data }) {
    let $ = cheerio.load(data);

    const links = Array.from($(".yuRUbf > a")).map((el) => $(el).attr("href"));

    if (links.length < 1) {
      // return if no link exists
      return links;
    } else {
      AXIOS_OPTIONS.params.start += 10;
      return links.concat(await getLinks());
    }
  });
}

getLinks().then(console.log);

Output:

[
   "https://en.wikipedia.org/wiki/Let_You_Love_Me",
   "https://open.spotify.com/album/4ymdSiRu5oOyfrJschvFLO",
   "https://storyofsong.com/story/let-you-love-me/",
   "https://www.smule.com/song/rita-ora-let-you-love-me-karaoke-lyrics/7796351_7796351/arrangement",
   "https://secondhandsongs.com/performance/805308",
   "https://www.pinterest.com/pin/373939575307014693/",
   "https://www.discogs.com/master/1440716-Rita-Ora-Let-You-Love-Me",
   "https://www.azlyrics.com/lyrics/ritaora/letyouloveme.html",
   "https://soundcloud.com/ritaora/let-you-love-me",
   "https://www.songexplained.com/rita-ora-let-you-love-me-lyrics-meaning-explained/",
   "https://www.imdb.com/title/tt9033998/fullcredits",
  ... and all other links
]
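
The recursive version above advances the offset by mutating the shared `AXIOS_OPTIONS.params.start`, so calling it a second time would resume from wherever the first run stopped. As a minimal sketch of an alternative, the loop below keeps the offset local and caps the number of pages. The names `paginate`, `getPageLinks`, `pageSize` and `maxPages` are assumptions for illustration and are not part of axios or cheerio.

```javascript
// Collects links from successive result pages without recursion or shared state.
// `getPageLinks(start)` is an assumed callback that resolves to an array of
// links for the given result offset.
async function paginate(getPageLinks, { pageSize = 10, maxPages = 20 } = {}) {
  const all = [];
  for (let page = 0; page < maxPages; page++) {
    const links = await getPageLinks(page * pageSize);
    if (links.length < 1) break; // an empty page means there are no more results
    all.push(...links);
  }
  return all;
}
```

With the code above, `getPageLinks` could wrap the `axios.get` call and the cheerio parsing, passing `start` in through the request params instead of changing a module-level object.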

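Alternatively, you can use the Google Organic Results API from SerpApi. Its main advantage is that you don't have to write a parser from scratch and keep maintaining it (Google changes the structure of the elements on its pages often, so you would have to keep hunting for the right selectors).
Here is a usage example for your case: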

import { config, getJson } from "serpapi";

config.api_key = process.env.API_KEY; //your API key from serpapi.com

const engine = "google"; // search engine
const params = {
  q: "Let You Love Me by Rita Ora", // Parameter defines the query you want to search
  gl: "us", // Parameter defines the country to use for the Google search
  hl: "en", // Parameter defines the language to use for the Google search
  start: 0, // Parameter defines the result offset
};

const getResults = async () => {
  const { organic_results } = await getJson(engine, params);
  if (!organic_results) return []; // no more results: return an empty array so concat adds nothing
  else {
    const links = organic_results.map((el) => el.link);
    params.start += 10;
    return links.concat(await getResults());
  }
};

getResults().then(console.log);

Output:

[
   "https://www.youtube.com/watch?v=qAuHpmM6_2c",
   "https://www.youtube.com/watch?v=Mi6I5fWQbXE",
   "https://www.youtube.com/watch?v=uB3ZBhDM5j8",
   "https://en.wikipedia.org/wiki/Let_You_Love_Me",
   "https://open.spotify.com/album/4ymdSiRu5oOyfrJschvFLO",
   "https://storyofsong.com/story/let-you-love-me/",
   "https://www.facebook.com/RitaOra/videos/let-you-love-me/1205535526588451/",
   "https://www.pinterest.com/pin/373939575307014693/",
   "https://www.smule.com/song/rita-ora-let-you-love-me-karaoke-lyrics/7796351_7796351/arrangement",
   "https://secondhandsongs.com/performance/805308",
   "https://celebmix.com/rita-ora-drops-new-single-let-you-love-me/",
   "https://www.discogs.com/master/1440716-Rita-Ora-Let-You-Love-Me",
   "https://soundcloud.com/ritaora/let-you-love-me",
   ... and all other links
]
