python - Why does my Beautiful Soup loop return nothing when I scrape content with '_ngcontent'?

Posted by iswrvxsc on 2023-06-04 in Python

How can I scrape text that sits inside '_ngcontent' elements on the front end?
Here is the code:

from bs4 import BeautifulSoup as bs
import requests

url = 'https://formosodoaraguaia.megasofttransparencia.com.br/receitas-e-despesas/empenho?faseDoEmpenho=4&etapaDaDespesa=4&dataInicial=01%2F01%2F2019'
page_to_scrap = requests.get(url)
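soup = bs(page_to_scrap.text, 'html.parser')

# find_all() takes a tag name; the generated _ngcontent-* attribute is not part of it
data = soup.find_all("label", class_="valor")

for i in data:
    print(i.text)  # print each element, not the whole ResultSet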

The loop returns nothing, as if there were no content matching the selector I chose.
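One reason the original loop finds nothing is the selector itself: Beautiful Soup's `find_all()` treats its first argument as a tag name, so `"label _ngcontent-lpf-c7"` looks for a tag literally called that. The sketch below runs both selectors against a small hypothetical HTML sample that imitates what the browser shows after rendering; the sample values are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, roughly what the browser displays after Angular has
# rendered the page. It is NOT what requests.get() returns for this site.
SAMPLE = """
<div>
  <label _ngcontent-lpf-c7="" class="valor">R$ 1.234,56</label>
  <label _ngcontent-lpf-c7="" class="valor">R$ 789,00</label>
</div>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")

# The whole string is used as a tag name, so no element can match it.
wrong = soup.find_all("label _ngcontent-lpf-c7", attrs={"class": "valor"})

# Tag name and class passed as separate filters.
right = soup.find_all("label", class_="valor")

print(len(wrong))                                   # 0
print([tag.get_text(strip=True) for tag in right])  # ['R$ 1.234,56', 'R$ 789,00']
```

Fixing the selector is necessary but, as the answer explains, not sufficient for this particular page.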

Does this have to do with the content being in a pop-up page? How can I scrape something like this?
Thanks, everyone!
UPDATE -----------------------------------------------------------
When I reload the page, the '_ngcontent' ID changes. This is what it looks like now:
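Because the `_ngcontent-xxx-cN` suffix is generated and can change between loads, a selector should not hard-code it. Usually the tag and class alone are enough; if the attribute matters, a function filter can match any attribute that merely starts with `_ngcontent`. This is a sketch on hypothetical, already-rendered HTML (the helper name `has_ngcontent_attr` is invented here); it cannot help when the HTML comes from `requests`, since that HTML never contains these labels.

```python
from bs4 import BeautifulSoup

# Hypothetical rendered markup with two different generated suffixes.
SAMPLE = """
<label _ngcontent-abc-c7="" class="valor">100,00</label>
<label _ngcontent-xyz-c9="" class="valor">200,00</label>
<label class="valor">not from an Angular component</label>
"""

def has_ngcontent_attr(tag):
    """True for <label> tags that carry any attribute starting with _ngcontent."""
    return tag.name == "label" and any(
        attr.startswith("_ngcontent") for attr in tag.attrs
    )

soup = BeautifulSoup(SAMPLE, "html.parser")
# find_all() calls the function on every tag and keeps those returning True.
values = [tag.get_text(strip=True) for tag in soup.find_all(has_ngcontent_attr)]
print(values)  # ['100,00', '200,00']
```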

Answer by kse8i1jr


If you open the page source, this is all you will see:

<!doctype html>
<html lang="en">

<head>
    <meta charset="utf-8">
    <title>Portal Transparencia</title>
    <base href="/">
    <meta name="viewport" content="width=device-width, initial-scale=1">

    <link rel="icon" type="image/x-icon" href="./assets/favicon/favicon.ico">
    <link rel="icon" type="image/png" sizes="16x16" href="./assets/favicon/favicon-16x16.png">
    <link rel="icon" type="image/png" sizes="32x32" href="./assets/favicon/favicon-32x32.png">
    <link rel="icon" type="image/png" sizes="96x96" href="./assets/favicon/favicon-96x96.png">
    <link rel="apple-touch-icon" sizes="57x57" href="./assets/favicon/apple-touch-icon-57x57.png">
    <link rel="apple-touch-icon" sizes="60x60" href="./assets/favicon/apple-touch-icon-60x60.png">
    <link rel="apple-touch-icon" sizes="72x72" href="./assets/favicon/apple-touch-icon-72x72.png">
    <link rel="apple-touch-icon" sizes="76x76" href="./assets/favicon/apple-touch-icon-76x76.png">
    <link rel="apple-touch-icon" sizes="114x114" href="./assets/favicon/apple-touch-icon-114x114.png">
    <link rel="apple-touch-icon" sizes="120x120" href="./assets/favicon/apple-touch-icon-120x120.png">
    <link rel="apple-touch-icon" sizes="144x144" href="./assets/favicon/apple-touch-icon-144x144.png">
    <link rel="apple-touch-icon" sizes="152x152" href="./assets/favicon/apple-touch-icon-152x152.png">
    <link rel="apple-touch-icon" sizes="180x180" href="./assets/favicon/apple-touch-icon-180x180.png">

    <link rel="stylesheet" href="styles.f7136d10368295392983.css">
</head>

<body>
    <mega-root></mega-root>
    <script src="runtime-es2015.b45dbb8b28e64d8d5234.js" type="module"></script>
    <script src="runtime-es5.b45dbb8b28e64d8d5234.js" nomodule defer></script>
    <script src="polyfills-es5.06416a6230f9503a933a.js" nomodule defer></script>
    <script src="polyfills-es2015.fb891162ab77d6e3ed02.js" type="module"></script>
    <script src="main-es2015.a978096e6e6cd72c87b2.js" type="module"></script>
    <script src="main-es5.a978096e6e6cd72c87b2.js" nomodule defer></script>
</body>

</html>

No divs, no spans, no classes. That is everything requests receives, because all of the content is loaded dynamically by JavaScript modules.
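A quick way to confirm this from Python is to check whether the HTML that requests returns contains any of the elements you expect. The sketch below uses a trimmed copy of the page source quoted above; in a real script you would pass `page_to_scrap.text` instead. The helper `looks_like_js_shell` and its rule (scripts present, no label/div/span) are a rough heuristic made up for this example, not a general test.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the page source quoted above (what requests receives).
SHELL = """<!doctype html>
<html lang="en">
<head><title>Portal Transparencia</title></head>
<body>
    <mega-root></mega-root>
    <script src="main-es2015.a978096e6e6cd72c87b2.js" type="module"></script>
</body>
</html>"""

def looks_like_js_shell(html):
    """Heuristic: the page has scripts but no label/div/span elements to scrape."""
    soup = BeautifulSoup(html, "html.parser")
    has_content = soup.find(["label", "div", "span"]) is not None
    has_scripts = soup.find("script") is not None
    return has_scripts and not has_content

print(looks_like_js_shell(SHELL))  # True
```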
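To get the dynamic content, you need a library that drives a headless browser (a browser without a GUI). The most popular ones for this purpose that also support Python are Selenium, Pyppeteer and Playwright.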
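As one possible approach, here is a minimal sketch using Playwright's synchronous API to render the page and then hand the resulting HTML to Beautiful Soup. It assumes the values appear as `<label class="valor">` (taken from the question) and that they show up within 30 seconds; both are assumptions about this site, and the helper names `render_page` and `extract_values` are invented for the example. Playwright is imported inside `render_page` so the parsing helper can be used without it.

```python
from bs4 import BeautifulSoup

URL = ("https://formosodoaraguaia.megasofttransparencia.com.br/receitas-e-despesas/"
       "empenho?faseDoEmpenho=4&etapaDaDespesa=4&dataInicial=01%2F01%2F2019")

def render_page(url, wait_selector="label.valor", timeout_ms=30000):
    """Load the page in headless Chromium and return the HTML after JavaScript ran."""
    # Lazy import: only this function needs Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url)
            # Assumption: the data is rendered as <label class="valor">, as in the question.
            page.wait_for_selector(wait_selector, timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()

def extract_values(html):
    """Return the text of every <label class="valor"> in already-rendered HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [label.get_text(strip=True) for label in soup.find_all("label", class_="valor")]
```

Usage, after `pip install playwright` and `playwright install chromium`: `for value in extract_values(render_page(URL)): print(value)`. The same `extract_values` works on HTML from Selenium's `driver.page_source`, so the rendering step can be swapped without touching the parsing.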
