Why do I get an empty list while scraping a website and end up with a TypeError?

Posted by tktrz96b on 2022-09-18 in Java


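Hello, I am trying to do web scraping in Python, but it returns an empty list. I cannot solve this problem because I am a beginner. As far as I can tell, I get a

TypeError: list indices must be integers or slices, not str

when I do this. Can you help?

When I use this version, it returns empty data: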

import requests
from bs4 import  BeautifulSoup
import pandas as pd
from urllib import response

kitapurl = "https://1000kitap.com/alintilar"

response = requests.get(kitapurl)
soup = BeautifulSoup(response.content,"html.parser")
gelen_ana_veri = soup.find_all('span',attrs={'class':'text-alt'})
print(gelen_ana_veri)

But if I do it this way, I get a TypeError:

import requests
from bs4 import  BeautifulSoup
import pandas as pd
from urllib import response

kitapurl = "https://1000kitap.com/alintilar"

response = requests.get(kitapurl)
soup = BeautifulSoup(response.content,"html.parser")
gelen_ana_veri= soup.find_all('meta',attrs={'property':'og:description'})['content']
print(gelen_ana_veri)

The error I get:

Traceback (most recent call last):
  File "c:UsersfratkOneDriveBelgelertwbot1001kitap.py", line 11, in <module>
    elem = soup.find_all('meta',attrs={'property':'og:description'})['content']
TypeError: list indices must be integers or slices, not str


Source: https://stackoverflow.com/questions/73760092/why-do-i-get-an-empty-list-while-scraping-website-and-end-in-type-error

2 Answers


jchrr9hc1#

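Always, and before anything else, check the response to your request and/or your soup.

Was the request successful?
Does it contain the expected elements?
Is there any information indicating that content is being withheld?
...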
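The checks above could be sketched roughly as follows. This is only an illustration: the diagnose() helper and the BLOCK_MARKERS phrases are hypothetical names chosen for this example, and the marker list would need adjusting to whatever the site actually returns.

```python
from bs4 import BeautifulSoup

# Hypothetical marker phrases (an assumption); adjust them to the site in question.
BLOCK_MARKERS = ("enable cookies", "you have been blocked", "attention required")


def diagnose(status_code, html, tag, attrs):
    """Return a short diagnosis for a fetched page (hypothetical helper)."""
    # 1. Was the request successful?
    if status_code != 200:
        return f"request failed with status {status_code}"
    # 2. Is there a hint that the content is being withheld?
    lowered = html.lower()
    if any(marker in lowered for marker in BLOCK_MARKERS):
        return "page looks like a block or challenge page"
    # 3. Does the page contain the expected element?
    soup = BeautifulSoup(html, "html.parser")
    if soup.find(tag, attrs=attrs) is None:
        return "expected element is missing"
    return "ok"
```

With requests, this would be called as diagnose(response.status_code, response.text, 'meta', {'property': 'og:description'}) right after requests.get(kitapurl), before any further parsing.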

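The website is protected by Cloudflare:

Attention Required! | Cloudflare Please enable cookies. Sorry, you have been blocked You are unable to access 1000kitap.com Why have I been blocked? This website is using a security service to protect itself from online attacks. The action you just performed triggered the security solution...

So this page does not want to be scraped, and you should respect that. From a technical point of view, there are still options for accessing the page and its content, for example using cloudscraper.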

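In any case, the ResultSet returned by find_all() cannot be treated like a dict. The solution is to iterate over it or, in this case, simply use find(), since there is only one description meta tag. Also use get() to avoid an error if the attribute is not available: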

soup.find('meta',attrs={'property':'og:description'}).get('content')
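To make the difference between find_all() and find() concrete, here is a small sketch that runs against a made-up HTML snippet instead of the real page. The meta_content() helper is a hypothetical name introduced only for this example; it also guards against find() returning None when no tag matches, which would otherwise raise an AttributeError on .get().

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for the real page (an assumption for illustration).
html = '<html><head><meta property="og:description" content="Sample description"></head></html>'
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet (a list subclass): loop over it or index it with integers.
tags = soup.find_all("meta", attrs={"property": "og:description"})
contents = [tag.get("content") for tag in tags]


def meta_content(soup, prop):
    """Return the content of the first matching meta tag, or None (hypothetical helper)."""
    tag = soup.find("meta", attrs={"property": prop})
    # find() returns None when nothing matches, so check before calling get().
    return tag.get("content") if tag is not None else None


print(contents)
print(meta_content(soup, "og:description"))
print(meta_content(soup, "og:missing"))
```

Writing tags['content'] here would reproduce the TypeError from the question, because tags is a list-like object and not a single tag.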

Example

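import cloudscraper
from bs4 import BeautifulSoup

kitapurl = "https://1000kitap.com/alintilar"
# Create a scraper session that presents itself as desktop Firefox on Windows
scraper = cloudscraper.create_scraper(browser={'browser': 'firefox','platform': 'windows','mobile': False})

# Fetch the page through the scraper instead of plain requests
response = scraper.get(kitapurl).content
soup = BeautifulSoup(response,"html.parser")
# find() returns a single tag; get() returns None if 'content' is absent
gelen_ana_veri= soup.find('meta',attrs={'property':'og:description'}).get('content')
print(gelen_ana_veri)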

Answered 2022-09-18

6ju8rftf2#

That is because 'content' is not a valid index for the list that find_all() returns.

If you try elem = soup.find_all('meta',attrs={'property':'og:description'}) you will get nothing, or an empty list [].

You would also get the TypeError: list indices must be integers or slices, not str if you were to try gelen_ana_veri = soup.find_all('span',attrs={'class':'text-alt'})['content']

First, check whether the element you are looking for exists at all by calling soup.find_all() with a broader query. Once you have found your element, drill down into its classes or properties.

Here is a basic example of trying to find what you are looking for:

gelen_ana_veri = soup.find_all('meta')
print(gelen_ana_veri)

This will output a list of every meta tag, and you can then see whether any of the specific tags you are looking for are present.
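The "search broadly, then drill down" approach might look like the sketch below. It uses a made-up HTML snippet instead of the real page, and the two helper functions are hypothetical names chosen for this example.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for the real page (an assumption for illustration).
html = """<head>
<meta charset="utf-8">
<meta name="description" content="Site description">
<meta property="og:title" content="Quotes">
</head>"""
soup = BeautifulSoup(html, "html.parser")


def list_meta_attrs(soup):
    """Broad step: collect the attributes of every meta tag on the page."""
    return [dict(tag.attrs) for tag in soup.find_all("meta")]


def available_properties(soup):
    """Narrower step: keep only meta tags that carry a 'property' attribute."""
    return [tag["property"] for tag in soup.find_all("meta", attrs={"property": True})]


for attrs in list_meta_attrs(soup):
    print(attrs)
print(available_properties(soup))
```

If the property you want (for example og:description) is not among the ones listed, the page you received does not contain it, and no narrower selector will find it.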

Answered 2022-09-18
