Why do I get an empty list, and then a TypeError, when scraping a website?

tktrz96b  posted on 2022-09-18  in Java
Follows (0) | Answers (2) | Views (165)

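Hello, I am trying to do web scraping in Python, but it returns an empty result. I can't solve this myself because I'm a beginner. As far as I can tell, I get a

TypeError: list indices must be integers or slices, not str

when I do this. Can you help?

When I use this form, it returns empty data: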

import requests
from bs4 import  BeautifulSoup
import pandas as pd
from urllib import response

kitapurl = "https://1000kitap.com/alintilar"

response = requests.get(kitapurl)
soup = BeautifulSoup(response.content,"html.parser")
gelen_ana_veri = soup.find_all('span',attrs={'class':'text-alt'})
print(gelen_ana_veri)

But if I do it this way, I get a TypeError:

import requests
from bs4 import  BeautifulSoup
import pandas as pd
from urllib import response

kitapurl = "https://1000kitap.com/alintilar"

response = requests.get(kitapurl)
soup = BeautifulSoup(response.content,"html.parser")
gelen_ana_veri= soup.find_all('meta',attrs={'property':'og:description'})['content']
print(gelen_ana_veri)

The error I get:

Traceback (most recent call last):
  File "c:UsersfratkOneDriveBelgelertwbot1001kitap.py", line 11, in <module>
    elem = soup.find_all('meta',attrs={'property':'og:description'})['content']
TypeError: list indices must be integers or slices, not str

jchrr9hc  #1

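Always check the response to your request and/or your soup first:

  • Was the request successful?
  • Does it contain the expected elements?
  • Is there any indication that the content is being withheld?
  • ..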
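The checklist above could be sketched roughly as follows. This is a minimal illustration, not a definitive check: `diagnose_response` and `BLOCK_MARKERS` are hypothetical names, and the marker strings are assumptions modeled on typical Cloudflare block-page wording.

```python
# Hypothetical helper: the marker strings below are assumptions based on
# typical Cloudflare block-page wording, not an official or complete list.
BLOCK_MARKERS = (
    "Attention Required!",
    "you have been blocked",
    "Please enable cookies",
)


def diagnose_response(status_code, text):
    """Return a list of human-readable problems found in an HTTP response."""
    problems = []
    # 1. Was the request successful?
    if status_code != 200:
        problems.append(f"unexpected status code {status_code}")
    # 2. Is there any sign that the content is being withheld?
    lowered = text.lower()
    for marker in BLOCK_MARKERS:
        if marker.lower() in lowered:
            problems.append(f"block page marker found: {marker!r}")
    return problems


# With requests, this could be called as:
#   response = requests.get(kitapurl)
#   print(diagnose_response(response.status_code, response.text))
```

An empty list means none of these simple checks fired. It does not prove that the page contains the elements you expect, so printing part of `response.text` is still worthwhile.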

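The website is protected by Cloudflare:

Attention Required! | Cloudflare Please enable cookies. Sorry, you have been blocked You are unable to access 1000kitap.com Why have I been blocked? This website is using a security service to protect itself from online attacks. The action you just performed triggered the security solution...

So this page does not want to be scraped, and you should respect that. From a technical point of view, there is still an option to access the page and its content, for example by using cloudscraper.

In any case, the ResultSet returned by find_all() cannot be treated like a dict. The solution is to iterate over it or, in this case, simply use find(), since there is only one description meta tag. Also use get() to avoid an error if the attribute is not available: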

soup.find('meta',attrs={'property':'og:description'}).get('content')
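To see the difference without touching the live site, here is a small sketch that runs against an inline HTML string. The sample markup is an assumption made up for illustration; the real page may look different.

```python
from bs4 import BeautifulSoup

# Assumed sample markup for illustration only; not taken from the real page.
html = """
<html><head>
<meta property="og:title" content="Sample title">
<meta property="og:description" content="Sample description">
</head><body></body></html>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a list-like ResultSet, so indexing it with a string fails.
matches = soup.find_all("meta", attrs={"property": "og:description"})
error_message = None
try:
    matches["content"]
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Option 1: iterate over the ResultSet and read the attribute from each tag.
contents = [tag.get("content") for tag in matches]
print(contents)

# Option 2: use find(), which returns a single tag, or None if nothing matches.
tag = soup.find("meta", attrs={"property": "og:description"})
description = tag.get("content") if tag is not None else None
print(description)

# A tag that is not in the document: find() returns None instead of raising.
missing = soup.find("meta", attrs={"property": "og:image"})
print(missing)
```

Note that get() only guards against a missing attribute. If find() itself returns None, calling get() on it raises an AttributeError, which is why the sketch checks for None first.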

Example

import cloudscraper
from bs4 import  BeautifulSoup

kitapurl = "https://1000kitap.com/alintilar"
scraper = cloudscraper.create_scraper(browser={'browser': 'firefox','platform': 'windows','mobile': False})

response = scraper.get(kitapurl).content
soup = BeautifulSoup(response,"html.parser")
gelen_ana_veri= soup.find('meta',attrs={'property':'og:description'}).get('content')
print(gelen_ana_veri)

6ju8rftf  #2

That is because content doesn't exist.

If you try elem = soup.find_all('meta',attrs={'property':'og:description'}) you will get nothing or an empty list []

You would also get the TypeError: list indices must be integers or slices, not str if you were to try gelen_ana_veri = soup.find_all('span',attrs={'class':'text-alt'})['content']

First, check whether the element you are looking for exists at all by calling soup.find_all() with a broad filter. Once you have found the element, drill down further using its classes or properties.

Here is a basic example of trying to find what you are looking for:

gelen_ana_veri = soup.find_all('meta')
print(gelen_ana_veri)

This will output a list of every meta tag, so you can see whether any of the specific tags you are looking for are present.
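As a rough sketch of the "broad first, then drill down" approach, the helper below collects every meta tag that has a property attribute. `meta_properties` is a hypothetical name and the sample HTML is an assumption for illustration, not the real page.

```python
from bs4 import BeautifulSoup


def meta_properties(html):
    """Map each <meta property="..."> value to its content attribute."""
    soup = BeautifulSoup(html, "html.parser")
    found = {}
    # Broad search first: look at every meta tag in the document.
    for tag in soup.find_all("meta"):
        prop = tag.get("property")
        # Then drill down: keep only the tags that carry a property attribute.
        if prop is not None:
            found[prop] = tag.get("content")
    return found


# Assumed sample markup for illustration only.
sample_html = """
<html><head>
<meta charset="utf-8">
<meta property="og:title" content="Sample title">
<meta property="og:description" content="Sample description">
</head><body></body></html>
"""

print(meta_properties(sample_html))
```

If the returned dict is empty for the real response, the tags you want are probably not in the HTML you received, and the response itself is the next thing to inspect.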
