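I am trying to scrape daraz.pk but ran into this error. The spider scrapes every value on the page up to the last one; because that one returns None, the spider then raises "'NoneType' object is not iterable". I tried exception handling, but it does not work either way. I am sharing my code here in case anyone can help. I am using Selenium together with Scrapy to get the item descriptions from the item pages.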
**
import scrapy
from selenium.webdriver import Chrome
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from ..items import EcomItem


class DarazSpider(scrapy.Spider):
    name = 'daraz'

    def start_requests(self):
        path = 'C:\Program Files (x86)\chromedriver.exe'
        driver = Chrome(executable_path=path)
        driver.get('https://www.daraz.pk/')
        electronics = driver.find_element(By.NAME, 'q')
        electronics.send_keys('Books')
        electronics.send_keys(Keys.RETURN)
        link_elements = driver.find_elements(By.XPATH,'/html/body/div[3]/div/div[2]/div/div/div/div[2]/div/div/div/div[2]/div[2]/a[text()]')
        for link_el in link_elements:
            href = link_el.text
            print(href)

    def parse(self, response):
        pass
**
Here is the error:
**
Traceback (most recent call last):
d = crawler.crawl(*args,**kwargs)
File "C:\Users\Intag\New folder (2)\lib\site-packages\twisted\internet\defer.py", line 1905, in unwindGenerator
return _cancellableInlineCallbacks(gen)
File "C:\Users\Intag\New folder (2)\lib\site-packages\twisted\internet\defer.py", line 1815, in _cancellableInlineCallbacks
_inlineCallbacks(None, gen, status)
--- <exception caught here> ---
File "C:\Users\Intag\New folder (2)\lib\site-packages\twisted\internet\defer.py", line 1660, in _inlineCallbacks
result = current_context.run(gen.send, result)
File "C:\Users\Intag\New folder (2)\lib\site-packages\scrapy\crawler.py", line 103, in crawl
start_requests = iter(self.spider.start_requests())
builtins.TypeError: 'NoneType' object is not iterable
2022-08-06 10:29:20 [twisted] CRITICAL:
Traceback (most recent call last):
File "C:\Users\Intag\New folder (2)\lib\site-packages\twisted\internet\defer.py", line 1660, in _inlineCallbacks
result = current_context.run(gen.send, result)
File "C:\Users\Intag\New folder (2)\lib\site-packages\scrapy\crawler.py", line 103, in crawl
start_requests = iter(self.spider.start_requests())
TypeError: 'NoneType' object is not iterable
**
1 Answer
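You can get the required data from the API. The data is loaded dynamically by JavaScript through an API call that uses the GET method, and it comes back in JSON format, so requesting the API directly is a very simple and robust way to get the data. Example: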
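A minimal sketch of this approach (not the answerer's original code), using only the standard library so it runs offline. The endpoint path, the query parameters (`ajax`, `q`, `page`) and the JSON keys (`mods`, `listItems`, `name`, `price`) are assumptions for illustration; copy the real ones from the request shown in the browser's Network tab.

```python
import json
from urllib.parse import urlencode

# ASSUMPTION: endpoint and parameter names are placeholders; take the real
# ones from the XHR request visible in the browser's Network tab.
API_BASE = "https://www.daraz.pk/catalog/"


def build_search_url(query, page=1):
    """Build the GET URL for one page of search results."""
    params = {"ajax": "true", "q": query, "page": page}
    return f"{API_BASE}?{urlencode(params)}"


def extract_items(payload):
    """Pull name/price pairs out of a decoded JSON payload.

    ASSUMPTION: the 'mods' -> 'listItems' layout is hypothetical.
    Missing or null keys give an empty list instead of raising.
    """
    items = (payload.get("mods") or {}).get("listItems") or []
    return [{"name": it.get("name"), "price": it.get("price")} for it in items]


# Offline demonstration with a hand-written sample payload.
sample = json.loads('{"mods": {"listItems": [{"name": "Book A", "price": "500"}]}}')
print(build_search_url("Books"))
print(extract_items(sample))
```

Inside a Scrapy spider, the same helpers could be used by yielding `scrapy.Request(build_search_url("Books"), callback=self.parse)` from `start_requests` and calling `extract_items(response.json())` in `parse`. The defensive `or {}` / `or []` fallbacks are there so that a page with no results does not hand `None` to a loop.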
Output:
...and so on
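Separately, the `TypeError` in the question can be read straight off the traceback: Scrapy calls `iter(self.spider.start_requests())`, and the posted `start_requests` only prints the links, so it returns `None`. A plain-Python sketch of that mechanism, with hypothetical function names and no Scrapy dependency:

```python
def start_requests_broken():
    """Mirrors the question's method: does work, but returns nothing."""
    for href in ["/a", "/b"]:
        pass  # the original only printed each link here


def start_requests_fixed():
    """Yielding turns the function into a generator, which is iterable."""
    for href in ["/a", "/b"]:
        yield href  # in a spider this would be a scrapy.Request


try:
    iter(start_requests_broken())  # same call shape as in scrapy/crawler.py
    error = None
except TypeError as exc:
    error = str(exc)

print(error)
print(list(start_requests_fixed()))
```

In the spider itself, this suggests yielding a `scrapy.Request` for each link found (or returning an empty list when there is nothing to crawl), so that `start_requests` always hands Scrapy something iterable.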