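I'm trying to scrape a web page with Scrapy, but when I try to get the text from the href, it comes back as None. Can anyone help me?
I need to get the elements with the class "sinonimo".
The page and the values I want to get are these:
[image]
Code: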
import scrapy


class SinonimoSpider(scrapy.Spider):
    name = 'sinonimo'
    start_urls = ['https://www.sinonimos.com.br/pedido/']
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.2840.71 Safari/539.36'}

    def parse(self, response):
        for filmes in response.css('.sinonimo'):
            yield {
                'sinonimo': filmes.css('.sinonimo a::text').get()
            }
Result:
.........
........
2022-08-04 00:23:19 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sinonimos.com.br/pedido/>
{'sinonimo': None}
2022-08-04 00:23:19 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sinonimos.com.br/pedido/>
{'sinonimo': None}
2022-08-04 00:23:19 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sinonimos.com.br/pedido/>
{'sinonimo': None}
2022-08-04 00:23:19 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sinonimos.com.br/pedido/>
{'sinonimo': None}
2022-08-04 00:23:19 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sinonimos.com.br/pedido/>
{'sinonimo': None}
2022-08-04 00:23:19 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sinonimos.com.br/pedido/>
{'sinonimo': None}
2022-08-04 00:23:19 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sinonimos.com.br/pedido/>
{'sinonimo': None}
2022-08-04 00:23:19 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sinonimos.com.br/pedido/>
{'sinonimo': None}
2022-08-04 00:23:19 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sinonimos.com.br/pedido/>
{'sinonimo': None}
2022-08-04 00:23:19 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sinonimos.com.br/pedido/>
{'sinonimo': None}
2022-08-04 00:23:19 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sinonimos.com.br/pedido/>
{'sinonimo': None}
2022-08-04 00:23:19 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sinonimos.com.br/pedido/>
{'sinonimo': None}
2022-08-04 00:23:19 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.sinonimos.com.br/pedido/>
........
........
2 Answers
2exbekwf #1
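When you write
for filmes in response.css('.sinonimo'):
you are already looping over every ".sinonimo" element. It fails because you select ".sinonimo" again inside the loop, even though the loop's selection already includes all the link tags that contain the words you need. A plain ::text on each element is enough to get all the words.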
ev7lccsx #2
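Your code is actually very close. The only change needed is to use a.sinonimo::text instead of .sinonimo a::text. In addition, because the combination of the class sinonimo and the tag a belongs exclusively to the words you want to scrape, you can simplify the method with a yield from expression.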