I'm trying to scrape a JavaScript-rendered website with scrapy-playwright, but the log shows "Crawled 0 pages".
Is there a mistake in my code? Why isn't it crawling any data? This is the page link: https://www.coursera.org/search?query=python&utm_source=gg&utm_medium=sem&utm_campaign=B2C_INDIA__branded_FTCOF_courseraplus_arte_monthly&utm_content=B2C&campaignid=18216928761&adgroupid=141296026472&device=c&keyword=coursera%20online&matchtype=b&network=g&devicemodel=&adpostion=&creativeid=619458216863&gclid=CjwKCAiAkfucBhBBEiwAFjbkr5EhIFModjG1bK9jcqv126-AOgp4M-DzZCXXwLJyy_e16UZkmoUuxRoC_IcQAvD_BwE
import scrapy
from scrapy_playwright.page import PageMethod


class TestSpider(scrapy.Spider):
    name = 'sample'

    def start_requests(self):
        # Ask scrapy-playwright to render the page in a browser.
        yield scrapy.Request(
            url="https://www.coursera.org/search?query=python&utm_source=gg&utm_medium=sem&utm_campaign=B2C_INDIA__branded_FTCOF_courseraplus_arte_monthly&utm_content=B2C&campaignid=18216928761&adgroupid=141296026472&device=c&keyword=coursera%20online&matchtype=b&network=g&devicemodel=&adpostion=&creativeid=619458216863&gclid=CjwKCAiAkfucBhBBEiwAFjbkr5EhIFModjG1bK9jcqv126-AOgp4M-DzZCXXwLJyy_e16UZkmoUuxRoC_IcQAvD_BwE",
            callback=self.parse,
            meta={
                "playwright": True,
                "playwright_page_methods": [
                    # Wait until the results list has been rendered.
                    PageMethod("wait_for_selector", "ul.cds-71"),
                ],
            },
        )

    def parse(self, response):
        # Yield the rendered HTML as a single item.
        yield {
            'text': response.text
        }
1 answer
zazmityj #1
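If you are on Windows, you can't run scrapy-playwright natively. To use it, you have to set up WSL on your Windows machine and run the spider from there. See this issue:
https://github.com/scrapy-plugins/scrapy-playwright/issues/7
For how to launch the browser under WSL, see https://github.com/scrapy-plugins/scrapy-playwright/issues/78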
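
Separately from the platform question, it may be worth checking the project settings, since the question doesn't show them. scrapy-playwright only handles a request when its download handler and the asyncio reactor are enabled. A minimal sketch of the relevant settings.py entries, assuming the project doesn't already define them:

```python
# settings.py (sketch): enable scrapy-playwright for http and https requests.
# Assumption: the project does not already define these two settings.

# Route downloads through scrapy-playwright's download handler.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright requires Twisted's asyncio-based reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```

Without the handler, a request with `meta={"playwright": True}` goes through Scrapy's default downloader, and the `playwright_page_methods` entries are ignored.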