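I'm trying to learn by refactoring my code and breaking things. I've broken something, and I hope you can help me learn from it.
I have a working scraper that runs over multiple pages, as follows: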
class someSpider(scrapy.Spider):
    name = 'spider_name'
    allowed_domains = ['www.example.com']
    start_urls = ['https://www.example.com&page=1']

    def parse(self, response):
        result_parsed = json.loads(result)
        results = result_parsed.get('results') #yield actual results
        current_page_number = result_parsed.get('currentPage') #gets the page from the link as part of the API response
        for result in results:
            count += 1
            yield{
                ... #gives me the results as desired
            }
        go_to_nextpage(self, current_page_number) #### THIS DOES NOT WORK, not error, just stops at one page ####
        #### THIS WORKS ####
        # next_page_number = result_parsed.get('currentPage') +1
        # yield scrapy.Request(
        #     url=f'https://www.immoweb.be/en/search-results/house-and-apartment/for-sale/brussels/district?countries=BE&hasRecommendationActivated=true&page={next_page_number}&orderBy=relevance&searchType=similar',
        #     callback=self.parse
        # )
With go_to_nextpage() defined as:
def go_to_nextpage(self, current_page_number):
    next_page_number = current_page_number +1
    yield scrapy.Request(
        url=f'https://www.example.com&page={next_page_number}',
        callback=self.parse
    )
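I think there are two things I don't quite understand:
- what the self keyword does
- how the callback and the parse method work and interact
Any help is much appreciated.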
1 Answer
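There are a few issues here that I hope I can help clarify.
- Instance methods of a class take a self parameter as their first parameter.
- From outside the class, you call a method as myclass.method(); myclass is the variable that the class instance is assigned to.
- From inside the class, you call a method through the self variable (which is injected automatically as the first argument): self.method(). In your case that is self.go_to_nextpage(current_page_number), with go_to_nextpage defined inside the spider class.
- You need to yield the items produced by the go_to_nextpage method, because the current code does nothing with the return value.
- Because a result is yielded in go_to_nextpage, that method is automatically turned into a generator, so calling it runs none of its body until something iterates over it. The simplest fix is to have go_to_nextpage return the request instead, and to yield that return value from parse: yield self.go_to_nextpage(current_page_number).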
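The points above can be sketched in plain Python. This is only an illustration under stated assumptions, not the original spider: FakeRequest is a hypothetical stand-in for scrapy.Request so the snippet runs without Scrapy installed, and parse takes the already decoded JSON dictionary instead of a Scrapy response. It shows that a generator method whose result is discarded schedules nothing, while returning the request and yielding it from parse passes it on.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FakeRequest:
    """Hypothetical stand-in for scrapy.Request (assumption, not Scrapy's API)."""
    url: str
    callback: Optional[Callable] = None


class BrokenSpider:
    def parse(self, result_parsed):
        for result in result_parsed["results"]:
            yield {"item": result}
        # This only creates a generator object and discards it.
        # The body of go_to_nextpage never runs, so no request is produced.
        self.go_to_nextpage(result_parsed["currentPage"])

    def go_to_nextpage(self, current_page_number):
        next_page_number = current_page_number + 1
        yield FakeRequest(
            url=f"https://www.example.com&page={next_page_number}",
            callback=self.parse,
        )


class FixedSpider:
    def parse(self, result_parsed):
        for result in result_parsed["results"]:
            yield {"item": result}
        # Yield the returned request so the caller (Scrapy's engine, in a
        # real spider) receives it.
        yield self.go_to_nextpage(result_parsed["currentPage"])

    def go_to_nextpage(self, current_page_number):
        # No yield here, so this is an ordinary method returning one request.
        next_page_number = current_page_number + 1
        return FakeRequest(
            url=f"https://www.example.com&page={next_page_number}",
            callback=self.parse,
        )


page = {"currentPage": 1, "results": ["a", "b"]}
broken_output = list(BrokenSpider().parse(page))
fixed_output = list(FixedSpider().parse(page))

print(broken_output)         # items only, no request for the next page
print(fixed_output[-1].url)  # the request for the next page comes last
```

In a real spider the same shape applies with scrapy.Request(url=..., callback=self.parse) in place of FakeRequest.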
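If you want to keep using yield inside your go_to_nextpage method, parse has to iterate over the generator that the method returns, for example with yield from self.go_to_nextpage(current_page_number).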
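A minimal sketch of that variant, under the same assumptions as before: FakeRequest is a hypothetical stand-in for scrapy.Request, and parse receives the decoded JSON dictionary directly. Here go_to_nextpage stays a generator and parse re-yields whatever it produces.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FakeRequest:
    """Hypothetical stand-in for scrapy.Request (assumption, not Scrapy's API)."""
    url: str
    callback: Optional[Callable] = None


class YieldFromSpider:
    def parse(self, result_parsed):
        for result in result_parsed["results"]:
            yield {"item": result}
        # yield from iterates over the inner generator and re-yields each
        # value; it is equivalent to:
        #     for request in self.go_to_nextpage(...):
        #         yield request
        yield from self.go_to_nextpage(result_parsed["currentPage"])

    def go_to_nextpage(self, current_page_number):
        # The yield makes this method a generator function.
        next_page_number = current_page_number + 1
        yield FakeRequest(
            url=f"https://www.example.com&page={next_page_number}",
            callback=self.parse,
        )


output = list(YieldFromSpider().parse({"currentPage": 1, "results": ["a", "b"]}))
print(output[-1].url)  # the request for the next page comes last
```

This form is convenient if go_to_nextpage might later yield more than one request, since parse passes all of them through unchanged.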