I want to loop scrapy.Spider like this:
```python
from abc import ABC

import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from scrapy_splash import SplashRequest

for i in range(0, 10):

    class MySpider(scrapy.Spider, ABC):
        start_urls = ["example.com"]

        def start_requests(self):
            for url in self.urls:
                if dec == i:
                    yield SplashRequest(url=url, callback=self.parse_data, args={"wait": 1.5})

        def parse_data(self, response):
            data = response.css("td.right.data").extract()
            items["Data"] = data
            yield items

    settings = get_project_settings()
    settings["FEED_URI"] = f"/../Data/data_{i}.json"

    if __name__ == "__main__":
        process = CrawlerProcess(settings)
        process.crawl(MySpider)
        process.start()
```
However, this raises

```
twisted.internet.error.ReactorNotRestartable
```

Using `process.start(stop_after_crawl=False)` executes the script for i=0, but it hangs at i=1.
1 Answer
You can use multiprocessing or Twisted's LoopingCall. You can read about scheduling tasks for the future in the Twisted reactor documentation.