How do I assign the data returned by Scrapy to a variable?

Posted by zrfyljdw on 2022-11-09 in Other
import scrapy
from scrapy.crawler import CrawlerRunner

class Livescores2(scrapy.Spider):

    name = 'Home'

    def start_requests(self):
        yield scrapy.Request('https://www.livescores.com/football/turkey/super-lig/?tz=3&table=league-home')

    def parse(self, response):

        for total in response.css('td'):
            yield{
                'total': total.css('::text').get()               
                }

runner2 = CrawlerRunner()
runner2.crawl(Livescores2)

When I change the settings as shown below, I can save the data as JSON without any problem:

runner2 = CrawlerRunner(settings={
    "FEEDS": {
        "Home.json": {"format": "json", "overwrite": True},
    },
})

I want to assign the data returned by Scrapy to a variable so that I can process it. I don't need any JSON output!
This is what I tried:

import scrapy
from scrapy.crawler import CrawlerRunner

class Livescores2(scrapy.Spider):

    name = 'Home'

    def start_requests(self):
        yield scrapy.Request('https://www.livescores.com/football/turkey/super-lig/?tz=3&table=league-home')

    def parse(self, response):

        for total in response.css('td'):
            yield{
                'total': total.css('::text').get()               
                }

runner2 = CrawlerRunner()
a = runner2.crawl(Livescores2) 

print(a)

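The result is: <Deferred at 0x65cbfb6d0>
How can I get the data out of this variable? I am developing an Android app, so I don't need a JSON file. I don't know how to use "return" in this code.
Thanks a lot.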

Answer #1 by mzaanser

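You can simply create a class attribute to store the data and then read it after the spider has finished processing all of its requests. Note, however, that this is not really the workflow the Scrapy framework is designed for; other web-scraping tools may handle this more intuitively.
Note also that runner2.crawl() only returns a Deferred: with CrawlerRunner nothing is crawled until the Twisted reactor is running, so you have to run the reactor and read the data only after the crawl has finished.
For example: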

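from twisted.internet import reactor  # note: newer Scrapy versions may require this reactor to match the TWISTED_REACTOR setting
import scrapy
from scrapy.crawler import CrawlerRunner

class Livescores2(scrapy.Spider):
    name = 'Home'
    data = []   # class attribute that collects the scraped items

    def start_requests(self):
        yield scrapy.Request('https://www.livescores.com/football/turkey/super-lig/?tz=3&table=league-home')

    def parse(self, response):
        for total in response.css('td'):
            item = {'total': total.css('::text').get()}
            self.data.append(item)  # store the item on the class-level list
            yield item

runner2 = CrawlerRunner()
d = runner2.crawl(Livescores2)       # returns a Deferred; nothing is crawled until the reactor runs
d.addBoth(lambda _: reactor.stop())  # stop the reactor once the crawl finishes (or fails)
reactor.run()                        # blocks until reactor.stop() is called

print(Livescores2.data)  # the crawl has finished, so the collected data is available here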
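One thing to keep in mind with this approach: data = [] is a class attribute, so a single list is shared by every spider instance and survives across crawls in the same process. Scrapy creates a new spider instance for each crawl, but all of them append to that same list, so if your app triggers the crawl more than once, items from earlier runs accumulate. The plain-Python sketch below does not use Scrapy at all; FakeSpider and run_crawl are hypothetical stand-ins that only mimic the relevant behaviour, and it shows both the accumulation and the simple fix of resetting the list before each crawl.

```python
# Plain-Python illustration (no Scrapy needed) of how a class attribute such as
# `data = []` behaves. FakeSpider and run_crawl are hypothetical stand-ins.

class FakeSpider:
    data = []  # class attribute: ONE list shared by the class and all its instances

    def parse(self, cells):
        # mimic the spider's parse(): yield one item per table cell
        for text in cells:
            item = {"total": text}
            self.data.append(item)  # self.data resolves to the shared class-level list
            yield item


def run_crawl(cells):
    # Scrapy builds a fresh spider instance for each crawl; mimic that here
    spider = FakeSpider()
    for _ in spider.parse(cells):  # consume the generator, as the engine would
        pass


run_crawl(["1", "2"])
first_run = len(FakeSpider.data)   # 2 items

run_crawl(["3", "4"])
second_run = len(FakeSpider.data)  # 4 items: the first crawl's items are still there

# Rebinding the class attribute before a crawl keeps runs separate
FakeSpider.data = []
run_crawl(["5"])
after_reset = [item["total"] for item in FakeSpider.data]

print(first_run, second_run, after_reset)
```

In the real spider the equivalent fix is to set Livescores2.data = [] right before calling runner2.crawl(Livescores2).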
