Saving scrapy-playwright data in a Django model

Posted by chhkpiq4 on 2023-03-08 in Go

I am new to scrapy-playwright. I want to save the data yielded by Scrapy into Django models. Here is my pipelines.py:

import scrapy
from scrapy.exceptions import DropItem
from scrapy.pipelines.images import ImagesPipeline
from itemadapter import ItemAdapter
from .enums import ModelChoices
from tp_core.models import (
    ScrapPatent,
    Logo, 
    Patent, 
    Trademark, 
    Name, PatentThreat
    )

model = 'Patent'
class ScrapyappPipeline:
    def process_item(self, item, spider):
        if model == ModelChoices.PATENT.value:
            quote = PatentThreat(patent_name=item.get('title'), description=item.get('description'),file= item.get('image'), URL = item.get('url'),contact_details=item.get('data'))
            quote.save()
            return item
        if model == ModelChoices.TRADEMARK.value:
            quote = Trademark(name=item.get('title'), description=item.get('description'),file= item.get('image'), url_trademark = item.get('url'))
            quote.save()
            return item
        if model == ModelChoices.LOGO.value:
            quote = Logo(title=item.get('title'), description=item.get('description'),logo_file= item.get('image'), url_logo = item.get('url'))
            quote.save()
            return item
        if model == ModelChoices.NAME.value:
            quote = Name(name=item.get('title'), description=item.get('description'),file= item.get('image'), url_name = item.get('url'))
            quote.save()
            return item

I also added these lines to settings.py:

DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    
}

TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

I also tried using only quote = Trademark(name=item.get('title'), description=item.get('description'),file= item.get('image'), url_trademark = item.get('url')) quote.save() return item
The spider runs without errors, but the values are not saved to the Django model. Can anyone help?

Answer #1, by mwg9r5ms

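It looks like you are trying to save data to a Django model from an asynchronous context, which is not allowed. Django's database layer is synchronous and needs a synchronous context to work correctly.
To solve this, you can use the sync_to_async function from asgiref to wrap the synchronous Django ORM call so that it can be awaited. Here is an example of how to use sync_to_async: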

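from asgiref.sync import sync_to_async
from myapp.models import MyModel


class MyPipeline:
    async def process_item(self, item, spider):
        # Build the model instance; populate its fields from `item` here
        obj = MyModel()

        # Wrap the synchronous save() so it runs in a worker thread
        # and can be awaited from the async pipeline
        await sync_to_async(obj.save)()

        # Return the item so that later pipelines still receive it
        return item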

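In this example, sync_to_async wraps the synchronous save method in an awaitable that runs in a worker thread. Awaiting it lets you save data to Django models from an asynchronous Scrapy pipeline.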
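To show the mechanism without requiring Django, here is a minimal standard-library sketch. It is only an illustration under assumptions: FakeModel and SketchPipeline are hypothetical stand-ins, and asyncio.to_thread is used in place of sync_to_async. Both hand the blocking call to a worker thread, though sync_to_async has its own thread-handling rules.

```python
import asyncio
import threading


class FakeModel:
    """Hypothetical stand-in for a Django model; save() blocks like an ORM call."""

    saved = []  # plays the role of the database table

    def __init__(self, **fields):
        self.fields = fields
        self.saved_in_thread = None

    def save(self):
        # Record which thread performed the blocking write.
        self.saved_in_thread = threading.current_thread().name
        FakeModel.saved.append(self.fields)


class SketchPipeline:
    async def process_item(self, item, spider):
        obj = FakeModel(title=item.get("title"), url=item.get("url"))
        # Run the blocking save() in a worker thread and wait for it to finish.
        # With Django this line would be `await sync_to_async(obj.save)()`.
        await asyncio.to_thread(obj.save)
        # Return the item so that later pipelines still receive it.
        return item


def run_demo():
    item = {"title": "Example", "url": "https://example.com"}
    returned = asyncio.run(SketchPipeline().process_item(item, spider=None))
    return returned, list(FakeModel.saved)
```

Calling run_demo() pushes one item through the pipeline and returns both the item and the rows "saved" so far. asyncio.to_thread needs Python 3.9 or later.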

Answer #2, by kmynzznz

You need to enable the pipeline in your settings file.

ITEM_PIPELINES = {
    'yourproject.pipelines.ScrapyappPipeline': 800,
}

The ITEM_PIPELINES setting is described in more detail in the Scrapy documentation.
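The number assigned to each pipeline sets the order in which items pass through them: lower values run first, and values are customarily chosen in the 0-1000 range. The sketch below illustrates that ordering with plain Python. CleanupPipeline is a hypothetical second pipeline added only for the example, and the sorting line is not something Scrapy requires you to write.

```python
# settings.py fragment (illustrative)
ITEM_PIPELINES = {
    "yourproject.pipelines.ScrapyappPipeline": 800,
    "yourproject.pipelines.CleanupPipeline": 300,  # hypothetical, for illustration
}

# Items go through pipelines from the lowest value to the highest,
# so sorting by value gives the order in which they would run.
run_order = sorted(ITEM_PIPELINES, key=ITEM_PIPELINES.get)
```

With these values the hypothetical CleanupPipeline would see each item before ScrapyappPipeline writes it to the database.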
