I am new to scrapy-playwright. I want to save the data yielded by Scrapy in Django models. Here is my pipeline.py:
import scrapy
from scrapy.exceptions import DropItem
from scrapy.pipelines.images import ImagesPipeline
from itemadapter import ItemAdapter
from .enums import ModelChoices
from tp_core.models import (
    ScrapPatent,
    Logo,
    Patent,
    Trademark,
    Name, PatentThreat
)

model = 'Patent'

class ScrapyappPipeline:
    def process_item(self, item, spider):
        if model == ModelChoices.PATENT.value:
            quote = PatentThreat(
                patent_name=item.get('title'),
                description=item.get('description'),
                file=item.get('image'),
                URL=item.get('url'),
                contact_details=item.get('data'),
            )
            quote.save()
            return item
        if model == ModelChoices.TRADEMARK.value:
            quote = Trademark(
                name=item.get('title'),
                description=item.get('description'),
                file=item.get('image'),
                url_trademark=item.get('url'),
            )
            quote.save()
            return item
        if model == ModelChoices.LOGO.value:
            quote = Logo(
                title=item.get('title'),
                description=item.get('description'),
                logo_file=item.get('image'),
                url_logo=item.get('url'),
            )
            quote.save()
            return item
        if model == ModelChoices.NAME.value:
            quote = Name(
                name=item.get('title'),
                description=item.get('description'),
                file=item.get('image'),
                url_name=item.get('url'),
            )
            quote.save()
            return item
I also added these lines to settings.py:
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
I also tried using just this in process_item:
    quote = Trademark(name=item.get('title'), description=item.get('description'), file=item.get('image'), url_trademark=item.get('url'))
    quote.save()
    return item
It runs fine, but no values are saved to the Django model. Can anyone please help me?
2 Answers
mwg9r5ms1#
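It looks like you are trying to save data to a Django model from an asynchronous context, which is not allowed. Django's database layer is synchronous and needs to run in a synchronous context to work correctly.
To solve this, you can use the sync_to_async function from asgiref to turn the synchronous Django ORM calls into awaitable ones. Here is an example of how to use sync_to_async: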
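A minimal sketch of what such a pipeline might look like, not necessarily the answerer's exact code. It assumes Scrapy is allowed to await a coroutine process_item (which it supports when the asyncio reactor from the question's settings is in use). FakeTrademark and SAVED_ROWS are hypothetical stand-ins for the Trademark model and its table, so the snippet runs without a configured Django project; the ImportError fallback exists only so the sketch also runs where asgiref is not installed.

```python
import asyncio

try:
    from asgiref.sync import sync_to_async
except ImportError:
    # Stand-in used only when asgiref is missing; Django itself depends on
    # asgiref, so a real project would always take the import above.
    def sync_to_async(func):
        async def wrapper(*args, **kwargs):
            return await asyncio.to_thread(func, *args, **kwargs)
        return wrapper


SAVED_ROWS = []  # hypothetical stand-in for the database table


class FakeTrademark:
    """Hypothetical stand-in for the Django Trademark model in the question."""

    def __init__(self, **fields):
        self.fields = fields

    def save(self):
        # A real Django model would run a blocking INSERT/UPDATE here.
        SAVED_ROWS.append(self.fields)


class ScrapyappPipeline:
    async def process_item(self, item, spider):
        quote = FakeTrademark(
            name=item.get("title"),
            description=item.get("description"),
            file=item.get("image"),
            url_trademark=item.get("url"),
        )
        # Run the blocking save() in a worker thread and wait for it to finish.
        await sync_to_async(quote.save)()
        return item


def demo():
    """Drive the pipeline once outside Scrapy, with a plain dict as the item."""
    item = {
        "title": "Acme",
        "description": "demo",
        "image": None,
        "url": "https://example.com",
    }
    return asyncio.run(ScrapyappPipeline().process_item(item, spider=None))
```

In a real project, FakeTrademark would be replaced by the imported Django model, and Scrapy rather than demo() would call and await process_item.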
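In this example, we use sync_to_async to create an asynchronous version of the save method and then call it with await. This lets you save data to a Django model from an asynchronous Scrapy pipeline.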
kmynzznz2#
You need to enable the pipeline in the settings file.
You can read more details here
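As a rough illustration, enabling a pipeline in settings.py is done through the ITEM_PIPELINES setting. The module path below is an assumption (the question does not show the project's package name), so it needs to be adjusted to wherever ScrapyappPipeline actually lives.

```python
# settings.py (sketch)
ITEM_PIPELINES = {
    # Hypothetical dotted path: replace "scrapyapp.pipelines" with the
    # real module that defines ScrapyappPipeline.
    "scrapyapp.pipelines.ScrapyappPipeline": 300,
}
```

The integer sets the order in which pipelines run: lower values run first, and values are conventionally kept in the 0 to 1000 range.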