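I want to check whether a titleURL already exists in the MongoDB database.
If it does, the existing record in MongoDB should be overwritten. How do I filter on titleURL between Scrapy and Mongo?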
items.py:
from scrapy.contrib.djangoitem import DjangoItem
from mongo_test.models import Ct
class CtItem(DjangoItem):
    django_model = Ct
Mongo models.py:
from django.db import models

class Ct(models.Model):
    title = models.CharField(max_length=100)
    titleURL = models.URLField(max_length=255)
    .....
pipeline.py:
from mongo_test.models import Ct

class CtPipeline(object):
    def process_item(self, item, spider):
        ct = item.save(commit=False)
        ct_exist = Ct.objects.filter()  # how do I match the scraped titleURL against titleURL in Mongo?
        if ct_exist:
            # override the existing record in Mongo
            ct.save()
        return item
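One way to fill in the `filter()` step is to look the item up by its `titleURL` and update the stored record when it already exists, inserting it otherwise. The sketch below is a minimal, self-contained illustration under stated assumptions: a plain dict keyed by `titleURL` stands in for the Mongo collection, and the class name `UpsertByTitleURLPipeline` is hypothetical. In the real Django pipeline the lookup would be a query such as `Ct.objects.filter(titleURL=item['titleURL'])` rather than a dict access.

```python
# Minimal sketch of "overwrite if titleURL already exists, otherwise insert".
# ASSUMPTION: a plain dict keyed by titleURL stands in for the Mongo collection;
# in the real pipeline this lookup would be Ct.objects.filter(titleURL=...).


class UpsertByTitleURLPipeline:
    """Hypothetical Scrapy-style pipeline that deduplicates items on titleURL."""

    def __init__(self):
        # titleURL -> stored record (stand-in for the Ct collection)
        self.store = {}

    def process_item(self, item, spider):
        key = item["titleURL"]
        existing = self.store.get(key)
        if existing is not None:
            # Same titleURL already stored: overwrite its fields in place
            existing.update(item)
        else:
            # New titleURL: store a copy of the item's fields
            self.store[key] = dict(item)
        # Scrapy pipelines hand the item on to the next stage
        return item
```

Because the record is keyed on `titleURL`, a second item with the same URL replaces the first record's fields instead of creating a duplicate.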
settings.py in the Django project:
DATABASES = {
    'default': {
        'ENGINE': 'django_mongodb_engine',
        'NAME': 'scrapy',
    }
}
1 answer
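Don't save duplicates of data that an earlier scraping run already stored: check whether the item's titleURL is already in the database before saving it.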
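A minimal sketch of that "skip duplicates" approach, assuming the titleURLs already in MongoDB are loaded into a set when the pipeline starts (in the real project that might come from a query like `Ct.objects.values_list('titleURL', flat=True)`, which is an untested assumption for `django_mongodb_engine`). The class name `SkipDuplicateTitleURLPipeline` is hypothetical; `DropItem` is Scrapy's exception for discarding an item in a pipeline, with a local fallback so the sketch also runs without Scrapy installed.

```python
try:
    # Scrapy's standard way for a pipeline to discard an item
    from scrapy.exceptions import DropItem
except ImportError:
    class DropItem(Exception):
        """Fallback so this sketch runs without Scrapy installed."""


class SkipDuplicateTitleURLPipeline:
    """Hypothetical pipeline that drops items whose titleURL was already stored."""

    def __init__(self, known_urls=None):
        # ASSUMPTION: titleURLs already saved in MongoDB are passed in here
        self.seen = set(known_urls or ())

    def process_item(self, item, spider):
        url = item["titleURL"]
        if url in self.seen:
            # Already stored by an earlier run (or earlier in this run): skip it
            raise DropItem("Duplicate titleURL: %s" % url)
        # First time this titleURL is seen: remember it and pass the item on
        self.seen.add(url)
        return item
```

Unlike the overwrite variant, this leaves the first stored record untouched and never saves the later duplicate.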