I am building a web crawler that captures data from a website and inserts it into my database. I am using Scrapy and MySQL. I wrote the following code:
pipelines.py:
import sys

import MySQLdb


class MySQLStorePipeline(object):
    def __init__(self):
        # One shared connection and cursor for the whole crawl.
        self.conn = MySQLdb.connect(host='localhost', user='root', passwd='',
                                    db='imoveis', charset='utf8',
                                    use_unicode=True)
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        try:
            self.cursor.execute("""INSERT INTO imovel (Titulo, Tipo_Negocio, Preco, Localizacao, Tipo_Imovel, Condicao, Numero_Divisoes, Numero_Quartos, Numero_Casas_Banho, Certificado_Energetico, Ano_Construcao, Area_Util, Area_Bruta, Piso)
                VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)""",
                (item['Titulo'],
                 item['Tipo_Negocio'],
                 item['Preco'],
                 item['Localizacao'],
                 item['Tipo_Imovel'],
                 item['Condicao'],
                 item['Numero_Divisoes'],
                 item['Numero_Quartos'],
                 item['Numero_Casas_Banho'],
                 item['Certificado_Energetico'],
                 item['Ano_Construcao'],
                 item['Area_Util'],
                 item['Area_Bruta'],
                 item['Piso']))
            self.conn.commit()
        except MySQLdb.Error as e:
            print('Error %d: %s' % (e.args[0], e.args[1]))
            sys.exit(1)
        return item
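For context on the error reported further down: MySQLdb substitutes each %s placeholder with a single scalar value. If a parameter is a Python list, MySQLdb escapes it as a parenthesized sequence such as ('a', 'b'), so MySQL sees a row constructor where one value is expected and raises error 1241. A minimal sketch that reproduces this (the table demo and column col are hypothetical, created just for the demonstration):

import MySQLdb

conn = MySQLdb.connect(host='localhost', user='root', passwd='',
                       db='imoveis', charset='utf8', use_unicode=True)
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS demo (col TEXT)")  # hypothetical table
cur.execute("INSERT INTO demo (col) VALUES (%s)", ("ok",))        # scalar: fine
cur.execute("INSERT INTO demo (col) VALUES (%s)", (["a", "b"],))  # list: raises
# OperationalError: (1241, 'Operand should contain 1 column(s)')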
settings.py:
BOT_NAME = 'novo'
SPIDER_MODULES = ['novo.spiders']
NEWSPIDER_MODULE = 'novo.spiders'
FEED_EXPORT_ENCODING = 'utf-8'
ITEM_PIPELINES = {
    'novo.pipelines.MySQLStorePipeline': 300,
}
ROBOTSTXT_OBEY = True
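As a side note on ITEM_PIPELINES: the number is the pipeline's order (0-1000, lower values run first), so a normalization step could be slotted in before the storage pipeline. A sketch, where NormalizeItemPipeline is a hypothetical class, not part of the project above:

ITEM_PIPELINES = {
    'novo.pipelines.NormalizeItemPipeline': 200,  # hypothetical: runs first
    'novo.pipelines.MySQLStorePipeline': 300,
}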
spider.py:
import scrapy


class SapoSpider(scrapy.Spider):
    name = "imoveis"
    allowed_domains = ["maisconsultores.pt"]
    start_urls = ["https://www.maisconsultores.pt/properties?page=%d&s=eedce" % i
                  for i in range(23)]

    def parse(self, response):
        for i in response.css('div.item.col-sm-4'):
            youritem = {
                'Titulo': i.css('div[class=image] h3::text').extract(),
                'Tipo_Negocio': i.css('div.price::text').re('[^\t\n\r\a]+'),
            }
            subpage_link = i.css('div[class=image] a::attr(href)').extract_first()
            full_url = response.urljoin(subpage_link)
            yield scrapy.Request(full_url, callback=self.parse_subpage,
                                 meta={'item': youritem})

    def parse_subpage(self, response):
        youritem = response.meta.get('item')
        youritem['Tipo_Imovel'] = response.xpath('//ul[@class="amenities"]//li[1]/text()').extract()
        youritem['Condicao'] = response.xpath('//ul[@class="amenities"]//li[2]/text()').extract()
        yield youritem
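Worth noting here: .extract() always returns a list of strings, while .extract_first() returns a single string (or None). Since every %s in the INSERT above expects a scalar, one way to keep the fields scalar, sketched only for the two subpage fields, would be:

    def parse_subpage(self, response):
        youritem = response.meta.get('item')
        # .extract_first() yields one string (or None) instead of a list,
        # which is what a single %s placeholder in the INSERT expects.
        youritem['Tipo_Imovel'] = response.xpath(
            '//ul[@class="amenities"]//li[1]/text()').extract_first()
        youritem['Condicao'] = response.xpath(
            '//ul[@class="amenities"]//li[2]/text()').extract_first()
        yield youritem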
The error that comes up when I run Scrapy is the following:
_mysql_exceptions.OperationalError: (1241, 'Operand should contain 1 column(s)')
I really can't see what I'm missing here. I would be very grateful if you could help me.
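If the spider is meant to keep list values, a defensive alternative is to flatten them in the pipeline before executing the query. A sketch, assuming each field should collapse to its first extracted string:

def _scalar(value):
    # Collapse lists produced by .extract() / .re() to their first element;
    # empty lists become None so the column is inserted as NULL.
    if isinstance(value, list):
        return value[0] if value else None
    return value

fields = ('Titulo', 'Tipo_Negocio', 'Preco', 'Localizacao', 'Tipo_Imovel',
          'Condicao', 'Numero_Divisoes', 'Numero_Quartos',
          'Numero_Casas_Banho', 'Certificado_Energetico', 'Ano_Construcao',
          'Area_Util', 'Area_Bruta', 'Piso')
params = tuple(_scalar(item[f]) for f in fields)

params would then replace the long literal tuple passed to self.cursor.execute in process_item.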