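I have been experimenting with web scraping. I can successfully scrape data into my current database table, but I want to insert a record into "new_table" only when it does not already exist in the "current table".
My code is (pipeline):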
table = 'products'
table2 = 'new_products'

def save(self, row):
    cursor = self.cnx.cursor()
    cursor.execute("SELECT DISTINCT product_id FROM products;")
    old_ids = [row[0] for row in cursor.fetchall()]
    create_query = ("INSERT INTO " + self.table +
        "(rowid, date, listing_id, product_id, product_name, price, url) "
        "VALUES (%(rowid)s, %(date)s, %(listing_id)s, %(product_id)s, %(product_name)s, %(price)s, %(url)s)")
    cursor.execute(create_query, row)
    lastRecordId = cursor.lastrowid
    self.cnx.commit()
    cursor.close()
    print("Item saved with ID: {}".format(lastRecordId))
    if not product_id in old_ids:
        create_query = ("INSERT INTO " + self.table2 +
            "(rowid, date, listing_id, product_id, product_name, price, url) "
            "VALUES (%(rowid)s, %(date)s, %(listing_id)s, %(product_id)s, %(product_name)s, %(price)s, %(url)s)")
It does not work, and I get this error:
2022-05-06 12:26:57 [scrapy.core.scraper] ERROR: Error processing {'date': '2022-05-06 12:26:57.575507',
'listing_id': '0190199600119',
'price': '4199.00',
'product_id': '1209298',
'product_name': 'APPLE 11" Magic Türkçe Q Klavye Siyah',
'rowid': 456274953331128512,
'url': 'https://www.mediamarkt.com.tr/tr/product/APPLE%2011%22%20Magic%20T%C3%BCrk%C3%A7e%20Q%20Klavye%20Siyah-1209298.html'}
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 654, in _runCallbacks
current.result = callback(current.result, *args,**kw)
File "/usr/local/lib/python3.8/dist-packages/scrapy/utils/defer.py", line 162, in f
return deferred_from_coro(coro_f(*coro_args,**coro_kwargs))
File "/root/teknosa/teknosa/pipelines.py", line 28, in process_item
self.save(dict(item))
File "/root/teknosa/teknosa/pipelines.py", line 62, in save
if not product_id in old_ids:
NameError: name 'product_id' is not defined
Saving item into db ...
My product_id values are unique.
If a product ID is not in the current table, I want to insert it into "new_products".
How can I do this?
Thanks.
Last edit: I now get this error.
2022-05-07 18:17:11 [scrapy.core.scraper] ERROR: Error processing {'date': '2022-05-07 18:17:11.902622',
'listing_id': '8713439219357',
'price': '99.00',
'product_id': '1175529',
'product_name': 'TRUST 21935 NANGA USB 3.1 Kart Okuyucu',
'rowid': -411152717288573423,
'url': 'https://www.mediamarkt.com.tr/tr/product/TRUST%2021935%20NANGA%20USB%203.1%20Kart%20Okuyucu-1175529.html'}
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/mysql/connector/connection_cext.py", line 523, in cmd_query
self._cmysql.query(query,
_mysql_connector.MySQLInterfaceError: Duplicate entry '-411152717288573423' for key 'products.rowid'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 654, in _runCallbacks
current.result = callback(current.result, *args,**kw)
File "/usr/local/lib/python3.8/dist-packages/scrapy/utils/defer.py", line 162, in f
return deferred_from_coro(coro_f(*coro_args,**coro_kwargs))
File "/root/teknosa/teknosa/pipelines.py", line 28, in process_item
self.save(dict(item))
File "/root/teknosa/teknosa/pipelines.py", line 69, in save
cursor.execute(create_query, row)
File "/usr/local/lib/python3.8/dist-packages/mysql/connector/cursor_cext.py", line 269, in execute
result = self._cnx.cmd_query(stmt, raw=self._raw,
File "/usr/local/lib/python3.8/dist-packages/mysql/connector/connection_cext.py", line 528, in cmd_query
raise errors.get_mysql_exception(exc.errno, msg=exc.msg,
mysql.connector.errors.IntegrityError: 1062 (23000): Duplicate entry '-411152717288573423' for key 'products.rowid'
1 Answer
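If you want to insert only when the record does not exist, you do not need to do what you are doing now: there is no need to select everything and then check whether the one you are looking for is there.
What you need is a unique index on product_id in table2.
Then change your code to: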
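A minimal sketch of that change, not the answer's exact snippet. It assumes MySQL, the column list from the question, and a unique index on product_id in new_products. The helper names build_insert_query and save_if_new and the index name in the comment are hypothetical.

```python
# Assumed one-time setup in MySQL (the index name is illustrative):
#   CREATE UNIQUE INDEX idx_product_id ON new_products (product_id);

# Column list copied from the INSERT statement in the question.
COLUMNS = ("rowid", "date", "listing_id", "product_id",
           "product_name", "price", "url")


def build_insert_query(table):
    """Build an INSERT that changes nothing when a unique key already exists."""
    column_list = ", ".join(COLUMNS)
    placeholders = ", ".join("%({})s".format(name) for name in COLUMNS)
    return (
        "INSERT INTO {} ({}) VALUES ({}) "
        "ON DUPLICATE KEY UPDATE product_id = product_id"
    ).format(table, column_list, placeholders)


def save_if_new(cnx, row, table="new_products"):
    """Insert row (a dict keyed by COLUMNS) unless its product_id is stored."""
    cursor = cnx.cursor()
    try:
        cursor.execute(build_insert_query(table), row)
        cnx.commit()  # can be dropped if autocommit is enabled on cnx
    finally:
        cursor.close()
```

Here cnx stands for the open connection the pipeline already holds as self.cnx, so the call inside the pipeline would be save_if_new(self.cnx, row).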
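With ON DUPLICATE KEY UPDATE, when a duplicate row is found (a product_id that already exists), the statement updates product_id to the same product_id, so it has no effect.
If you set autocommit = True, you can remove the commit calls.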
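Edit
If, as you said in the comments, you only need to insert into the new table when the record does not exist in your current table, you can change the code as follows:
1. Rename the loop variable in the line old_ids = [row[0] for row in cursor.fetchall()] so that it does not reuse the name of the row parameter. In Python 3 a list comprehension has its own scope, so the parameter is not actually overwritten, but a distinct name avoids the confusion.
2. Your problem is in the if statement: the variable product_id does not exist. Read the value from the row dictionary instead, for example row['product_id'].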
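Putting both points together, a minimal sketch of a corrected save method could look like this. The class name ProductPipeline and its constructor are hypothetical stand-ins for the pipeline in the question. Ids are compared as strings on the assumption that the column type may differ from the scraped value.

```python
INSERT_TEMPLATE = (
    "INSERT INTO {table} "
    "(rowid, date, listing_id, product_id, product_name, price, url) "
    "VALUES (%(rowid)s, %(date)s, %(listing_id)s, %(product_id)s, "
    "%(product_name)s, %(price)s, %(url)s)"
)


class ProductPipeline:
    """Hypothetical stand-in for the pipeline class in the question."""

    table = "products"
    table2 = "new_products"

    def __init__(self, cnx):
        # cnx: an open DB-API connection, e.g. from mysql.connector.connect(...)
        self.cnx = cnx

    def save(self, row):
        cursor = self.cnx.cursor()
        try:
            # Read the known ids before inserting the current row.
            cursor.execute("SELECT DISTINCT product_id FROM " + self.table)
            # The loop variable is r, so it does not reuse the name row.
            old_ids = {str(r[0]) for r in cursor.fetchall()}

            cursor.execute(INSERT_TEMPLATE.format(table=self.table), row)
            last_record_id = cursor.lastrowid

            # product_id is a key of the row dict, not a standalone variable.
            if str(row["product_id"]) not in old_ids:
                cursor.execute(INSERT_TEMPLATE.format(table=self.table2), row)

            # Commit both inserts together, then release the cursor.
            self.cnx.commit()
        finally:
            cursor.close()
        print("Item saved with ID: {}".format(last_record_id))
```

The second insert runs before commit and cursor.close(), whereas in the original code the cursor was already closed by the time the if statement was reached.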