sqlite Python Peewee EXISTS子查询未按预期工作

qnzebej0  于 2023-02-16  发布在  SQLite
关注(0)|答案(1)|浏览(141)

我在python应用程序中使用peewee ORM,我正在尝试编写代码从SQLite数据库中批量获取记录,我有一个子查询,它似乎可以自己工作,但当添加到更新查询中时,fn.EXISTS(sub_query)似乎没有任何作用,因为数据库中的每条记录都被更新了。
注意:我使用的是peewee的APSW扩展。

def batch_logic(self, id_1, path_1, batch_size=1000, **kwargs):
    sub_query = (self.select(ModelClass.granule_id).distinct().where(
        (ModelClass.status == 'old_status') &
        (ModelClass.collection_id == collection_id) &
        (ModelClass.name.contains(provider_path))
    ).order_by(ModelClass.discovered_date.asc()).limit(batch_size)).limit(batch_size))
    print(f'len(sub_query): {len(sub_query)}')
    fb_st_2 = time.time()
    updated_records= list(
        (self.update(status='new_status').where(fn.EXISTS(sub_query)).returning(ModelClass))
    )
    print(f'update {len(updated_records)}: {time.time() - fb_st_2}')

    db.close()
    return updated_records

以下是本地测试的输出:

id_1: id_1_1676475997_PQXYEQGJWR
len(sub_query): 2
update 20000: 1.0583274364471436
fetch_batch 20000: 1.1167597770690918
count_things 0: 0.02147078514099121
processed_things: 20000

子查询正确地返回了2,但是更新查询where(fn.EXISTS(sub_query))似乎被忽略了。我对它的工作原理理解有误吗?
编辑1:我认为需要GROUP BY,因为行可以具有相同的granule_id,并且我需要获取最多batch_sizegranule_id s的行

6rvt4ljy

6rvt4ljy1#

我认为您在这里使用UPDATE...WHERE EXISTS是不正确或不恰当的。这可能对您更有效:

# Unsure why you have a GROUP BY with no aggregation, that seems
# incorrect possibly, so I've removed it.
sub_query = (self.select(ModelClass.id)
             .where(
                (ModelClass.status == 'old_status') &
                (ModelClass.collection_id == id_1) &
                (ModelClass.name.contains(path_1)))
             .order_by(ModelClass.discovered_date.asc())
             .limit(batch_size))

update = (self.update(status='new_status')
          .where(self.id.in_(sub_query))
          .returning(ModelClass))
cursor = update.execute()  # It's good to explicitly execute().
updated_records = list(cursor)

无论如何,关键的思想是我将更新与子查询相关联。

相关问题