php mysql / sql修复多对多数据库中的行

a6b3iqyw  于 2023-01-08  发布在  PHP
关注(0)|答案(1)|浏览(115)

我有一个多对多表,其中有多行由于导入错误而重复。
模式:

CREATE TABLE `actor` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `name` text NOT NULL,
 `pic_id` int(11) DEFAULT NULL,
 `dob` varchar(4) DEFAULT NULL,
 PRIMARY KEY (`id`),
 KEY `pic_id` (`pic_id`)
) ENGINE=InnoDB AUTO_INCREMENT=3195 DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci;

CREATE TABLE `video_actor` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `video_id` int(11) NOT NULL,
 `actor_id` int(11) NOT NULL,
 PRIMARY KEY (`id`),
 KEY `video_id` (`video_id`),
 KEY `actor_id` (`actor_id`),
 CONSTRAINT `video_actor_ibfk_2` FOREIGN KEY (`actor_id`) REFERENCES `actor` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=23757 DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci;

Fiddle
我有一个演员表

table actor
actor.id (pk)
actor.name

我有重复的演员名字
它们通过表video_actor链接到video,如下所示:

table video_actor
id (pk)
actor_id (fk)
video_id (fk)

table video
id (pk)
...etc

我尝试了几个小时的SQL查询,用正确的id更新链接表(video_actor),但是失败了。我觉得我得到的最接近的是:

UPDATE video_actor as x
join 
(
   SELECT min_id,max_id
      FROM (
   SELECT name, MIN(id) as min_id,MAX(id) as max_id
      FROM actor
   GROUP BY name HAVING COUNT(name) > 1
   ) AS y
) as q on x.video_id = q.min_id
SET x.actor_id = q.min_id
WHERE x.actor_id = q.max_id;

上面的查询什么也不做,所以我写了一些php来做我在纯sql中不能做的事情:

public function remove_dup_actors() {
    $sql="
        SELECT name,min_id,max_id
        FROM (
            SELECT name, MIN(id) as min_id,MAX(id) as max_id
            FROM actor
            GROUP BY name HAVING COUNT(name) > 1
        ) as dups;
    ";
    $actor_dups=self::$dB->get_rows($sql);
    foreach($actor_dups as $actor) {
        $sql="UPDATE video_actor SET actor_id=${actor['min_id']} WHERE actor_id=${actor['max_id']}";
        self::$dB->update_row($sql);
        echo "Updated ${actor['name']}\n";
        $sql="DELETE FROM actor WHERE id=${actor['max_id']}";
        self::$dB->update_row($sql);
    }
}

这个管用。
请Maven们,有没有人能告诉我,我的纯sql(失败)解决方案有什么问题?
更新:尝试:

UPDATE video_actor as x
join 
(
   SELECT id,min_id,max_id
      FROM (
         SELECT id, name, MIN(id) as min_id,MAX(id) as max_id
         FROM actor
         GROUP BY name HAVING COUNT(name) > 1
      ) AS y
   ) as q on x.video_id = y.id
SET x.actor_id = q.min_id
WHERE x.actor_id = q.max_id;

and get unknown column y.id on ON clause

s8vozzvw

s8vozzvw1#

从你的问题中并不完全清楚“correct id”是什么,但是我假设你想要所有具有相同actor.name的行中最小的id。

UPDATE actor a
    JOIN video_actor v ON v.actor_id = a.id
    JOIN (
      SELECT MIN(id) as min_id, a2.name
      FROM actor a2
      GROUP BY a2.name
    ) a2 ON a2.name = a.name
SET v.actor_id = a2.min_id;

首先我们连接id/actor_id上的两个表,然后我们可以根据属于给定name的第一个id更新video_actor
你需要处理一些单独的查询来消除重复项,下面是一个例子,说明如何在actor表上实现这一点,方法是将行连接到我们在name上使用的相同的子查询,但这些子查询与min_id不匹配:

DELETE a
FROM actor a
    JOIN (
      SELECT MIN(id) as min_id, a2.name
      FROM actor a2
      GROUP BY a2.name
    ) a2 ON a2.name = a.name AND a.id != a2.min_id;

此外,我认为您可能应该使用VARCHAR作为带索引的actor.name数据类型-这将使这一过程快得多。

相关问题