我有一个多对多表,其中有多行由于导入错误而重复。
模式:
CREATE TABLE `actor` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` text NOT NULL,
`pic_id` int(11) DEFAULT NULL,
`dob` varchar(4) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `pic_id` (`pic_id`)
) ENGINE=InnoDB AUTO_INCREMENT=3195 DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci;
CREATE TABLE `video_actor` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`video_id` int(11) NOT NULL,
`actor_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `video_id` (`video_id`),
KEY `actor_id` (`actor_id`),
CONSTRAINT `video_actor_ibfk_2` FOREIGN KEY (`actor_id`) REFERENCES `actor` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=23757 DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci;
Fiddle
我有一个演员表
table actor
actor.id (pk)
actor.name
我有重复的演员名字
它们通过表video_actor链接到video,如下所示:
table video_actor
id (pk)
actor_id (fk)
video_id (fk)
table video
id (pk)
...etc
我尝试了几个小时的SQL查询,用正确的id更新链接表(video_actor),但是失败了。我觉得我得到的最接近的是:
UPDATE video_actor as x
join
(
SELECT min_id,max_id
FROM (
SELECT name, MIN(id) as min_id,MAX(id) as max_id
FROM actor
GROUP BY name HAVING COUNT(name) > 1
) AS y
) as q on x.video_id = q.min_id
SET x.actor_id = q.min_id
WHERE x.actor_id = q.max_id;
上面的查询什么也不做,所以我写了一些php来做我在纯sql中不能做的事情:
public function remove_dup_actors() {
$sql="
SELECT name,min_id,max_id
FROM (
SELECT name, MIN(id) as min_id,MAX(id) as max_id
FROM actor
GROUP BY name HAVING COUNT(name) > 1
) as dups;
";
$actor_dups=self::$dB->get_rows($sql);
foreach($actor_dups as $actor) {
$sql="UPDATE video_actor SET actor_id=${actor['min_id']} WHERE actor_id=${actor['max_id']}";
self::$dB->update_row($sql);
echo "Updated ${actor['name']}\n";
$sql="DELETE FROM actor WHERE id=${actor['max_id']}";
self::$dB->update_row($sql);
}
}
这个管用。
请Maven们,有没有人能告诉我,我的纯sql(失败)解决方案有什么问题?
更新:尝试:
UPDATE video_actor as x
join
(
SELECT id,min_id,max_id
FROM (
SELECT id, name, MIN(id) as min_id,MAX(id) as max_id
FROM actor
GROUP BY name HAVING COUNT(name) > 1
) AS y
) as q on x.video_id = y.id
SET x.actor_id = q.min_id
WHERE x.actor_id = q.max_id;
and get unknown column y.id on ON clause
1条答案
按热度按时间s8vozzvw1#
从你的问题中并不完全清楚“correct id”是什么,但是我假设你想要所有具有相同
actor.name
的行中最小的id。首先我们连接
id/actor_id
上的两个表,然后我们可以根据属于给定name
的第一个id更新video_actor
。你需要处理一些单独的查询来消除重复项,下面是一个例子,说明如何在
actor
表上实现这一点,方法是将行连接到我们在name
上使用的相同的子查询,但这些子查询与min_id
不匹配:此外,我认为您可能应该使用
VARCHAR
作为带索引的actor.name
数据类型-这将使这一过程快得多。