数据库批量mysql插入提高数据库结构迁移的性能

jtw3ybtb  于 2021-06-21  发布在  Mysql
关注(0)|答案(2)|浏览(287)

我需要重组我的mysql innodb数据库。
目前我有一个 customer 包含3个产品名称的表。
我需要把这些名字提取到一个新的 product table。这个 product 表应该包含当前在 customer 表并链接到 customer 通过新的 customer_product table。虽然产品名称可能不是唯一的,但它们之间没有任何关系,意思是相互关联 customer 需要将3个新条目插入到 product 表和表中的3个新条目 customer_product table。
所以不是这样:

customer
| id | product_name_a | product_name_b | product_name_c |

我需要这个:

customer
| id |

customer_product
| customer_id | product_id | X3

product
| id | name | X3

我编写了以下mysql过程:

BEGIN
  DECLARE nbr_of_customers BIGINT(20);
  DECLARE customer_count BIGINT(20);
  DECLARE product_id BIGINT(20);
  DECLARE customer_id BIGINT(20);
  DECLARE product_name_a VARCHAR(500);
  DECLARE product_name_b VARCHAR(500);
  DECLARE product_name_c VARCHAR(500);

  SELECT COUNT(*) FROM customer INTO nbr_of_customers;
  SET customer_count = 0;
  SET product_id = 1;

  WHILE customer_count < nbr_of_customers DO
    SELECT
      customer.id,
      customer.product_name_a,
      customer.product_name_b,
      customer.product_name_c
    INTO
      customer_id,
      product_name_a,
      product_name_b,
      product_name_c
    FROM customer
    LIMIT customer_count,1;

    INSERT INTO product(id, name)
      VALUES(product_id, product_name_a);
    INSERT INTO customer_product(customer_id, product_id)
      VALUES(customer_id, product_id);
    SET product_id = product_id + 1;

    INSERT INTO product(id, name)
      VALUES(product_id, product_name_b);
    INSERT INTO customer_product(customer_id, product_id)
      VALUES(customer_id, product_id);
    SET product_id = product_id + 1;

    INSERT INTO product(id, name)
      VALUES(product_id, product_name_c);
    INSERT INTO customer_product(customer_id, product_id)
      VALUES(customer_id, product_id);
    SET product_id = product_id + 1;

    SET customer_count = customer_count + 1;
  END WHILE;
END;

这太慢了。
我在本地运行过,估计我的~1.5万客户需要~1小时才能完成。我的vps服务器比这慢得多,所以可能需要10个小时才能完成。
问题似乎是插入要花很长时间。因此,我希望在过程中存储所有插入,并在循环完成并且我知道要插入什么之后成批执行它们。
有没有一种方法可以批量执行所有~100k个插入以优化性能,还是有更好的方法?
最终编辑:
我标记了正确的解决方案,因为它在大规模加快进程方面做得很好,这是问题的主要焦点。最后,由于解决方案在不转义插入的字符串方面的限制,我使用修改过的生产代码(在java中)执行了迁移。

aurhwmvo

aurhwmvo1#

也许您可以通过三个单独的插入(而不是~100k)来实现这一点,如下所示:

INSERT INTO customer_product (customer_id, product_id) 
SELECT customer.id as customer_id, product.id as product_id 
FROM customer 
  JOIN product on customer.product_name_a = product.name

INSERT INTO customer_product (customer_id, product_id) 
SELECT customer.id as customer_id, product.id as product_id 
FROM customer 
  JOIN product on customer.product_name_b = product.name

INSERT INTO customer_product (customer_id, product_id) 
SELECT customer.id as customer_id, product.id as product_id 
FROM customer 
  JOIN product on customer.product_name_c = product.name

当然,你得设置你的 product 表,并且您希望从 customer 事后的table。
如果在 customer.product_name_X 列(可能还有 product.name 列,虽然它是如此少,idk如果它将是重要的)。 EXPLAIN 我能帮上忙。

bjp0bcyl

bjp0bcyl2#

首先,使用游标处理单个查询的结果,而不是对每行执行单独的查询。
然后连接 VALUES 列为字符串,您可以使用 PREPARE 以及 EXECUTE .
我的代码以100个客户为一批进行插入,因为我希望查询的大小有限制。

BEGIN
  DECLARE product_id BIGINT(20);
  DECLARE customer_id BIGINT(20);
  DECLARE product_name_a VARCHAR(500);
  DECLARE product_name_b VARCHAR(500);
  DECLARE product_name_c VARCHAR(500);
  DECLARE done INT DEFAULT FALSE;
  DECLARE cur CURSOR FOR SELECT c.id, c.product_name_a, c.product_name_b, c.product_name_c FROM customer AS c;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

  SET product_id = 1;

  OPEN cur;

  SET @product_values = '';
  SET @cp_values = '';

  read_loop: LOOP
    FETCH cur INTO customer_id, product_name_a, product_name_b, product_name_c;
    IF done THEN
      LEAVE read_loop;
    END IF;

    SET @product_values = CONCAT(@product_values, IF(@product_values != '', ',', ''), "(", product_id, ",'", product_name_a, "'), (", product_id + 1, ",'", product_name_b, "'), (", product_id + 2, ",'", product_name_c, "'), ");
    SET @cp_values = CONCAT(@cp_values, IF(@cp_values != '', ',', ''), "(", customer_id, ",", product_id, "), (", customer_id, ",", product_id + 1, "), (", customer_id, ",", product_id + 2, "),");

    SET product_id = product_id + 3;

    IF product_id % 300 = 1 -- insert every 100 customers
    THEN BEGIN
         SET @insert_product = CONCAT("INSERT INTO product(id, name) VALUES ", @product_values);
         PREPARE stmt1 FROM @insert_product;
         EXECUTE stmt1;

         SET @insert_cp = CONCAT("INSERT INTO customer_product(customer_id, product_id) VALUES ", @cp_values);
         PREPARE stmt2 FROM @insert_cp;
         EXECUTE stmt2;

         SET @product_values = '';
         SET @cp_values = '';
     END IF;

  END LOOP;

  IF @product_values != '' -- Process any remaining rows
  THEN BEGIN
       SET @insert_product = CONCAT("INSERT INTO product(id, name) VALUES ", @product_values);
       PREPARE stmt1 FROM @insert_product;
       EXECUTE stmt1;

       SET @insert_cp = CONCAT("INSERT INTO customer_product(customer_id, product_id) VALUES ", @cp_values);
       PREPARE stmt2 FROM @insert_cp;
       EXECUTE stmt2;

       SET @product_values = '';
       SET @cp_values = '';
   END IF;
END;

请注意,使用此解决方案时,产品名称在插入之前不会正确转义。因此,如果任何产品名称包含特殊字符(如单引号),则此解决方案将不起作用 ' .

相关问题